Overview
Overview
tandard Template Library (STL) is a powerful template library for C++, which is used extensively in the industry. It provides generic, fundamental data structures and algorithms useful for most of the programs. So, it avoids reinventing the wheel and provides code that is well tested, versatile, efficient and generic. For example, if you want to write a symbol table for your toy compiler, you can go ahead and write your own symbol table. However, this approach suffers from several disadvantages: The effort required to write a full-fledged, effective and efficient symbol table is substantial. The hand-written code needs to undergo rigorous testing (since the symbol table is a very important piece of a compiler). Finding and fixing problems can take significant amounts of time and energy. The data structure needs to be efficient and effective: achieving this is not easy. If some other programmers are assigned to maintain or extend your symbol table in the future, they will find it very difficult to
S
understand both the interface and implementation of your symbol table: only you know your program well, as it is nonstandard. Instead, when you use STL for implementing your symbol table, you can make use of a map (or a multi-map) container; use various generic algorithms (binary_search, sort etc) to operate on it; and make use of iterators to traverse the table as required. To be specific: You need not write a full-fledged implementation of the symbol table: you can concentrate on solving the actual, specific problem of providing a symbol table. You can reasonably expect the STL library you have to be rigorously tested. Though STL is a generic library, it is designed with efficiency in mind. It is in fact a very efficient library. STL is an industry standard and widely used library—so you can have any competent C++ programmer get some exposure to the STL library. A programmer who may possibly maintain or extend your symbol table will find it easy to understand the implementation of your symbol table.
A Ready Reckoner for the
Standard Template Library Learn about the many benefits of the Standard Template Library in this introductory article. 78
MAY 2007
|
LINUX FOR YOU
|
www.linuxforu.com
CMYK
Overview
The list is not exhaustive, yet it covers a few very important reasons to (re)use standard components as much as possible. However, just like C++, STL is designed for experienced programmers: it requires some expertise to make use of its full potential and there are lots of traps and pitfalls that a novice can easily fall into. Also, it is a generic library whose design is inspired by the functional programming paradigm: it is not object-oriented! Hence, you need some understanding and experience about the design philosophy and problemsolving approach of STL to make the best use of it.
STL components STL consists of three main parts: Containers (like a stack) Generic algorithms (like a sort) Iterators (similar to the use of pointers) These components are designed such that they are ignorant of specific details of other components, so that they can be combined together as the need arises. Here lies the secret of the power of STL: the ability to seamlessly combine the three to fit our need. Containers are objects that can hold other objects. We can use any type of object with these containers, but generally, it is assumed that an object that is used with a container has the following defined for the object: copy constructor assignment operator == operator < operator This is because whenever an object is used with a container, a copy of the object is created and only the copy is present in the container; so we need the copy constructor and assignment operator to be defined. Also, many of the algorithms used by the containers need a comparison between objects; so we need == and < overloaded operators.
Containers Containers are the data-structures in which we can store objects. Basically, we have two kinds of containers: Sequence containers: These containers store and retrieve data in a sequential fashion. They include the simple array, list, vector and deque. Associative containers: The elements of these containers are associated in some manner. The associative containers are map, multimap, set and multiset. They rely heavily on comparison operators, because the objects are stored in a sorted order. In other words, sequence containers can hold elements of the same type, whereas associative containers are capable of holding a key-value pair. Let’s look at a few of the containers in STL: vector: Vector can be treated as an array with the capability of growing or shrinking dynamically. This can be safely used instead of arrays. The elements can be accessed in two ways: using the overloaded operator [] using the method at The first one is easier and faster to use, but it is not range
checked. On the other hand, the at method does range checking and throws out_of_range exception if needed. deque: This is basically a double-ended queue. If we want to grow/shrink in a vector, we can do it only at one end. But with deque, we can do it at both the ends. It provides the same accessing efficiency as the vector but the allocation efficiency comparable with a list. list: Arrays are optimised for random access but are inefficient when it comes to inserting and deleting elements in the middle; so are the vector and deque. For operations requiring intensive insertions and deletions, use a list, which is very efficient for these operations (and is internally implemented as a double-linked list). With a list, you cannot have random access to the data and so the [] operator is not overloaded. map and set: Both store the elements along with a unique key. In most of the cases, these two are interchangeable. The only difference is that in a set, the values are irrelevant and we keep track of the keys only. They provide an important functionality, which the sequence containers do not provide—the find operation. multimap and multiset: These are extended versions of a map and set respectively. In a map and set, the keys should be unique. But a multimap and multiset do not have this constraint. All the operations of their respective counterparts are supported here also.
Algorithms The
header file provides us with many of the algorithms that we use in our day-to-day programming (and lots of not so obvious ones, too). For example, most of us have ended up writing our own versions of a sort, search, find, etc, and STL allows us to reuse the code and helps us to concentrate on more creative aspects of programming. Let us look at an example of using a fundamental operation—swap: #include #include using namespace std; int main(){ string this = “this”, that = “that”; std::swap(this, that); cout<< “this = “<< this<< “and that = “<< that<< endl; } // prints: this = that and that = this
Algorithms in STL are basically of three types: non-modifying (sequence); for example, find modifying (sequence); for example, fill sorted (sequence); for example, sort Non-modifying algorithms are for read-only/traversing functionality that essentially doesn’t modify the content of the containers. In short, they don’t modify the sequence on which they operate. Note that ‘sequence’ here refers to the ‘sequence containers’—referring to containers with
www.linuxforu.com
CMYK
|
LINUX FOR YOU
|
MAY 2007
79
Overview
elements of the same type T (homogeneous elements), for example, std::vector, std::list. Modifying algorithms may alter the sequence on which they operate. The last kind of algorithms work on sorted sequences—for example, the binary_search algorithm. As you can easily guess, swap comes under modifying sequence algorithms.
if(pos == (arr+4)) std::cout<< “Searched element not found in the array”; else std::cout<< “Found in the position “<< (pos - arr); //
Iterators are generalised pointers and act as the glue between containers and algorithms. STL algorithms are written in terms of iterator parameters, and STL containers provide iterators that can be plugged into algorithms. Generally, iterators point to a location within a container. Iterators have a pointer-like syntax (in many cases, iterators are indeed implemented as pointers internally). Thus, generic algorithms can handle arrays and pointers this is of significant importance since we need not throw away our C-style arrays for the sake of using this library. For example, in the find generic algorithm, we can either use arrays or container classes: #include #include using namespace std; int main(){ int arr[] = {1, 4, 9, 16, 25}; // some values int * arr_pos = find(arr, arr+4, 9); std::cout<< “array pos = “<< arr_pos - arr << endl; // find the first occurrence of 9 in the array vector int_vec; for(int i = 1; i <= 5; i++) int_vec.push_back(i*i); vector::iterator vec_pos = find (int_vec.begin(), int_vec.end(), 9); std::cout<< “vector pos = “<< (vec_pos - int_vec.begin()); // find the first occurrence of 9 in the vector } // prints: //
array pos = 2
//
vector pos = 2
Here, note that we are using iterators as ‘pairs’. This is how we generally make use of iterators: a way of marking the beginning and the end of the sequence to be operated on. However, unlike pointers, there is no iterator equivalent of ‘null’—it’s simply undefined behaviour to dereference an iterator pointing to some illegal value. Traditionally, returning null (0) is how we indicate that a value searched is found or not. Since null cannot be used for iterators, how do we indicate that the value searched is not found? For that, the element ‘one-past’ the end is used (note that it is not illegal for a pointer to point ‘one-past’ the last element in an array). For example:
MAY 2007
int * pos = find(arr, arr+4, 36);
// prints:
Iterators
80
int arr[] = {1, 4, 9, 16, 25}; // some values
|
LINUX FOR YOU
|
Searched element not found in the array
This aspect of iterators is very important to understand as it is used extensively in STL.
Why iterators? Let’s suppose that you want to go to the beginning of a list. You can use the member function front, which returns an iterator (reference) to the first element of the array, and in this way proceed with your usual work. You might wonder why we need iterators when member functions will do. Let us take the generic algorithm sort, which is used for sorting, say, a vector. If we hadn't had iterators, then we would have had to write a separate algorithm for each and every container. So we pass to this algorithm two iterators as parameters, which point to the start and end of the sequence to be sorted, respectively. You will notice that, with this mechanism we are able to use any sort of containers with an algorithm.
Important concepts for using STL Since STL provides a whole new way of solving the problems in C++, there are many concepts that need to be understood to learn and to make best use of STL. Function objects: Using C style function pointers is not type-safe and isn’t object oriented. An alternative in C++ is to use ‘function objects’. By overloading the function call operator (), we encapsulate a function and pass it on to some other functions. The advantages of using a function pointer (also referred to as ‘functor’) are: typesafe and object oriented efficient, as it can be inlined reusable, as it can be generic The idea is to overload the () operator so that the object can be used as if it were a function. Overloaded () can have any number of arguments / any return type. For example: #include using namespace std; class printClass{ public: template void operator() (T t) { cout<< t << endl; } }; template void print(T type, printClass &p){ p(type);
www.linuxforu.com
CMYK
// invoke the () operator
Overview
} int main(){ int i = 10; float f = 10.0; printClass p; print(i, p); print(f, p); } // prints //
10
//
10
This simple program makes use of function objects to print objects of any type (provided that they override ostream << ()). Plug-compatibility: Plug-compatibility in programming jargon means that a generic algorithm and a container can be plugged (used) together. For example, vector and sort are plug-compatible (you can apply sort function on a vector), whereas you cannot use sort for a list—for that you have to use the sort member function (of list). Theoretically, it is possible for generic sort to be plugged with list—but it will affect the efficiency of the code. So, list provides a separate member function to achieve that.
Adaptors: Adaptors, as the name itself hints, is a component that adapts (modifies) an existing interface of a component to expose a different one suitable for some other purpose. There are three types of adaptors: Sequence adaptor: The interface of a container is exposed in a different way. A classic example for sequence adaptors is from STL itself: stack is built on modifying the interface of deque. Iterator adaptor: When the interface of an iterator is exposed in a different way, it is an iterator adaptor. Function adaptor: Function adaptors are function objects—depending on the function object (say a ‘negator’ or a ‘predicate object’) passed, the behaviour of the algorithm may change. Pseudo-destructor: STL is fully generic and it needs many extensions/modifications, mostly related to templates to support it. For example, ‘pseudo destructor’ is just a syntactic convenience: it enables primitive type to be used with containers and algorithms. template void callDest(T &t){ t.T::~T(); }; class Test{ public:
sort(vector1.begin(), vector1.end());
~Test(){
// ok, works
std::cout<<“calling the destructor”<<endl;
vector1.sort();
}
// no sort member function
};
sort(list1.begin(), list1.end());
int main(){
// no, generic sort should not be used with list
int i;
list1.sort();
callDest(i);
// ok, works
// doesn’t issue compiler error // when t.T::~T() resolves to t.int::~int()
Nearly containers: STL is a generic library with components written for general-purpose use. But not all components are ‘truly generic’. There are three components in STL that are written for specific types/ purposes in mind. For example, bitset is not a generic set component—it is specifically designed to handle bit information. Such containers are referred to as ‘nearly containers’, and they are for specific purposes and not really generic in nature. The other two containers are valarray—designed specifically for numeric computations, and string, an alternative for null terminated C-style strings. Predicate objects: When a function object returns a Boolean value, it is referred to as a ‘predicate object’. In STL, contains many such predicate objects and they can be used in generic algorithms to change their behaviour. For example, the sort method implicitly takes the less predicate as the third argument (to sort the elements in ascending order). To change this default behaviour of sort, it is enough to pass some other predicate!
// this has no effect in the code generated. Test t; callDest(t); // results in calling the destructor explicitly // prints: // calling the destructor }
Hopefully, this introduction to Standard Template Library would help a novice understand its basics, in detail. We will cover further details on the same topic later. Till then, happy reading...
By: S.G. Ganesh is an engineer in Hewlett-Packard’s C++ compiler team. He has authored a book “Deep C” (ISBN 817656-501-6). He is also a member of the ANSI/ISO C++ Standardisation committee (JTC1/SC22/WG21), representing HP. You can reach him at [email protected].
www.linuxforu.com
CMYK
|
LINUX FOR YOU
|
MAY 2007
81