Section 2 - vectors

1B.2.1 The Fundamental STL Data Structure: the vector

Now that we have studied the creation of our own function templates, we want to show how to use someone else's templates -- in this case they will be class templates, similar in concept to function templates. We will be using type parameters like int, double or iTunesEntry to instantiate objects using the STL class templates. This is the crux of the STL library. The C++ library designers have provided us with a panoply of class templates that attempt to make the creation of abstract data types easier for us application programmers. As I said earlier, some of this course is going to involve analyzing the appropriateness of a C++ template class when compared against writing a simpler, smaller and faster one of our own. (By the way, notice that I sometimes say class template, and other times say template class. class template is correct, but give a guy a break.)

The first such class template to learn and evaluate is the vector class template. I'll give you the conclusion first, then you can read the analysis: C++ vectors provide a partial improvement over C++ arrays at a considerable performance cost. They are very useful -- and widely used -- in non-time critical applications, so it is absolutely necessary for you to be able to design with them. Sometimes we will use arrays, sometimes we will use vectors.

Furthermore, when we go on to more structured data types, we'll have a second decision to make. Should we use arrays as the foundation in the design or should we use STL vectors?

Very early in the course we will design our own vector class FHvector which is smaller and faster than the STL counterpart, and you may take that with you and use it if you want something that is, in a sense, between the array and the STL vector template in performance.

Sections 1B.2 to 1B.6 are Review from CS 2B (CIS 15B)

1B.2.2 Evaluating Arrays

Let's consider some of the advantages and disadvantages of arrays.

Why We Love Arrays

Arrays are easy to use. You can define an array of any data type using a simple declaration and then access their elements using simple bracket notation:

   Card deck[52];

   deck[5].set('2', clubs);

Arrays are fast. They are as fast a method of accessing one of a collection of objects as you will find. The difference between one microsecond and five microseconds might seem meaningless on ordinary time scales, but if you have to update two million pixels on the screen in 1/60 of a second, this difference will be painfully evident. Speed is important in some applications.

Why We Hate Arrays

The trouble with arrays can be uttered in a single breath (with oxygen left over): "fixed size, no protection".

By fixed size, I mean that once you declare an array, that's it. No changing. You might allocate way more than you end up using, or not enough. It doesn't matter if you used bracketed arrays or dynamic arrays. Either way, changing the number of elements is difficult or impossible. The vector container in the standard template library will solve this problem.

There is no protection from an out-of-bounds array index with arrays, either at compile time or at run time. If you try to do this:

   deck[1000].set('3', diamonds);

You might not get a run time error until very far from this statement. The consequence of writing beyond the end of the array is unknowable because it depends upon what data the compiler and run-time routines store there. Something is sure to be trashed, but we can't know what until the program runs a while and by then, it's too late.

The vector container in the standard template library will solve the problem of size flexibility nicely. However, the vector does not absolutely address the problem of bounds checking. There is dubious bounds protection in a vector method (dubious, because it requires that you look for an out_of_range exception rather than providing some softer protection) called "at()". And there is no protection at all in the overloaded brackets operator, []. There are other solutions that allow us to use brackets, [], and get protection, but these are not simple vectors. As regards speed, some simple tests show vectors to be two times as slow as arrays when using the [] operator and potentially slower when using the at() method. These results will vary, depending on platform, optimization settings of your compiler, and state of the computer when the program is run. In other words, you need to be aware that vectors may have a performance cost. Nevertheless, we will get a lot of help from them. Let's see.

1B.2.3 Vectors

C++ gives us some container class templates that help us create collections of objects like arrays. The container that is most like an array -- but with some advantages over it -- is a vector. In order to use vectors you have to declare one in a manner similar to an array declaration, but with different syntax. Let's say you want a vector of 100 doubles. You would declare:

   vector<double> myDoubles(100);

Once declared, you can use myDoubles much like an ordinary array with some added helper methods. One method, size(), returns the number of elements currently in the vector. Another method, push_back(), lets you expand the vector by adding a new element to the end (push a new element onto the back of the array, thus the name push_back). The great thing about vectors is that even though they are expandable, we don't have to do any de-allocation on them as we would if we were trying to use dynamic arrays.

Here is a simple example of the use of vectors. Notice that you have to #include <vector> at the top of your file to gain access to these tools.

#include <iostream>
#include <vector>
using namespace std;

int main()
{
   vector<double> myDoubles(100);
   int k;

   for ( k = 0; k < myDoubles.size(); k++ )
      myDoubles[k] = -.01 * k;

   for ( k = 0; k < myDoubles.size(); k++ )
      cout << myDoubles[k] << " ";

   cout << endl;

   // push a new double onto the back of the vector
   myDoubles.push_back(55.99);
   cout << myDoubles[100] << endl << endl;

   return 0;
}

The output is:

console shot

We have overcome our fixed-size problem. We can even declare a vector with zero elements and simply use push_back() to add them as needed.

int main()
{
   vector<double> myDoubles;
   int k;

   for ( k = 0; k < 100 ; k++ )
      myDoubles.push_back(-.01 * k);
   myDoubles.push_back(55.99);
 
   for ( k = 0; k < myDoubles.size(); k++ )
      cout << myDoubles[k] << " ";
   cout << endl;

   return 0;
}

Run

console shot

1B.2.4 Vector Methods

When you declare a vector of doubles, as we did above, you are actually creating a new class whose name is vector<double>. In other words, vector is not a class by itself (it is a template), but the line vector<double> myDoubles does generate a class at compile time. Since vector<double> is a class, it has methods - these are methods that the vector template imbues on all vector classes which you create from the vector template. Here are some of the more common instance methods bestowed on all vector classes:

back() returns the value of the end (back) of the vector.
pop_back() removes the last element from the vector (but does not return it). This is the opposite of push_back() because it compresses the array rather than expanding it.
empty() returns a bool true or false, depending on whether or not the vector is empty.
at() is used as an alternative to ordinary [] notation to access elements in the vector. It provides some bounds protection but is often slower than the [] access.
A complete list of vector methods can be found at http://www.cplusplus.com/reference/stl/vector/

Watch how these can be used together in a nice loop without knowing the size of the array, and without using an int looping index:

int main()
{
   vector<double> myDoubles;

   for ( int k = 0; k < 100 ; k++ )
      myDoubles.push_back(-.01 * k);
   myDoubles.push_back(55.99);
 
   while ( !myDoubles.empty() )
   {
      cout << myDoubles.back() << " ";
      myDoubles.pop_back();
   }
   cout << endl;
   return 0;
}

Output

1B.2.5 Vectors of Objects

As we implied above, we are not restricted to creating vectors of primitives like ints or doubles. We can make a vector from any class, even those that we have defined ourselves. Thus, we can create a vector of Cards (vector<Card> myCards(100);) or vector of strings (vector<string> my_strings(999);)

There is a catch, however. If you create your own class and supply a default constructor for this class, you should be aware that the line:

 vector<Card> myCards(100);

does not, as you might expect, call a Card constructor 100 times. This can be a problem if you need your constructor (and destructor) to be reliably called a fixed number of times as predicted by your code. The truth is, you have no control over how often constructors and destructors are called when you use vectors. Calling push_back() can generate 1, 100 or 1000 constructor and destructor calls, including a mix of default and copy constructors! Since vectors do a lot of extra allocation and de-allocation behind the scenes, there will be many more constructor calls than you might want. Even worse, where you would expect a default constructor to be invoked, you may end up with a copy constructor or some combination of both. All this means that before you decide to use vectors of objects instead of arrays of objects, make sure that you don't care about extra, unpredictable, constructor and destructor calls. This is highly relevant with classes that have deep (dynamic) memory or classes whose constructors have "intended side effects" such as management of static members or calls to other class methods.

Here is a short example program that demonstrates the problem with vectors of objects.

// Demo of vector of objects and how constructors are called
#include <iostream>
#include <vector>
using namespace std;

class Hal
{
public:
   // output in const/dest only for diagnosis - normally not allowed 
   Hal() { cout << "DEFAULT CTOR\n"; }
   Hal(const Hal &h) { cout << "Copy CTOR\n"; }
   ~Hal() { cout << "Destructor\n"; }
};

typedef class vector<Hal> HAL_TYPE;

int main()
{
   // one extra default constructor and one default destructor are called to
   // manufacture the array of 5 objects.  The 5 objects are then created
   // using 5 copy constructors.  Watch
   cout << "Starting vector instantiation --------\n";
   HAL_TYPE hArray(5);
   cout << "Done with vector instantiation. Now 3 simple objects --------\n";

   Hal h1, h2, h3;

   // Notice how array is excessively and inconsistently re-allocated 
   // for each push_back:

   cout << "\nFirst push_back() ---------\n";
   hArray.push_back( h1 );
   cout << "\nSecond push_back() ---------\n";
   hArray.push_back( h2 );
   cout << "\nThird push_back() ---------\n";
   hArray.push_back( h3 );
   cout << "\nDone.  Now for clean-up destructors ---------\n";
}

The Output

Starting vector instantiation --------
DEFAULT CTOR
Copy CTOR
Copy CTOR
Copy CTOR
Copy CTOR
Copy CTOR
Destructor
Done with vector instantiation. Now 3 simple objects --------
DEFAULT CTOR
DEFAULT CTOR
DEFAULT CTOR

First push_back() ---------
Copy CTOR
Copy CTOR
Copy CTOR
Copy CTOR
Copy CTOR
Copy CTOR
Destructor
Destructor
Destructor
Destructor
Destructor

Second push_back() ---------
Copy CTOR

Third push_back() ---------
Copy CTOR
Copy CTOR
Copy CTOR
Copy CTOR
Copy CTOR
Copy CTOR
Copy CTOR
Copy CTOR
Destructor
Destructor
Destructor
Destructor
Destructor
Destructor
Destructor

Done.  Now for clean-up destructors ---------
Destructor
Destructor
Destructor
Destructor
Destructor
Destructor
Destructor
Destructor
Destructor
Destructor
Destructor
Press any key to continue . . .

The next time you run into a expert programmer who says, "arrays are evil, always use vectors", show them this example. vectors, like arrays, are human.

1B.2.6 at(k)

The so-called bounds protection of a vector is provided by the at() method, an alternative to the bracket notation (which provides none). at() throws an exception when bounds are exceeded. In order to use this, you have to catch the exception in your client. Here we force an exception to demonstrate. Try this out.

   try
   {
      myDoubles.at(-2) = 12;
   }
   catch ( exception& e)
   {
      cout << "oops" << e.what() <<  endl;
   }

I will repeat what I said about at() so you we are clear: vectors are often slower than arrays when you use the at() method to index them (in some compilers with certain optimizations options set). The bounds protection does not come for free.