Section 4 - Benchmarking STL vector
1B.4.1 Comparing Arrays with STL vectors
Before going on to create the interesting and exotic data structures of this course, we indulge ourselves by spending a little more time to play around with the basic array and vector linear structures (linear because they are both 1-dimensional "lines" of data, conceptually). We have a full 11 weeks to explore so let's savor every morsel.
So far, this week, we have
- reviewed and used arrays,
- reviewed and used vectors, and
- presented a technique for timing algorithms.
We now put it all together by preparing a benchmark where we compare, directly, an array solution of a simple task with a vector solution. The assignment? A problem from our first programming course: compute the average of an array (or vector) of ints.
1B.4.2 Coding for Arrays
We will declare a linear structure of 200000 ints, but this number can be easily changed because it will be #define-d as such a value should always be:
#define ARRAY_SIZE 200000 // later, in client int arrayOfInts[ARRAY_SIZE];
Let's load the arrays with random ints from 0 to 99:
for (k = 0; k < ARRAY_SIZE; k++) arrayOfInts[k] = rand()%100;
Then, we compute the average:
for (k = 0, avg = 0.; k < ARRAY_SIZE; k++) avg += arrayOfInts[k]; avg/=ARRAY_SIZE;
Finally, report the result:
cout << avg << endl;
This will give us a mix of reading and writing to the data in equal amounts. Here is the whole section of code, along with the timing lines:
startTime = startClock(); // START TIME START TIME START TIME ---------- for (k = 0; k < ARRAY_SIZE; k++) arrayOfInts[k] = rand()%100; // compute the average using array for (k = 0, avg = 0.; k < ARRAY_SIZE; k++) avg += arrayOfInts[k]; avg/=ARRAY_SIZE; cout << avg << endl; elapsedTime = computeElapsedTime(startTime); // STOP TIME STOP TIME STOP TIME ----- displaySeconds("Time for simple array: ", elapsedTime);
This will be one of three blocks we put into our benchmark program.
1B.4.3 Coding for vectors
We'll do the same thing for vectors, using the [] notation. Remember, [] is faster than at() since it has no bounds checking. First, the declaration:
vector<int> vectorOfInts(ARRAY_SIZE);
Next, the block of code that does the same as that for arrays, only using vector notation:
startTime = startClock(); // START TIME START TIME START TIME ---------- for (k = 0; k < ARRAY_SIZE; k++) vectorOfInts[k] = rand()%100; // compute the average using array for (k = 0, avg = 0.; k < ARRAY_SIZE; k++) avg += vectorOfInts[k]; avg/=ARRAY_SIZE; cout << avg << endl; elapsedTime = computeElapsedTime(startTime); // STOP TIME STOP TIME STOP TIME ----- displaySeconds("Time for vector: ", elapsedTime);
We'll go on and do two more comparisons:
- We use the at() function to see how it affects timing compared with [].
- We build the array using push_back() rather than initializing the size up-front.
I won't show the actual code for those two here -- they are very similar, and the entire program, including the timing methods, is on the next page, so you can study it there. We are eager to look at the results and discuss them.
1B.4.4 Results
I ran the program several times on a desktop Windows system (200,000 elements), compiled with optimization for speed. Here are three runs:
First Run --------------- 49.4947 Elapsed time for Time for simple array: 0.005 seconds. 49.5702 Elapsed time for Vector using []: 0.005 seconds. 49.4241 Elapsed time for Vector using at(): 0.004 seconds. 49.4088 Elapsed time for Vector using push_back():0.01 seconds. Press any key to continue . . . Second Run --------------- 49.4947
Elapsed time for Time for simple array: 0.005 seconds.
49.5702
Elapsed time for Vector using []: 0.005 seconds.
49.4241
Elapsed time for Vector using at(): 0.005 seconds.
49.4088
Elapsed time for Vector using push_back():0.011 seconds.
Press any key to continue . . . Third Run --------------- 49.4947
Elapsed time for Time for simple array: 0.005 seconds.
49.5702
Elapsed time for Vector using []: 0.005 seconds.
49.4241
Elapsed time for Vector using at(): 0.004 seconds.
49.4088
Elapsed time for Vector using push_back():0.011 seconds.
Press any key to continue . . .
The results show no observable difference among the sample runs, except that push_back() is decidedly slower. I tried it on a Mac/Linux system, once under Eclipse and once under Xcode:
Eclipse/Mac --------------------------------- 49.5202 Elapsed time for Time for simple array: 0.003356 seconds. 49.447 Elapsed time for Vector using []: 0.004002 seconds. 49.4526 Elapsed time for Vector using at(): 0.00892 seconds. 49.4209 Elapsed time for Vector using push_back():0.008319 seconds. Xcode/Mac --------------------------------- 49.5202 Elapsed time for Time for simple array: 0.003442 seconds. 49.447 Elapsed time for Vector using []: 0.003171 seconds. 49.4526 Elapsed time for Vector using at(): 0.004571 seconds. 49.4209 Elapsed time for Vector using push_back():0.009441 seconds.
We have vectors about as fast as arrays using []. When using at(), vectors seem to be slightly slower than arrays. push_back() seems to be about 2- 3 times longer than array accesses. So this borders on being problematic. Furthermore, the push_back() is only used on the writing - we are still reading using [] in that trial, which means it probably is worse than these results show; as you'll see in the code on the next page, the read operation using [] dilute, and thus speed up, the push_back() component
The difference between push_back() and [] is that push_back() is a write operation. So, this relates to our earlier introduction when we asked the question, "what's the dominant operation?" If it is reading, not writing, then vectors and push_back() are fine. This allows allows the vector to grow. We use push_back() to build and we are happy with vector [] accesses since they are about as good as arrays. If the dominant operation is write, though, we may need to use arrays or allocate a fixed size vector up-front and use [] for everything.
You can't say with certainty what to use because you have to know what the actual application is to converse about it intelligently. But you start to see how the discussion unfolds. We have a little basic knowledge that we use to analyze any problem that employs a simple linear data structure where we are reading, writing and calculating. We know what questions to ask and we know how to do some experiments to arrive at an answer. Not bad for the first week in CS 2C, eh? Imagine how much more you'll know in 11 weeks.