Section 2 - STL-Style Algorithm Functions

8A.2.1 STL Algorithms

There are a class of template functions in C++ called algorithms. These are defined in <algorithm> (the #include file alrorithm.h). They all have the same basic form in the simplest invocations:

   some_c++_algorithm( beginningIterator, endIterator);

These iterators can be any variety that STL admits, like those in a vector or list template. (They will often work with non-STL iterator-like arguments, i.e., things that behave like a pointers and support ++ and -- operators, such as FHvectors, for instance.) An example is the sort() algorithm:

#include <algorithm>
#include <vector>

   vector<int> stlVectorOfInts;

   // load them with values, then ...

   sort(stlVectorOfInts.begin(), stlVectorOfInts.end());

This is a very common way to use STL algorithms. Often, there is a third parameter, a comparator functor, which you can pass as a parameter and which gets used in place of the default < operator, but we won't go into that now. The important thing is that all of these algorithms take ranges of items, never vectors or lists. The ranges can come from a vector or list, but they are still represented by pairs of iterators.

You may find the need to supply template functions, especially sort functions, that accept iterators, rather than the template sort functions we saw last section that take arrays or vectors. To that end, we'll talk about how that can be done.

Let me remind you where North, South, East and West are. Although I just told you that there is a sort() algorithm in STL's algorithms library, we will continue our discussion on writing and experimenting with our own insertionSort() function, and others of their ilk. However, we wish to allow the client to call such functions using the same protocol that C++ programmers use for the STL sort().

8A.2.2 Providing An Iterator Driver

We would like to provide an overloaded version of the insertionSort() algorithm that takes two iterators, as in:

#include "FHvector.h"
#include "FHsort.h"

   FHvector<int> fhVectorOfInts;

   // load them with values, then ...

   insertionSort(fhVectorOfInts.begin(), fhVectorOfInts.end());

We have a small problem, though. Look at our original template function:

template <typename Comparable>
void insertionSort( FHvector<Comparable> & a )
{
   int k, pos, arraySize;
   Comparable tmp;
    
   // etc ...
}

One of the key variables is of type Comparable. But how can we define an object of type Comparable (or any object type parameter) if all we have is a type parameter which is an iterator? It's not so much what we call it (Comparable, Iterator, etc.) but how we intend to use it. We will invoke this function with iterators, not objects of our data type.

Remember, unlike template classes, template functions do not admit the passing of a type parameter. We can't pass an angled brackets <int> or <iTunesEntry> to a template function. Instead, we pass an object of type int or of type iTunesEntry, and this object communicates the type. However, if there is no such object parameter (there is not when you pass a pair of iterators) you cannot get this object information.

So, if we try to define a template function that takes two iterators, this is how the definition starts:

template <typename Iterator>
void insertionSort(Iterator begin, Iterator end)
{
   // there is no type parameter I can use to declare an Object 
   // pointed to by the Iterators
}

Even though the client does not place a type parameter in angled brackets when invoking a template function (look up at the recent examples, and you'll see what I mean), the definition of the template function makes it appear as though a type argument will be passed down. The production of a type name from an object is done behind-the-scenes for us by C++ when a template function is called. It turns the argument object into a type that we can use in the definition. To say the same thing I said twice already, the new, proposed version of insertionSort() takes iterators , and no Objects, so we don't know what to call the data type the iterator points to.

8A.2.3 Two Rejected Proposals

Because it is good that you learn advanced C++ along with data structures, I will dwell on this a bit. Therefore, before demonstrating the ultimate solution, let's consider two failed attempts at this problem. They work, but they surrender the original goal and do not try to make this a template function that takes two iterator types.

Proposal One: A Third Wheel

The first idea is a direct reaction to the reason that we do not have an object type parameter. It adds a third parameter of type Object or Comparator, and uses that to communicate the data type. Let me warn you: this is ugly - it requires the client pass a seemingly useless third parameter.

template <typename Iterator, typename Comparable>
void insertionSort(Iterator begin, Iterator end, Comparable x)
{
    Iterator pos, k;
    Comparable tmp;
    
    for(pos = begin; pos != end; pos++ )
    {
        tmp = *pos;
        for(k = pos; k != begin && tmp < *(k-1); k-- )
            *k = *(k-1);
        *k = tmp;
    }
}

So what do we have? A function that takes two iterators and an object. Yes, we pass begin and end, but also an object, x, that we do nothing with, just so we can access the type parameter Comparable. Here is are two possible invocations, one silly, and the second a little less silly:

insertionSort(fhVectorOfInts.begin(), fhVectorOfInts.end(), 94022);

For exactly zero points extra credit, how many things can you tell me about what 94022 is doing there, and what its significance is?

Here is an invocation of the same function template that is a little less arbitrary than the previous:

insertionSort(fhVectorOfInts.begin(), fhVectorOfInts.end(), 
   *fhVectorOfInts.begin());

At least now we are not pulling random integers from thin air. We are dereferencing the return iterator that begin() gives us as a means of manufacturing an object of the underlying data type. Still, who wants to pass an object for no reason? Proposal one rejected.

Proposal Two: A Functor-Like Solution

Another way around the dilemma is to find a construct that does allow us to pass a type parameter like <int> or <iTunesEntry> to a template, namely a template class. And since we really want the class so we can write a function, what does this remind us of? What classes are designed for the purpose of defining only one function? Answer: a function object, or functor. Recall, this is a class whose only member is a public overloading of the () operator. Therefore we try that, putting the sort into the () operator:

template <class Comparable>
class insertionSortOfType
{
public:
   template <typename Iterator>
   void operator()(Iterator begin, Iterator end)
   {
      Iterator pos, k;
      Comparable tmp;

      for(pos = begin; pos != end; pos++ )
      {
         tmp = *pos;
         for(k = pos; k != begin && tmp < *(k-1); k-- )
            *k = *(k-1);
         *k = tmp;
      }
   }
};

Not pretty, but it's not the definition that matters, how does the client use it?

insertionSortOfType<int> isort;
isort(fhVectorOfInts.begin(), fhVectorOfInts.end());

The actual call is not bad - it meets our requirements of taking two iterators. The problem is the need to instantiate an object of the class first, before using the function. Okay, since we are spitballing ideas, let's try to kick this can a little farther down the road. Can we avoid the intermediate step by declaring a temporary insertionSortOfType object and calling the sort function, all at once? Yes, but then we change the member method from an operator to a constructor. It looks like this in the definition:

template <class Comparable>
class insertionSortOfType
{
public:
   template <typename Iterator>
   insertionSortOfType(Iterator begin, Iterator end)
   {
      Iterator pos, k;
      Comparable tmp;

      for(pos = begin; pos != end; pos++ )
      {
         tmp = *pos;
         for(k = pos; k != begin && tmp < *(k-1); k-- )
            *k = *(k-1);
         *k = tmp;
      }
   }
};

Once you have a constructor, you can use it to automatically instantiate an object for immediate use, with no variable associated, using the automatic anonymous mechanism. Here is the client side:

insertionSortOfType<int>(fhVectorOfInts.begin(), fhVectorOfInts.end());

This is the best so far. It is a single call that takes two iterator parameters and needs no introduction. Still, it requires that we pass the type, int in this case, as a type parameter, and our wish is to emulate the way that STL algorithms work, in which there is no such requirement. Nice try. Proposal two rejected!

Don't feel bad, and don't ignore these attempts. They represent real C++.

8A.2.4 The Correct Two-Iterator Solution

The actual solution we are led to use comes from the second example in proposal one. We had the idea to pass a third parameter of our data type, and we manufactured it by dereferencing one of the iterators. Here is the client view, once again:

insertionSort(fhVectorOfInts.begin(), fhVectorOfInts.end(), 
   *fhVectorOfInts.begin());

The plan is to write this "ugly" version, but not force the client to use it. Instead, we will write an overloaded variant that takes two iterators from the client (and nothing else) and then from the first iterator, manufacture the third parameter using the same technique: by dereferencing it. So we write two function, a driver that the client will use, and a non-publicized (but still public, since this is not a class) ugly version. Here is the definition side:

// ugly version
template <typename Iterator, typename Comparable>
void insertionSort(Iterator begin, Iterator end, Comparable x)
{
    Iterator pos, k;
    Comparable tmp;
    
    for(pos = begin; pos != end; pos++ )
    {
        tmp = *pos;
        for(k = pos; k != begin && tmp < *(k-1); k-- )
            *k = *(k-1);
        *k = tmp;
    }
}

// version client will use
template <typename Iterator>
void insertionSort(Iterator begin, Iterator end)
{
   if (begin != end)
      insertionSort(begin, end, *begin);
}

The first function is the one our client uses. The second is exactly the same as our rejected proposal, but since our client doesn't actually use it, no harm, no foul. If you don't like the idea that the client might try to use it, you could name the second function something hard to guess. Here is the client side:

insertionSort( fhVectorOfInts.begin(), fhVectorOfInts.end() );

That's exactly what we wanted. It took a lot of back-room dealing, but we made it happen. By the way, you can also use STL iterators in our insertionSort():

insertionSort(stlVectorOfInts.begin(), stlVectorOfInts.end());

This is a good thing to know, because it gives us an example of how to write an algorithm of our own that takes STL iterators.

If we decide to pass STL iterators into our insertionSort() method and the size of the vector spanned by those two iterators is large, it will take thousands of times longer than our FHvector if optimization is not turned on. Even with optimization, it is not great. The slowness of STL vectors compared with those that we produced ourselves, becomes magnified by the N² running time of this somewhat slow sort.

8A.2.5 A Sample Client

Here is a client that will show some of the options we just demonstrated. It makes use of the bubble sort in FoothillSort.h and the insertionSort in FHsort.h. It also does a benchmark of bubble vs. insertion vs. the STL sort algorithm.

// Comparison of 3 sorts: array-bubble and FHvector-insert and STL sort
#include <iostream>
using namespace std;
#include "Foothill_Sort.h"
#include "FHsort.h"
#include "FHvector.h"
#include <time.h>
#include <algorithm>
#include <vector>

// --------------- main ---------------
#define ARRAY_SIZE 30000

int main()
{
   int k;
   int arrayOfInts[ARRAY_SIZE];
   FHvector<int> fhVectorOfInts;
   vector<int> stlVectorOfInts;

   // build an array and two vector for comparing sorts
   for (k = 0; k < ARRAY_SIZE; k++)
   {
      arrayOfInts[k] = rand()%ARRAY_SIZE;
      fhVectorOfInts.push_back(arrayOfInts[k]);
      stlVectorOfInts.push_back(arrayOfInts[k]);
   }

   clock_t startTime, stopTime;
   startTime = clock();  // ------------------ start 

   // sort using a home made bubble sort (in Foothill_Sort.h)
   arraySort(arrayOfInts, ARRAY_SIZE);

   stopTime = clock();  // ---------------------- stop
   cout << "\nArray Bubble Sort Elapsed Time: " 
      << (double)(stopTime - startTime)/(double)CLOCKS_PER_SEC 
      << " seconds." << endl << endl;

   startTime = clock();  // ------------------ start 

   // sort using insertionSort (in FHsort.h)
   insertionSort(fhVectorOfInts.begin(), fhVectorOfInts.end() ); 
 
   stopTime = clock();  // ---------------------- stop
   cout << "\nFH Vector Insertion Elapsed Time: " 
      << (double)(stopTime - startTime)/(double)CLOCKS_PER_SEC 
      << " seconds." << endl << endl;

   startTime = clock();  // ------------------ start 

   // a pure stl sort - STL's best effort on its own vector types
   sort(stlVectorOfInts.begin(), stlVectorOfInts.end());

   // don't try this on large arrays or you'll be waiting a long time
   // insertionSort(stlVectorOfInts.begin(), stlVectorOfInts.end());

   stopTime = clock();  // ---------------------- stop
   cout << "\nSTL sort Elapsed Time: " 
      << (double)(stopTime - startTime)/(double)CLOCKS_PER_SEC 
      << " seconds." << endl << endl;

   // confirming that we got the correct results on all three arrays
   for (k = 0; k < ARRAY_SIZE; k+= ARRAY_SIZE/10)
   {
      cout << "bubble #" << k << ": " << arrayOfInts[k] << ", ";
      cout << "Fhsort #" << k << ": " << fhVectorOfInts[k] << ", ";
      cout << "STL sort #" << k << ": " << stlVectorOfInts[k] << endl;
   }
   return 0; 
}


/* --------- output -----------

Array Bubble Sort Elapsed Time: 1.137 seconds.


FH Vector Insertion Elapsed Time: 0.126 seconds.


STL sort Elapsed Time: 0.002 seconds.

bubble #0: 0, Fhsort #0: 0, STL sort #0: 0
bubble #3000: 1623, Fhsort #3000: 1623, STL sort #3000: 1623
bubble #6000: 3756, Fhsort #6000: 3756, STL sort #6000: 3756
bubble #9000: 7054, Fhsort #9000: 7054, STL sort #9000: 7054
bubble #12000: 10394, Fhsort #12000: 10394, STL sort #12000: 10394
bubble #15000: 13691, Fhsort #15000: 13691, STL sort #15000: 13691
bubble #18000: 17026, Fhsort #18000: 17026, STL sort #18000: 17026
bubble #21000: 20234, Fhsort #21000: 20234, STL sort #21000: 20234
bubble #24000: 23445, Fhsort #24000: 23445, STL sort #24000: 23445
bubble #27000: 26740, Fhsort #27000: 26740, STL sort #27000: 26740
Press any key to continue . . .
----------------------------- */

Let's try it on a Unix system that has a faster STL sort:

Array Bubble Sort Elapsed Time: 4.36707 seconds.


FH Vector Insertion Elapsed Time: 0.698195 seconds.


STL sort Elapsed Time: 0.007416 seconds.

bubble #0: 0, Fhsort #0: 0, STL sort #0: 0
bubble #3000: 2989, Fhsort #3000: 2989, STL sort #3000: 2989
bubble #6000: 5983, Fhsort #6000: 5983, STL sort #6000: 5983
bubble #9000: 8998, Fhsort #9000: 8998, STL sort #9000: 8998
bubble #12000: 12031, Fhsort #12000: 12031, STL sort #12000: 12031
bubble #15000: 15149, Fhsort #15000: 15149, STL sort #15000: 15149
bubble #18000: 18088, Fhsort #18000: 18088, STL sort #18000: 18088
bubble #21000: 21097, Fhsort #21000: 21097, STL sort #21000: 21097
bubble #24000: 24032, Fhsort #24000: 24032, STL sort #24000: 24032
bubble #27000: 27013, Fhsort #27000: 27013, STL sort #27000: 27013

STL sort beat my insertionSort (the version that takes iterator parameters) by a whopping 100:1 ratio on Unix, and 63:1 on Windows. One reason is that the STL sort is using the best algorithm it can for a randomly ordered array. We still have some work to do. However, our insertionSort still has a role to play, as we'll see.