Section 1 - Analyzing Binary Search

3B.1.1 Recursion

In both CS 2A (CIS 15A) and CS 2B (CIS 15B) we discussed recursion and some of its key properties. Even with the admonitions I give in those classes about using it as sparingly and rarely as possible, I know some inexperienced programmers overdo it. Finally, in CS 2C we get to use recursion in places where it belongs, and also evaluate, with our new tools, how bad it really can be if we use it frivolously.

The headlines on recursion are:

  1. It is overused and should be avoided, if that can be done without too much effort.
  2. When used correctly, it can result in logarithmic running times, which is as good as it gets.
  3. When used incorrectly it can result in exponential running times, which is as bad as it gets.
  4. Sometimes a recursive solution is implemented as a stage in development, and once working, it is converted to a non-recursive solution.

3B.1.2 A Simple Linear Search - O(N)

We begin by starting with a standard linear search.

template <typename Comparable>
int linearSearch(Comparable array[], int size, const Comparable &searchItem)
{
   for (int k = 0; k < size; k++)
      if (searchItem == array[k])
         return k;   //found him!
   return -1;
}

There is nothing fancy here other than a template function that assumes the underlying Comparable type has an == operator defined. We can see from a quick observation that this is clearly O(N), while the actual running time will be faster if search_item is found early in the list. A short test confirms that as N grows large we do have linear behavior:

// --------------- main ---------------
int main()
{
   int k, arraySize, searchItem, loc;
   double elapsedTime = 0;
   int *arrayOfInts = NULL;
   srand ( time(NULL) );
   clock_t startTime, stopTime;

   for (arraySize = 100; (elapsedTime < 2) && (arraySize < INT_MAX/4)
      && (arraySize > 0) ; arraySize*=2)
   {
      // allocate a pre-sorted array
      arrayOfInts = new int[arraySize];
      for (k = 0; k < arraySize; k++)
         arrayOfInts[k] = k;

      searchItem = rand()/(double)RAND_MAX * (double)arraySize;

      startTime = clock();    // start timing ---------------
      loc = linearSearch(arrayOfInts, arraySize -1, searchItem);
      stopTime = clock();      // stop timing ------------------------
      
      elapsedTime = (double)(stopTime - startTime)/(double)CLOCKS_PER_SEC;
      cout << "Found item " << searchItem << " at location " << loc << endl;
      cout << "N: " << arraySize << ", Search Time: " 
         << elapsedTime  << " seconds" << endl << endl;

      // release memory
      delete[] arrayOfInts;
   }
   return 0; 
}

In the sample run below we continued until the computer ran out of memory. The linearity is evident in the largest runs at the bottom. Each time we double the array size we expect the search times, on average, to double, and we see that, adjusting for small variations due to operating system resources, it does. The reason for the unexpected drop in time at the largest array is probably due to the item being close to the start of the array.

[smaller arrays undetectable times (= 0)]

Found item 1399242 at location 1399242
N: 13107200, Search Time: 0 seconds

Found item 1429643 at location 1429643
N: 26214400, Search Time: 0 seconds

Found item 30405727 at location 30405727
N: 52428800, Search Time: 0.015 seconds

Found item 76047120 at location 76047120
N: 104857600, Search Time: 0.047 seconds

Found item 29767308 at location 29767308
N: 209715200, Search Time: 0.015 seconds

Press any key to continue . . .

3B.1.3 A Binary Search - O(?)

We also studied binary searches in CS 2B. The algorithm is shown below (and in the text) if you need a review. We will add it to the Search class as a sibling/alternative to the linearSearch() generic function. In short, we assume a sorted array to begin with and, in our search, eliminate half the array on every comparison. This gives a very fast algorithm. How fast? First let's time it, then we'll discuss it. Here is the binary search template function (remember the array must be pre-sorted):

template <typename Comparable>
int binarySearch(Comparable array[], const Comparable &searchItem,
      int firstIndex, int lastIndex)
 {
    int middleIndex, result;
    
    if (firstIndex > lastIndex)
       return -1;
    middleIndex = (firstIndex + lastIndex) / 2;
    
    if (searchItem < array[middleIndex])
       return binarySearch(array, searchItem, firstIndex, 
          middleIndex-1);
    else if (searchItem == array[middleIndex])
       return middleIndex;   //found him!
    else
       return binarySearch(array, searchItem, middleIndex+1, 
          lastIndex);
 }

By the way, it is not unreasonable to require a sorted array in a real application. There are many data sets which live in a sorted format either because items are inserted into the list in a sorted manner or the data is static or very slowly changing (an e-phone book) compared to how often a search takes place.

We replace the linear search invocation with a binary search invocation:

      loc = binarySearch(arrayOfInts, searchItem, 0, arraySize - 1);

We already know that this is much faster, but let's run it to be sure:

/* ------------------------- BINARY SEARCH RUN ------------

... last few searches

Found item 4911749 at location 4911749
N: 6553600, Search Time: 0 seconds

Found item 10177510 at location 10177510
N: 13107200, Search Time: 0 seconds

Found item 1248838 at location 1248838
N: 26214400, Search Time: 0 seconds

Found item 8968273 at location 8968273
N: 52428800, Search Time: 0 seconds

Found item 53377628 at location 53377628
N: 104857600, Search Time: 0 seconds

Found item 65710805 at location 65710805
N: 209715200, Search Time: 0 seconds

Found item 49102298 at location 49102298
N: 419430400, Search Time: 0 seconds

Press any key to continue . . .

-------------------------- END RUN -------------------------- */

Clearly, since we cannot detect any time, even for arrays of size 500 million, this is fast. As you may have guessed, the running-time is logarithmic. Logarithmic is so fast that in some situations, like this, it can't be distinguished from constant time, but it is decidedly not as fast as constant time. Still, we're impressed with the speed.

So what, exactly, does logarithmic mean? Next page, please.