Section 2 - Binary Searching

9B.2.1 Need for Efficiency

There is one problem with the linear search: it is not efficient. If we have an array of 1,000,000 students and we are searching for one, we'll have to compare, on average, 500,000 names with the key before we return with the position of the student. We may have to test all 1,000,000 (or maybe we get lucky and find it on the first test).

A binary search promises to find a student in one million names using at most 20 name/key comparisons! There's one catch: we have to sort the array first. This is not as bad as it sounds since we often store the array in sorted order anyway.

The concept is simple. We start out by comparing the key to the middle element of the (sorted) array. If the key is greater than the middle student, we know that key, if present in the array, is in the upper half of the array. On the other hand, if the key is less than the middle student, then the key, if present, is in the lower half sub-array. Either way, we take the appropriate (upper or lower) sub-array as the new array (it's now half as big as the original, thankfully) and repeat the test recursively. This gets repeated until we either find the key or the sub-array shrinks to nothing.

9B.2.2 A Binary Algorithm

Here is an example. First, we will simplify the Student class so that it contains only last names. Now, imagine the array contains 100 elements and we are looking for the key "chloe". Assume that "chloe" is in the array. We sort the array and assume that "chloe" ends up at index 41 after the sort.

Informally (meaning, in the syntax below, wherein we sloppily identify the name "chloe" with the object for which we are searching) we are going to inspect the expression:

"chloe".compare(array[k].lastName)

Since the array is sorted, we know that if k < 41 the comparison will return a positive number because "chloe" is greater than lastName. If k > 41, the result will be negative because "chloe" will be less than lastName. If we compare "chloe" to item 41 exactly, compare() will return a 0, indicating a perfect match. If we get a perfect match we return the index (41) to the client.

We now start our binary search. (Note: You do not have to memorize the following sequence. It is done automatically by the recursion. You only need to understand the principles).

We enter the method with the entire index range 0 to 99.
middleIndex = (0 + 99)/2 = 49 (we use int arithmetic)
"chloe" is less than the name in location 49, so we now search only the lower portion of the array, 0 to 48.
middleIndex = (0+48)/2 = 24
"chloe" is greater than the name in middleIndex, 24, so we search the sub-array 25 to 48.
middleIndex = (25+48)/2 = 36
"chloe" is greater than the name in middleIndex, 36, so we search the sub-array 37 to 48.
middleIndex = (37+48)/2 = 42
"chloe" is less than the name in middleIndex, 42, so we search the sub-array 37 to 41.
middleIndex = (37+41)/2 = 39
"chloe" is greater than the name in middleIndex, 39, so we search the sub-array 40 to 41.
middleIndex = (40+41)/2 = 40
"chloe" is greater than the name in middleIndex, 40, so we search the sub-array 41 to 41.
"chloe" is equal to the name in middleIndex, 41, so we return 41

Now the methods start "unwinding," returning from the last call back to the first, passing back the 41 as they make their way back up the recursive chain. This took us 7 comparisons compared to 41 if we had gone through the list linearly.

For any name key that we want to find in the array, we go through this process until we either find the student or the size of the sub-array we are searching collapses into nothing. In an array of 1 million students, this happens in, at most, 20 comparisons.

Here is the complete binary search algorithm as implemented in a static class method of Student:

int StudentArrayUtilities::binarySearch(Student array[], string keyLast,
int firstIndex, int lastIndex)
{
int middleIndex, result;

if (firstIndex > lastIndex)
return -1;
middleIndex = (firstIndex + lastIndex) / 2;
result = keyLast.compare( array[middleIndex].getLastName() );

if (result == 0)
return middleIndex;   //found him!
else if (result < 0)
return binarySearch( array, keyLast, firstIndex, middleIndex-1);
else
return binarySearch( array, keyLast, middleIndex+1, lastIndex);
}

Notice that we always call binarySearch() with a smaller array than we came in with, guaranteeing that we will either find the name, or that the first and last index will eventually cross, resulting in the escape condition which returns a -1.

The complete listing is in the next section.