Section 3 - shellSort

8A.3.1 shellSort From insertionSort

We now move on to a breakthrough in sorting time complexity, the shellSort (or shellSort as programmers call it). It is capitalized because the name comes from its inventor, Donald Shell. If you look directly at the shellSort code you will be confused at first, because of the three loops that are not immediately intuitive. However, if you sneak up on it, as we will do, it will make sense.

You must understand the insertionSort before we start, so if you are insecure about your understanding that, don't bother reading here, but go back to the first page of this module where the diagrams help explain the logic behind it. I will repeat the code here. Without rushing, review this code, following the increasing outer loop and decreasing inner loop. After you have fully reviewed and digested it, read on.

template <typename Comparable>
void insertionSort( FHvector<Comparable> & a )
{
   int k, pos, arraySize;
   Comparable tmp;
    
   arraySize = a.size();
   for(pos = 1; pos < arraySize; pos++ )
   {
      tmp = a[pos];
      for(k = pos; k > 0 && tmp < a[k-1]; k-- )
         a[k] = a[k-1];
      a[k] = tmp;
   }
}

To answer the question, "what makes this slow", we always come back to the same issue: in the worst case we are stepping through two loops one-item-at-a-time. If the array is heavily un-sorted to begin with, then the inner loop, as it searches for the right place in the left portion of the array to insert the next item, very conservatively moves each "larger" element to the right by only one location. These larger elements have to travel a long way, and they only move up one spot every loop pass. The idea is that we would like to move these elements more quickly. If the array has 10,000 elements, for example, and the largest starts in position 0, wouldn't it be better to move it in one assignment statement directly to its final location, position 9999? Even if that could not be arranged, certainly something like five copies that advance it 2000 positions each time would be a vast improvement (compared with the 9999 assignments that the insertionSort requires to get it into position). You don't even have to scrutinize the insertionSort too intensely to understand that it takes 9999 assignments to get this element to the top, because we never move an element of the array up unless we do it with the assignment statement:

   a[k] = a[k - 1];

which is an advancement of one position. So our goal will be to try to move things into position using large strides rather than baby steps. We will try to get an assignment statement that looks more like this working on our behalf:

   a[k] = a[k - LARGE_NUM];

That simple observation alone will serve to help you see where we are going.

8A.3.2 More Loops, Less Time

The funny thing about the shellSort is that it contains three loops, rather than insertionSort's two. And in the very last pass of the outermost loop the two inner loops are exactly the insertionSort! Think about that. We are going to do something a bunch of times, and the very last time we do it, it will be exactly the same as the insertionSort. How can this make things faster?

   for (some outer loop condition)
      [SOMETHING THAT REDUCES TO AN insertionSort ON THE FINAL PASS]

Aren't we adding a bunch of unnecessary outer loop passes. Just throw them all away since we are going to do an entire insertionSort on the final pass of the outer loop, anyway. We already know that the final pass is a complete sort algorithm and can take care of the array all by itself in a single application. We are wasting our time with all these other unnecessary outer passes!

Ah, but remember: if the list is sorted - or nearly sorted - insertionSort has O(N) time complexity. This means that if we can do a little work "on the front end" to tidy up the array, we might be able to make this final "insertionSort" pass closer to O(N) than O(N²). This will work if the extra time it takes up-front to accomplish this nearly-sorted state of affairs is not too strenuous.

That's the plan - to create a double inner-loop that behaves almost like an insertionSort, and in fact, behaves exactly like it on the final pass of an outer third loop. Let's start by rewriting the insertionSort replacing the literal 1 with a variable that we set = 1 at the start. Because I want to go slowly I will call the variable ONE to remind us that it was 1 in the original insertionSort:

   int ONE = 1;
   
   for(pos = ONE ; pos < arraySize; pos++ )
   {
      tmp = a[pos];
      for(k = pos; k >= ONE && tmp < a[k - ONE]; k -= ONE )
         a[k] = a[k - ONE];
      a[k] = tmp;
   }

So far we have not changed a single lepton of the insertionSort. The only difference is the use of ONE in place of 1 (although I turned k > 0 into k >= ONE, which is the same thing). Since ONE is set to 1, everything still works. This is the insertionSort. If you have any questions up to this point, stop, review and ask in the forums.

Let's now do what we promised. We will place this insertionSort into an outer loop. The outer loop will change the value of the variable ONE, which is fine, as long as, in the final pass ONE = 1. No matter what other horrible things might come of this crazy design, we at least know that in the final pass of the outer loop, when ONE = 1, we will be applying an insertion sort, which will guarantee that our array is sorted. Here is an abstract version of the outer loop:

   for (ONE = someLargeNumber;  ONE > 0;  
         ONE gets smaller each pass, eventually reaching 1)
      [The double nested loop above]

I know you think it silly that I'm naming a variable ONE when it can have values larger than 1, but trust me - many of you will stay with the logic better if I keep reminding you that the double-nested inner loop is exactly insertion sort when ONE = 1, a secure feeling.

Let's come up with some actual scheme for the new outer loop. Shell's own scheme, started with a number ONE = arraySize/2 and cut itself in half each pass, ending with -- as I keep promising: ONE = 1:

   for (ONE = arraySize/2;  ONE > 0;  ONE /= 2)
      [The double nested loop above]

This will work, right? No matter what the arraySize is (as long as it's positive), if we keep cutting it in half, we'll eventually reach 1. So, I can now give you the full code - and yes I am sticking with ONE for the outer loop variable - placing the insertionSort into this outer loop. It gives us the full shellSort, using Shell's outer loop strategy:

for (ONE = arraySize/2;  ONE > 0;  ONE /= 2)
   for(pos = ONE ; pos < arraySize; pos++ )
   {
      tmp = a[pos];
      for(k = pos; k >= ONE && tmp < a[k - ONE]; k -= ONE )
         a[k] = a[k - ONE];
      a[k] = tmp;
   }

In the form of a complete template function it looks like this:

// shellSort #1 -- using shell's outer loop
template <typename Comparable>
void shellSort1( FHvector<Comparable> & a )
{
   int k, pos, arraySize, ONE;
   Comparable tmp;
    
   arraySize = a.size();
   for (ONE = arraySize/2;  ONE > 0;  ONE /= 2)
      for(pos = ONE ; pos < arraySize; pos++ )
      {
         tmp = a[pos];
         for(k = pos; k >= ONE && tmp < a[k - ONE]; k -= ONE )
            a[k] = a[k - ONE];
         a[k] = tmp;
      }
}

Here is the client with some benchmarks:

// Comparison of vector-insertion, shell and STL sort
#include <iostream>
using namespace std;
#include "Foothill_Sort.h"
#include "FHsort.h"
#include "FHvector.h"
#include <time.h>
#include <algorithm>
#include <vector>

// --------------- main ---------------
#define ARRAY_SIZE 30000

int main()
{
   int k;
   int arrayOfInts[ARRAY_SIZE];
   FHvector<int> fhVectorOfInts;
   FHvector<int> fhVectorOfInts2;
   vector<int> stlVectorOfInts;

   // build an array and two vector for comparing sorts
   for (k = 0; k < ARRAY_SIZE; k++)
   {
      arrayOfInts[k] = rand()%ARRAY_SIZE;
      fhVectorOfInts.push_back(arrayOfInts[k]);
      fhVectorOfInts2.push_back(arrayOfInts[k]);
      stlVectorOfInts.push_back(arrayOfInts[k]);
   }

   clock_t startTime, stopTime;
   startTime = clock();  // ------------------ start 

   // sort using insertionSort (in FHsort.h)
   insertionSort(fhVectorOfInts); 
 
   stopTime = clock();  // ---------------------- stop
   cout << "\nFH Insertion Elapsed Time: " 
      << (double)(stopTime - startTime)/(double)CLOCKS_PER_SEC 
      << " seconds." << endl << endl;

   startTime = clock();  // ------------------ start 

   // sort using shellSort #11 (in FHsort.h)
   shellSort1(fhVectorOfInts2); 
 
   stopTime = clock();  // ---------------------- stop
   cout << "\nFH shellSort Elapsed Time: " 
      << (double)(stopTime - startTime)/(double)CLOCKS_PER_SEC 
      << " seconds." << endl << endl;

   startTime = clock();  // ------------------ start 

   // a pure stl sort - STL's best effort on its own vector types
   sort(stlVectorOfInts.begin(), stlVectorOfInts.end());

   stopTime = clock();  // ---------------------- stop
   cout << "\nSTL sort Elapsed Time: " 
      << (double)(stopTime - startTime)/(double)CLOCKS_PER_SEC 
      << " seconds." << endl << endl;

   // confirming that we got the correct results on all three arrays
   for (k = 0; k < ARRAY_SIZE; k+= ARRAY_SIZE/10)
   {
      cout << "insertion #" << k << ": " << fhVectorOfInts[k] << ", ";
      cout << "shell 1 #" << k << ": " << fhVectorOfInts2[k] << ", ";
      cout << "STL sort #" << k << ": " << stlVectorOfInts[k] << endl;
   }
   return 0; 
}

The results are striking, but we have not yet analyzed the algorithm fully to understand why:


FH Insertion Elapsed Time: 0.377 seconds.


FH shellSort Elapsed Time: 0.005 seconds.


STL sort Elapsed Time: 0.002 seconds.

insertion #0: 0, shell 1 #0: 0, STL sort #0: 0
insertion #3000: 1623, shell 1 #3000: 1623, STL sort #3000: 1623
insertion #6000: 3756, shell 1 #6000: 3756, STL sort #6000: 3756
insertion #9000: 7054, shell 1 #9000: 7054, STL sort #9000: 7054
insertion #12000: 10394, shell 1 #12000: 10394, STL sort #12000: 10394
insertion #15000: 13691, shell 1 #15000: 13691, STL sort #15000: 13691
insertion #18000: 17026, shell 1 #18000: 17026, STL sort #18000: 17026
insertion #21000: 20234, shell 1 #21000: 20234, STL sort #21000: 20234
insertion #24000: 23445, shell 1 #24000: 23445, STL sort #24000: 23445
insertion #27000: 26740, shell 1 #27000: 26740, STL sort #27000: 26740
Press any key to continue . . .

Our modest shellSort is about 75× faster than the insertion sort. STL sort is twice as fast as the shellSort, though. I could increase the array size, but let's keep it at 30,000 since we started there. On a Unix system, we can see the times more accurately.

FH Insertion Elapsed Time: 4.10687 seconds.


FH shellSort Elapsed Time: 0.021573 seconds.


STL sort Elapsed Time: 0.001541 seconds.

Here, shellSort beats insertionSort by almost 200×. And we are now only 13× slower than STL sort. Progress.

In the next section we see that Shell's outer loop isn't very good, and we can do much better with a different choice.