Section 2 - Heap Sort
7B.2.1 Heap Sort
We are this close to a sorting algorithm that beats the pants off of the O(N2) clumsy bubble sort. Consider this: we have a vector. We can put it into a binary heap in O(N) time using the vector-receiving constructor. From there, we can remove() elements, which will cause, at each remove(), the smallest item to be popped off the top of the binary heap. Each of these operations takes O(logN) time (because it is a balanced binary tree). We do the remove() exactly N times, which gives our "heap sort" O(N) + O(NlogN) = O(NlogN) performance.
We can do this in a test client and compare it to the bubble sort of 200,000 ints just for fun. That's going to be our first move.
// Comparison of bubble and heap sort algorithms #include <iostream> using namespace std; #include "Foothill_Sort.h" #include "FHbinHeap.h" #include "FHvector.h" // for timing our algorithms #include <time.h> // --------------- main --------------- #define ARRAY_SIZE 100000 int main() { int k; int arrayOfInts[ARRAY_SIZE]; FHvector<int> vectorOfInts; // build both an array and a vector for comparing sorts for (k = 0; k < ARRAY_SIZE; k++) { arrayOfInts[k] = rand()%ARRAY_SIZE; vectorOfInts.push_back(arrayOfInts[k]); } clock_t startTime, stopTime; startTime = clock(); // ------------------ start // sort using a home made bubble sort (in Foothill_Sort.h) arraySort(arrayOfInts, ARRAY_SIZE); stopTime = clock(); // ---------------------- stop cout << "\nAlgorithm Elapsed Time: " << (double)(stopTime - startTime)/(double)CLOCKS_PER_SEC << " seconds." << endl << endl; startTime = clock(); // ------------------ start FHbinHeap<int> heap(vectorOfInts); try { for (k = 0; k < ARRAY_SIZE; k++) vectorOfInts[k] = heap.remove(); } catch (...) { } stopTime = clock(); // ---------------------- stop cout << "\nAlgorithm Elapsed Time: " << (double)(stopTime - startTime)/(double)CLOCKS_PER_SEC << " seconds." << endl << endl; for (k = 0; k < ARRAY_SIZE; k+= ARRAY_SIZE/20) { cout << "bubble #" << k << ": " << arrayOfInts[k] << ", "; cout << "heap #" << k << ": " << vectorOfInts[k] << endl; } return 0; }
Here is the output. I have sampled both output arrays to make sure they were in agreement.
Algorithm Elapsed Time: 34.255 seconds. Algorithm Elapsed Time: 0.047 seconds. [snip] Press any key to continue . . .
The heap sort is about 700 times faster than the bubble sort on this size data set. It beats it by more the larger the array gets. The above is without optimization. When the compiler is optimized for speed, I get:
Algorithm Elapsed Time: 13.052 seconds. Algorithm Elapsed Time: 0.008 seconds. [snip] Press any key to continue . . .
The heap sort is over 1600 times faster than the bubble sort with optimization on.
7B.2.2 Non-Heap Heap Sort
We used a FHbinHeap object in our client to realize the heap sort. Obviously, we would like to offer a client a cleaner approach. What we want is a HeapSort( array[] ) function that doesn't bother the client with any details. This is easy enough to accomplish by wrapping the code above into its own function. The question is, "are we squeezing out the last ounce of efficiency by letting the binary heap methods do the work?". The answer is, "not quite, but it's more than adequate for most applications". After all, we just improved the sort upper bound times by a factor of almost 500. Whatever we do to fine-tune the algorithm will hardly be noticeable -- except for one thing.
The fact that binary heaps use simple vectors for their underlying representation suggests that we might not have to copy the vector we receive from the client into the heap's internal vector. Perhaps we can utilize the client's vector to house our percolateDown() operations. This is exactly what we do.
Rather than building a separate data structure for a binary heap, we will export our technology and install it into a special purpose function whose only job is sorting an array. This is not a difficult transformation, and can be found in many texts, including your own. When we discuss sort algorithms in the next week or two, we will do it. However, it is a nice exercise for any programmer. Here are the key factors in making this translation:
- There is no FHbinHeap object. We use the client vector (or FHvector or array) to hold the data. Our HeapSort() function (and any of its support functions) will operate directly on this array.
- The functions like orderHeap() which will be converted for this process have to respect the fact that the user vector stores real data beginning at 0, not 1 as was done in the FHbinHeap class.
- To avoid copying to and from the client array, we use the vacated last location in each remove() to store the remove()-ed data. By doing this each remove(), we will be filling the user's vector from last position to first position with min values appearing on the right, or top, of the vector and max values appearing on the left, or bottom of the vector . At the end, the smallest value will be in location vector.size() - 1 and the largest will be in location 0.
- If, as is often the case, we want smallest-to-largest sorting, the transformed algorithms must be changed so that we are emulating a bin max-heap, rather than the bin min-heap we have been discussing. This is a matter of going through each function, line-by-line and, where appropriate, replacing statements like a < b with b < a.
These last two pages gave us a small taste of what we can accomplish regarding the sorting of data, something that is ubiquitous in computer science and, indeed, in business and technology. You'll be pleased to learn that sort algorithms are not very hard to write, and each has its own character and target applications. Furthermore, many of them stand on their own two feet, without the help of an underlying template class like FHbinHeap, which makes their study even more pleasant (if you like simplicity).
I hope you can wait until next time to start learning about these other sorting algorithms, because that's all I've got for you this week. However, you'll have to read a little ahead to complete your homework assignment - it will involve your timing certain sorts. Enjoy the experimentation.