Section 3 - Implementing Separate Chaining
6A.3.1 Template Prototype
The class will be called FHhashSC (SC for Separate Chaining). It will be based on both vectors and lists. You could, of course, use STL vectors and lists, but I'm going to build on FHvector and FHlist. First, the prototype.
// File FHhashSC.h // Template definitions for FHhashSC. // Separate Chaining Hash Table #ifndef FHHASHSC_H #define FHHASHSC_H #include "FHvector.h" #include "FHlist.h" #include <cmath> using namespace std; // ---------------------- FHhashSC Prototype -------------------------- template <class Object> class FHhashSC { protected: static const int INIT_TABLE_SIZE = 97; static const float INIT_MAX_LAMBDA; FHvector< FHlist<Object> > mLists; int mSize; int mTableSize; float mMaxLambda; public: FHhashSC(int tableSize = INIT_TABLE_SIZE); bool contains(const Object & x) const; void makeEmpty(); bool insert(const Object & x); bool remove(const Object & x); static long nextPrime(long n); int size() const { return mSize; } bool setMaxLambda( float lm ); protected: void rehash(); int myHash(const Object & x) const; };
6A.3.2 Data, Constants and Constructors
The two static class constants should make sense. The first is the initial table size, which as you see is a prime number. Prime numbers are a good idea, but not absolutely essential for separate chaining. They are essential for quadratic probing, and we'll be doing a little of that fun-sounding activity later in the week.
INIT_MAX_LAMBDA is the upper limit allowed on our table's load factor. We will set it to be 1.5, initially.
template <class Object> const float FHhashSC<Object>::INIT_MAX_LAMBDA = 1.5;
There is a mutator, setMaxLambda(), which goes with it.
The private (protected) data consists of three numbers, mTableSize, mSize, and mMaxLambda, and the star of the show, mLists, an array of lists. (Did this remind you of anything?) So, we are laying our cards on the table (no pun, intended - really):
The table will take the form of a vector (FHvector, to be exact) and the buckets will be lists (FHlists). This decision will dictate the form and function of all that follows.
This template declaration, which defines the member mLists, is nested, but there is nothing new about it. It just happens to be a template of a template. Don't proceed until you understand it: FHvector< FHlist<Object> > mLists;
The constructor will set the vector to size INIT_TABLE_SIZE, and each list will be empty. The constructor should now make sense.
template <class Object> FHhashSC<Object>::FHhashSC(int tableSize) : mSize(0) { if (tableSize < INIT_TABLE_SIZE) mTableSize = INIT_TABLE_SIZE; else mTableSize = nextPrime(tableSize); mLists.resize(mTableSize); mMaxLambda = INIT_MAX_LAMBDA; }
In case you forgot, resize() is vector's method to set the size of the vector.
We allow the client to suggest a table size. We don't allow it to be smaller than INIT_TABLE_SIZE and if it is larger, we don't take it as is, but supersize it to the next prime number. NextPrime() is a method -- one of many that you can imagine -- which locates the next prime number > the suggested number. I'll give you my version, which uses the fewest multiplications and is about as fast as it can be without pre-computing a table of primes. It might be interesting for some of you to figure out how and why it works, and even to find mistakes or improvements -- but that's not germane to our discussion. I won't comment on it further, here.
template <class Object> long FHhashSC<Object>::nextPrime(long n) { long k, candidate, loopLim; // loop doesn't work for 2 or 3 if (n <= 2 ) return 2; else if (n == 3) return 3; for (candidate = (n%2 == 0)? n+1 : n ; true ; candidate += 2) { // all primes > 3 are of the form 6k +/- 1 loopLim = (long)( (sqrt((float)candidate) + 1)/6 ); // we know it is odd. check for divisibility by 3 if (candidate%3 == 0) continue; // now we can check for divisibility by 6k +/- 1 up to sqrt for (k = 1; k <= loopLim; k++) { if (candidate % (6*k - 1) == 0) break; if (candidate % (6*k + 1) == 0) break; } if (k > loopLim) return candidate; } }
6A.3.3 Rehashing
Before doing the public methods, let's dispose of the trickiest of all - the private rehash() method. Roughly, it doubles the old table size and constructs a new vector that is twice as large (or more, since we search for the first prime larger than twice the old size). Then we run through the old table, grabbing old elements and, as we find them, insert each into the new table, individually.
Your first thought might be to simply double the array size and go home -- let the old values stay where they were in the first half of the new array. However, that won't work. The old positions were based on the old hash function (that is, the combined effect of the client's hash() and the class's myHash() considered as a single computation). The moment we change the table size, we change this hash function. We originally put the items into the smaller table using the old function, and the new function will not be able to find them.
Thus, rehashing is expensive. But it is not done often. The longer we wait to rehash, the less impact it has on average (but the longer that operation takes).
template <class Object> void FHhashSC<Object>::rehash() { FHvector< FHlist<Object> > oldLists = mLists; FHlist<Object>::iterator iter; int k, oldTableSize = mTableSize; mTableSize = nextPrime(2*oldTableSize); mLists.resize( mTableSize ); // only the first old_size lists need be cleared for(k = 0; k < oldTableSize; k++ ) mLists[k].clear(); mSize = 0; for(k = 0; k < oldTableSize; k++) for(iter = oldLists[k].begin(); iter != oldLists[k].end(); iter++) insert(*iter); }
Because oldLists is a local variable, it will be deleted completely when it goes out of scope, and the destructor will manage de-allocation. We do have to be sure that when we set the oldLists = mLists and then clear out each of the buckets, mLists[k].clear(), we have not incidentally deleted our oldLists' data which we will still need to copy into the new hash table. This works only if the vector assignment operator of vecA = vecB does its deep copies directly. If we look (or think about) the implementation of vector's operator=(), we will realize that it depends on the correct assignment of its elements, which in this case are lists. So we have to know, with certainty, that the lists copy their deep data over properly and then the vector does the same. Of course they do in both STL vectors and lists and FHvectors and FHlists, so we are fine.
6A.3.4 insert(), contains() and remove()
In all of these operations, the first move is to apply the hash function and use it to extract the exact list -- the bucket -- in which we have to search. This is done in the second line (it would be the first, but my style is to declare the iterator first, rather than further down when it is used). Then, we iterate through the list looking for the object. Here, we make use of the required operator == in the underlying class. It must be defined, and it must be defined sensibly. You want equality to mean that we are looking at two equivalent objects. This is not to say they are the same object -- they won't be. The client has to decide what makes two objects equivalent. It might be that all members are equal, but maybe only a few or one special member (the key, anyone?) being equal will constitute equivalency from the client's point of view.
insert() is our first source. If we don't find the object already in the hash table, we add it to the correct list using list::push_back(). This may have put the load factor over the limit, so we test that and rehash() if necessary.
template <class Object> bool FHhashSC<Object>::insert(const Object & x) { FHlist<Object>::iterator iter; FHlist<Object> &theList = mLists[myHash(x)]; for (iter = theList.begin(); iter != theList.end(); iter++) if (*iter == x) return false; // not found so we insert theList.push_back(x); // check the load factor if( ++mSize > mMaxLambda * mTableSize ) rehash(); return true; }
If you are using the STL list template, rather than the FHlist, you can replace my loop with the find() algorithm. This is shown in the text and many other implementations. However, find() will only work on iterators that are part of the STL library. It won't operate on hand-rolled iterators, which is why I don't attempt it. Of course, you can try to build your own find() algorithm that applies to FHlist iterators, but the simplicity of the above loop demonstrates that it is hardly worth the trouble.
By the way, notice that insert() calls rehash(). But remember rehash() called insert(). Are we safe from an infinite recursion? Better be sure.
contains() is even easier than insert(). We do everything we did in insert(), and less.
template <class Object> bool FHhashSC<Object>::contains(const Object & x) const { const FHlist<Object> & theList = mLists[myHash(x)]; FHlist<Object>::const_iterator iter; for (iter = theList.begin(); iter != theList.end(); iter++) if (*iter == x) return true; // not found return false; }
remove() requires that we use the list iterator -- stopped at the correct object in the correct bucket -- to invoke erase(). You may want to review list::erase() to remind yourself how it is used.
template <class Object> bool FHhashSC<Object>::remove(const Object & x) { FHlist<Object> &theList = mLists[myHash(x)]; FHlist<Object>::iterator iter; for (iter = theList.begin(); iter != theList.end(); iter++) if (*iter == x) { theList.erase(iter); mSize--; return true; } // not found return false; }
Any remaining method definitions can be found in the FHhashSC.h file.
6A.3.5 Test Client
Finally, here is the test main():
// CS_1C Hash Table Client // note that order of includes and prototypes is required by some compilers #include <iostream>
#include <string> using namespace std; class Employee; int Hash( const Employee & item ); int Hash(const string & key); int Hash( int key ); #include "FHhashSC.h" // Example of an Employee class class Employee { public: static const int MAX_LEN = 50; Employee( string name = "undefined", int ss = 0); const string & getName() const { return name; } int getSS() const { return ss; } bool setName( string name ); bool setSS( int ss ); bool operator==( const Employee & rhs ) const { return getName() == rhs.getName(); } bool operator!=( const Employee & rhs ) const { return !( *this == rhs ); } private: string name; int ss; }; Employee::Employee( string name, int ss) { if (!setName(name)) name = "(undefined)"; if (!setSS(ss)) ss = 0; } bool Employee::setName(string name) { if (name.length() > MAX_LEN) return false; this->name = name; return true; } bool Employee::setSS(int ss) { if (ss < 0 || ss > 999999999 ) return false; this->ss = ss; return true; } int main() { // first set of tests -------------------- FHhashSC<Employee> hashTable; Employee e1("Jane Austin", 1), e2("Rene Descartes", 2), e3("John Locke", 3); if ( hashTable.insert(e1) ) cout << "inserted " << e1.getName() << endl; if ( hashTable.insert(e1) ) cout << "inserted " << e1.getName() << endl; if ( hashTable.insert(e2) ) cout << "inserted " << e2.getName() << endl; if ( hashTable.insert(e2) ) cout << "inserted " << e2.getName() << endl; if ( hashTable.insert(e3) ) cout << "inserted " << e3.getName() << endl; if ( hashTable.insert(e3) ) cout << "inserted " << e3.getName() << endl; cout << hashTable.size() << endl; if ( hashTable.contains(e3) ) cout << "contains " << e3.getName() << endl; if ( hashTable.remove(e1) ) cout << "removed " << e1.getName() << endl; if ( hashTable.remove(e1) ) cout << "removed " << e1.getName() << endl; if ( hashTable.remove(e2) ) cout << "removed " << e2.getName() << endl; if ( hashTable.remove(e2) ) cout << "removed " << e2.getName() << endl; if ( hashTable.remove(e3) ) cout << "removed " << e3.getName() << endl; if ( hashTable.remove(e3) ) cout << "removed " << e3.getName() << endl; cout << hashTable.size() << endl; if ( hashTable.contains(e3) ) cout << "contains " << e3.getName() << endl; // second set of tests -------------------- FHhashSC<string> hashTable2; string substrate = "asdlkfj asdoiBIUYVuwer slkdjLJfwoe89)B)(798rjMG0293lkJLJ42lk3j)(*"; string strArray[500]; int k, limit; substrate = substrate + substrate; for (k = 0; k < substrate.length() - 6; k++) strArray[k] = substrate.substr(k, 5); limit = k; hashTable2.setMaxLambda(.5); for (k = 0; k < limit; k++) if ( hashTable2.insert(strArray[k]) ) cout << "inserted #" << k << " " << strArray[k] << endl; cout << "\n#strings generated: " << limit << "\n#elements in ht: " << hashTable2.size() << endl; return 0; } int Hash( const Employee & item ) { return Hash( item.getSS() ); } // a hash routine for strings that admits negative resturns int Hash( const string & key ) { unsigned int k, retVal = 0; for(k = 0; k < key.length( ); k++ ) retVal = 37 * retVal + key[k]; return retVal; } int Hash( int key ) { return key; }