Section 5 - Implementing Insertion
5A.5.1 Proper Placement of Rebalancing Code
In both insertion and deletion there is an important idea that you need to understand if you are to really have a solid feeling about the success of your rebalancing. As with all recursion, we do have to have the faith that things are perfect at the deeper levels of recursion, and we only have to explicitly care for the current level. In other words, it is okay for us to assume all is well at the lower levels, but we have to manually deal with the current level. That's really old recursion wisdom.
New for us today, though, is the idea that when we insert or remove anything from a tree, we have to fix things from the bottom up, i.e., repair and rebalance from the point of insertion/deletion, on the path upward to the root. This includes adjusting the height member of each node on the path back to the root. That part might seem obvious. Anything we touch could affect parents, grandparents and higher ancestors. But to do this, we must work from the bottom-up.
Think about it. Nothing we do, actively at any node, can destroy the AVL balance or height information below that node, or on any branch that does not contain that node. So we fix things exactly at that node after the removal or insertion, because the removal or insertion might have disturbed the situation at that one node. Next, we consider that we may have messed things up above the node, too, so we take a step up to our parent, and adjust or fix anything needed on that level. We do it again and again until we hit the root. So we take an action, then back-pedal to the top of the tree, fixing the AVL condition and/or height information as we go.
Conceptually this should make sense. But to implement it cleanly, we need to let the rebalancing work in concert with the recursion which is central to all tree insertion and deletion algorithms. In short, you should always recurse first, then rebalance immediately after. This will assure that your tree is adjusted from the bottom-up.
Coding Details:
- Use the same insertion/removal logic as simple BST; copy and paste the insert and remove algorithm as a starting point.
- Every time you recurse, assume that the lower levels have taken care of themselves. Then -- and this is important -- make your adjustments manually to the current level after (not before) the recursive call. Usually, this is the next statement after the recursive call.
- If you are not recursing in a branch of your function definition, you will definitely not need to rebalance at the current stage, so rebalancing code only need follow recursive calls.
- You may need to adjust height on certain non-recursive logic branches. This depends on the algorithm.
These steps -- especially following recursion with rebalancing -- assure that you will be restoring balance to your entire tree in each operation.
Here is an abstraction of this idea in an imaginary operation remsert() which could be remove() or insert().
bool FHavlTree<Comparable>::remsert( const Comparable & x, FHs_treeNode<Comparable> * & root ) { ... // recurse if ( !remsert(x, root->lftChild) ) return false; // follow recursion with rebalancing (rotations) < rebalancing code - do a rotation centered at root > ... }
And to see this in a larger context, here is a full outline of a remsert() function definition where the balancing and height adjustments are done in the appropriate places:
template <class Comparable> bool FHavlTree<Comparable>::remsert( const Comparable & x, FHs_treeNode<Comparable> * & root ) { if (root == NULL) < non-recursive moves > if (x < root->data) { if ( !remsert(x, root->lftChild) ) return false; < rebalancing code - do a rotation centered at root > } else if (root->data < x) { if ( !remsert(x, root->rtChild) ) return false; < rebalancing code - do a rotation centered at root > } // found the node else if ( < any other conditions that might require recursion > ) { remsert( ... ); < rebalancing code - do a rotation centered at root > } else < non-recursive moves> // successfully remserted and rebalanced at this and lower levels; now adjust height root->setHeight( max(heightOf(root->lftChild), heightOf(root->rtChild)) + 1); return true; }
5A.5.2 Insertion
With this groundwork, we are ready to look at the insertion algorithm. Please pull out the basic FHsearch_tree::insert() method for comparison, since this is based on it. With the other half of your brain, compare the code to the mythological remsert() above, to verify that the rebalancing code does, indeed, appear where it should.
// private utilities for member methods template <class Comparable> bool FHavlTree<Comparable>::insert( const Comparable & x, FHs_treeNode<Comparable> * & root ) { if( root == NULL ) root = new AvlNode<Comparable>(x, NULL, NULL); // found a place to hang new node else if( x < root->data ) { if ( !insert(x, root->lftChild) ) return false; if(heightOf(root->lftChild) - heightOf(root->rtChild) == 2) if( x < root->lftChild->data ) rotateWithLeftChild(root); else doubleWithLeftChild(root); } else if(root->data < x) { if ( !insert(x, root->rtChild) ) return false; if(heightOf(root->rtChild) - heightOf(root->lftChild) == 2) if(root->rtChild->data < x) rotateWithRightChild(root); else doubleWithRightChild(root); } else return false; // successfully inserted at this or lower level; adjust height root->setHeight( max(heightOf(root->lftChild), heightOf(root->rtChild)) + 1); return true; }
Perfunctory Observations:
- This is the private, recursive, version of insert(). As with all our prior insert functions, there is a public version with just a couple lines that preps, then calls, this. There is no mystery there, and that can wait until you inspect the full include file.
- An apparently static method heightOf() takes a node pointer and returns a height value. This seems wrong -- why not just access root->rtChild->height directly? If root->rtChild is null, we would get a run-time error, yet in that case we still want a valid number (-1, remember?). Furthermore, testing for non-null directly in the code here is bad form -- it is grunt work and should be hidden. You will see the definition soon, but it is as advertized - something that returns the height of the node passed in, or a -1 if the pointer is null.
- In the first line, we test for null, and if true, we have found a place for the node. No recursion, therefore no rebalancing, is needed at this level - at best we might have to rebalance the parent, but that's the next level up. [Actually we won't have to rebalance the parent, either. Try to think of a situation in which hanging a new node on the null half of a parent turns that parent into an AVL-imbalanced node. Doesn't happen.]
- As noted earlier, we are using FHs_treeNode*s not AvlNode*s, in many places. Not important here - they are the same thing for our purposes - these pointers do point to AvlNodes.
Deeper Observations:
- If we do recurse, we have to rebalance. Now is where we pull out our hard-fought rotation artillery. We are adding a node to a tree. The tree was balanced before, so the only way it could become imbalanced at this node is if the insertion makes one of the child sub-trees a little too tall compared with its sibling. Therefore, the place where the node was inserted is what we called the X sub-tree in our earlier discussions - the one that is too tall.
- In order to determine what kind of rotation to do, we follow the path of the newly inserted node, whose key data is Comparable &x. If x dropped right-right or left-left, we do a single rotation. If x dropped right-left or left-right, we do a double rotation. Hah!
- The details of the comparisons require that you peer deeply into the if statements and the subsequent rotation method calls. Everything that I have said in the earlier pages, and this section, is exactly reflected by these statements. Pick just this one section of code to study because the other is just its mirror image. Do not proceed without 100% comprehension of this, and how it relates to the meaning of the single and double rotations:
if ( !insert(x, root->lftChild) ) return false; if(heightOf(root->lftChild) - heightOf(root->rtChild) == 2) if( x < root->lftChild->data ) rotateWithLeftChild(root); else doubleWithLeftChild(root);
- The rebalancing code tinkered with the tree at the current level to restore AVL balance, but it didn't necessarily fix the height member of the root (center of rotation) node. We do that at the very end of the function.