Section 1 - Introducing General Trees

9A.1.1 Overview of Week 9

Trees

This week, we will get a taste of a new ADT: the tree. Trees are widely used in computer science and they come in many flavors from general trees to red-black, top-down splay, and AVL, to name a few. But we save these exotic sounding varieties for the next course, because general trees are a handful all by themselves. And it pays to master the many tree maneuvers on general trees before studying the more advanced ones.

9A.1.2 The Language of Trees

A tree is a data structure containing nodes that are organized by a parent-child-sibling relationship. There is always a root node that is at the highest level of the tree (which makes it more of an upside-down tree), and below the root are one or more direct children, each pictorially attached to the root with a line. Each of these children can be the parent of further children and so on until you come to some nodes that have no children. These are referred to as leafs or leaf nodes. Here is a picture from a textbook by Nyhoff that summarizes the general structure of a tree.

Every node is the root of a subtree. For example, node 2 is the root of the subtree consisting of it and everything below it (which is the left side of the main tree in the above picture). Node 3 is the root of a small subtree consisting of it and its three children, nodes 5, 6 and 7.

Depth refers to how many levels below the root a node is. A root node, like node 1 is said to be at depth zero and nodes 8 and 9 would be at depth three.

The height of any node would be how many levels, "worst case", there are below that node (going towards the deepest node). So the height of node 2 is two, the height of node 1 is three, and the height of node 6 is zero.

If we are discussing the depth of a node in a subtree, we consider its position only in that subtree, and not the larger tree. So, in the subtree whose root is node 4, node 4 has depth zero and node 9 has depth one. Therefore, we see that depth is not an absolute, but can change depending on which subtree we are talking about.

Height, on the other hand, is independent of the subtree we are considering, because height is measured downward, not upward.

9A.1.3 Example Applications

Examples of tree data structures are:

The directory/folder structure of any operating system like unix, Mac or Windows.
A taxonomic organization of plants or animals by their official names (kingdom, phylum, class ... genus, species).
Temporary structures built by your compiler that represent numerical expressions like "(2*x + exp(s - .5) * ( (s + 3)/(s - 2) + sin(x))".
An object in a computer animation, such as a zebra or a person. Each movable assembly, like an arm, is the root of a subtree whose children all move together with the root. For example, an arm would have an elbow and forearm as its children. The forearm would have the hand as its child. The hand would have the five fingers as its five children. Each finger would have a single child for the bone section below it. When computing motion of the object in the tree, we would move an entire subtree by moving its root, all the other items going along for the ride. If, in addition to the arm movement, the thumb also moves, we have to combine the two movements to determine the resultant position of the thumb. Computer animation and real-time simulation programmers deal with this in great detail.
Linguistic phrase structure of a sentence can be organized like a tree, with each phrase containing sub-phrases. This would be used in determining the meaning of the sentence in a translation program. Programmers who work with artificial intelligence and language use these sorts of trees.
A parts program to locate, order and track a part of a car would be a candidate for a tree data structure.
Most data structures that require fast access and sorting are built using trees. For instance, we'll see that many of our subsequent data structures such as maps, sets, and priority queues all use trees as an underlying foundation.

Of the above list, items 3, 5 and 7 differ from the others. They show a tree as a means, not an end. As with all data structures, we sometimes use a tree because it is helpful to solve a problem (search, sort, etc.), not because we really have an intrinsic set of data in that particular form. Bullet items 1, 2, 4 and 6 have data that must be stored somewhat permanently, and the tree really is the structure of the data. Items 3, 5 and 7, however will create the structures just to make something else work, not because the data really is in a tree form, intrinsically.

9A.1.4 General Tree Design Strategy

In the study of tree data structures, one inevitably begins with the most general and ill-behaved type of tree. It is so ill-behaved, it doesn't even have a name. We just call it a general tree, or tree, and reserve all the cool words (like binary search tree, AVL tree, etc.) for certain sub-types that have structures which make specialized operations easier to implement. But the general tree, as wild and challenging as it is, is worth studying because sometimes that's what we're handed, and we have to have an idea of how to face it. And it does, after all, represent many of the types of applications listed above.

This most general tree can have nodes with variable numbers of children. Here is such a tree, in its logical from:

If I were to ask you to implement a tree data structure that supported some of the typical operations like addChild(), removeNode(), find() and printAll(), I would probably get some interesting and creative answers. Some such solutions might attempt to define a TreeNode with a dynamic array of pointers to children. If a TreeNode had three children, then we might allocate a vector of three child TreeNodes. Of course when we add or delete children, this would require resizing the vector. There is, however, a smarter and more accepted method for lashing up these beasts which is quite flexible and more efficient than the vector approach. It entails giving each TreeNode a structure that is a fixed, small size: each node consists of a data member plus two pointers. The pointers are firstChild and sib (meaning next sibling) This would cause the above logical tree to be stored, in reality, in the following manner:

To understand how this relates to the TreeNode type I described, consider a few of the nodes above:

TreeNode A: firstChild points to B, sib is NULL
TreeNode C: firstChild is NULL, sib points to D
TreeNode E: firstChild points to I, sib points to F
TreeNode H and Q (among others): firstChild is NULL, sib is NULL