AVL Binary Search Tree
Wikipedia said:
In computer science, an AVL tree (Adelson-Velskii and Landis' tree, named after the inventors) is a self-balancing binary search tree, and it was the first such data structure to be invented.[1] In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.
The AVL tree is named after its two Soviet inventors, G. M. Adelson-Velskii and E. M. Landis, who published it in their 1962 paper "An algorithm for the organization of information".[2]
AVL trees are often compared with red-black trees because both support the same set of operations and take O(log n) time for the basic operations. For lookup-intensive applications, AVL trees are faster than red-black trees because they are more rigidly balanced.[3] Similar to red-black trees, AVL trees are height-balanced, but in general not weight-balanced nor μ-balanced for any μ ≤ ½;[4] that is, sibling nodes can have hugely differing numbers of descendants.
More information:
AVL tree - Wikipedia, the free encyclopedia
The objective is to keep the height of these trees Θ(log n), which requires the heights of the left and right subtrees of every node to differ by at most 1 (|left_h − right_h| ≤ 1). This means we want to keep the height of one side within 1 of the height of the other side, for every node within the tree. The way this is done to keep a relatively balanced tree (an AVL tree) is through a process known as a binary tree rotation. The worst-case scenario for an unbalanced binary search tree is when the height of the tree equals the number of nodes minus 1, i.e. a degenerate chain (the last leaf node has a height of 0, not 1 — think of an array where the first index starts at 0, instead of 1).
A recurrence formula for N_h, the minimum number of nodes in an AVL tree of height h, is:
N_h = N_{h-1} + N_{h-2} + 1
The two terms on the right are the minimum sizes of the two subtrees (one of height h−1, the other allowed to be one shorter, h−2), and 1 is added because we also need to include the root node itself. Since this recurrence grows like the Fibonacci numbers, n ≥ N_h forces h = O(log n).
Let's look at an example of an AVL tree, go through what the height of the tree and of each node is, and then I'll show an example of a rotation...
AVL Tree:
Now with an indication of what the height of the tree is (height from the root node), and all other subnodes:
And you can see by the images above how this is also a great structure for sorting. The significance of an AVL tree is not defined by its levels but rather by its left and right halves. Notice how the elements that branch off each parent node compare against that parent: node elements in the left subtree are less than the parent node's element, and node elements in the right subtree are greater than it. There is a way to traverse this implementation of a binary tree to grab all of these numbers in order, known as the in-order traversal.
Now for an example of a tree rotation... We do this in order to "AVL-itize" a binary tree, maintaining the AVL format as much as possible. So, say for instance, using the above AVL tree, we were to add an element of the value 51. Now one node has subtree heights that differ by 2, not by at most 1 like mentioned earlier:
Here the objective is to move 51 to the 3rd level essentially to make this binary tree more balanced. The AVL tree looks like this after the final rotation:
This rotation actually happens in constant time (more specifically, O(1) time, since only a fixed number of pointers change). The procedure above is actually a combination of rotations, because a single minor rotation of 3 nodes will not always do the trick. The rotations are minor rotations of 3 nodes, however, formatted in a small triangular form. If there is no node to make up the 3rd node, then we assume we are rotating an invisible node, because the rotation is still the same with or without that 3rd node. (FYI: if a parent node has no child nodes or has one child node, the missing node(s) are defined to have a height (h) of −1, because this makes our formula for calculating the height of nodes easier. The height of node N is the max of the heights of its subnodes N_a and N_b, plus 1. If there are no child nodes at a particular place in the tree, then both child heights are −1, the max of −1 and −1 is −1, and −1 + 1 == 0. So N has a height of 0.)
I'm currently writing a wrapper implementation of AVL trees in C++. I've got pretty much all of it down other than the predecessor and successor methods, as well as maybe some other methods to count total nodes and such, but I'm still bug-testing at this point.
Code for output above:
Code:
node *p = 0;
avl_tree avlbst;
avlbst.insert(65, p);
avlbst.insert(41, p);
avlbst.insert(50, p);
avlbst.insert(20, p);
avlbst.insert(26, p);
avlbst.insert(29, p);
avlbst.insert(11, p);
std::cout << "In-order traversal:\n";
avlbst.inorder(p); std::cout << '\n' << std::endl;
std::cout << "Tree height (root node height): " << avlbst.node_height(p) << std::endl;
std::cout << "Min element: " << avlbst.minimum(p)->element << std::endl;
std::cout << "Max element: " << avlbst.maximum(p)->element << std::endl;
You'll notice that sorting is as easy as inserting the elements and doing an in-order traversal of the binary (AVL) tree. Inserting n elements takes Θ(n log n) time in total (because each insertion takes O(h) time, and the height h is guaranteed to be Θ(log n)), and the in-order traversal itself runs in linear Θ(n) time.