14.2 How to Augment a Data Structure

The process of augmenting a basic data structure to support additional functionality occurs quite frequently in algorithm design. It will be used again in the next section to design a data structure that supports operations on intervals. In this section, we shall examine the steps involved in such augmentation. We shall also prove a theorem that allows us to augment red-black trees easily in many cases.

Augmenting a data structure can be broken into four steps:

choosing an underlying data structure,
determining additional information to be maintained in the underlying data structure,
verifying that the additional information can be maintained for the basic modifying operations on the underlying data structure, and
developing new operations.

As with any prescriptive design method, you should not blindly follow the steps in the order given. Most design work contains an element of trial and error, and progress on all steps usually proceeds in parallel. There is no point, for example, in determining additional information and developing new operations (steps 2 and 4) if we will not be able to maintain the additional information efficiently. Nevertheless, this four-step method provides a good focus for your efforts in augmenting a data structure, and it is also a good way to organize the documentation of an augmented data structure.

We followed these steps in Section 14.1 to design our order-statistic trees. For step 1, we chose red-black trees as the underlying data structure. A clue to the suitability of red-black trees comes from their efficient support of other dynamic-set operations on a total order, such as MINIMUM, MAXIMUM, SUCCESSOR, and PREDECESSOR.

For step 2, we provided the size field, in which each node x stores the size of the subtree rooted at x. Generally, the additional information makes operations more efficient. For example, we could have implemented OS-SELECT and OS-RANK using just the keys stored in the tree, but they would not have run in O(lg n) time. Sometimes, the additional information is pointer information rather than data, as in Exercise 14.2-1.

For step 3, we ensured that insertion and deletion could maintain the size fields while still running in O(lg n) time. Ideally, only a few elements of the data structure should need to be updated in order to maintain the additional information. For example, if we simply stored in each node its rank in the tree, the OS-SELECT and OS-RANK procedures would run quickly, but inserting a new minimum element would cause a change to this information in every node of the tree. When we store subtree sizes instead, inserting a new element causes information to change in only O(lg n) nodes.

For step 4, we developed the operations OS-SELECT and OS-RANK. After all, the need for new operations is why we bother to augment a data structure in the first place. Occasionally, rather than developing new operations, we use the additional information to expedite existing ones, as in Exercise 14.2-1.

Augmenting Red-Black Trees

When red-black trees underlie an augmented data structure, we can prove that certain kinds of additional information can always be efficiently maintained by insertion and deletion, thereby making step 3 very easy. The proof of the following theorem is similar to the argument from Section 14.1 that the size field can be maintained for order-statistic trees.

Theorem 14.1: (Augmenting a Red-Black Tree)

Let f be a field that augments a red-black tree T of n nodes, and suppose that the contents of f for a node x can be computed using only the information in nodes x, left[x], and right[x], including f[left[x]] and f[right[x]]. Then, we can maintain the values of f in all nodes of T during insertion and deletion without asymptotically affecting the O(lg n) performance of these operations.

Proof The main idea of the proof is that a change to an f field in a node x propagates only to ancestors of x in the tree. That is, changing f[x] may require f[p[x]] to be updated, but nothing else; updating f[p[x]] may require f[p[p[x]]] to be updated, but nothing else; and so on up the tree. When f[root[T]] is updated, no other node depends on the new value, so the process terminates. Since the height of a red-black tree is O(lg n), changing an f field in a node costs O(lg n) time in updating nodes dependent on the change.

Insertion of a node x into T consists of two phases. (See Section 13.3.) During the first phase, x is inserted as a child of an existing node p[x]. The value for f[x] can be computed in O(1) time since, by supposition, it depends only on information in the other fields of x itself and the information in x's children, but x's children are both the sentinel nil[T]. Once f[x] is computed, the change propagates up the tree. Thus, the total time for the first phase of insertion is O(lg n). During the second phase, the only structural changes to the tree come from rotations. Since only two nodes change in a rotation, the total time for updating the f fields is O(lg n) per rotation. Since the number of rotations during insertion is at most two, the total time for insertion is O(lg n).

Like insertion, deletion has two phases. (See Section 13.4.) In the first phase, changes to the tree occur if the deleted node is replaced by its successor, and when either the deleted node or its successor is spliced out. Propagating the updates to f caused by these changes costs at most O(lg n) since the changes modify the tree locally. Fixing up the red-black tree during the second phase requires at most three rotations, and each rotation requires at most O(lg n) time to propagate the updates to f . Thus, like insertion, the total time for deletion is O(lg n).

In many cases, such as maintenance of the size fields in order-statistic trees, the cost of updating after a rotation is O(1), rather than the O(lg n) derived in the proof of Theorem 14.1. Exercise 14.2-4 gives an example.

Exercises 14.2-1

Show how the dynamic-set queries MINIMUM, MAXIMUM, SUCCESSOR, and PREDECESSOR can each be supported in O(1) worst-case time on an augmented order-statistic tree. The asymptotic performance of other operations on order-statistic trees should not be affected. (Hint: Add pointers to nodes.)

Exercises 14.2-2

Can the black-heights of nodes in a red-black tree be maintained as fields in the nodes of the tree without affecting the asymptotic performance of any of the red-black tree operations? Show how, or argue why not.

Exercises 14.2-3

Can the depths of nodes in a red-black tree be efficiently maintained as fields in the nodes of the tree? Show how, or argue why not.

Exercises 14.2-4: ⋆

Let ⊗ be an associative binary operator, and let a be a field maintained in each node of a red-black tree. Suppose that we want to include in each node x an additional field f such that f[x] = a[x₁] ⊗ a[x₂] ⊗ ··· ⊗ a[x_m], where x₁, x₂,..., x_m is the inorder listing of nodes in the subtree rooted at x. Show that the f fields can be properly updated in O(1) time after a rotation. Modify your argument slightly to show that the size fields in order-statistic trees can be maintained in O(1) time per rotation.

Exercises 14.2-5: ⋆

We wish to augment red-black trees with an operation RB-ENUMERATE(x, a, b) that outputs all the keys k such that a ≤ k ≤ b in a red-black tree rooted at x. Describe how RB-ENUMERATE can be implemented in Θ(m +lg n) time, where m is the number of keys that are output and n is the number of internal nodes in the tree. (Hint: There is no need to add new fields to the red-black tree.)