Saturday, October 24, 2009

Binary Tree Iterators










In my last post, I presented code (and unit tests) for a binary tree implementation in Java. I alluded to the fact that there was a depth-first iterator in one of the test classes. This time I'll discuss iterators and present more practice code.

Iterator is one of the behavioral patterns described by the Gang of Four in their classic "Design Patterns". It provides the means for walking through a data structure without having to expose the details of how it's done.

Timothy Budd provides an excellent explanation of binary trees and iterators in chapter 10 of his "Classic Data Structures In C++".

Binary trees are interesting data structures. If the n nodes in a binary tree are independent, then there are n! = n*(n-1)*...*2*1 different orderings by which one could visit every node.

But the fact that the tree is arranged so that each parent has, at most, a left and right child limits the number of choices for walking the tree to six:

  1. Process value, then left child, then right child
  2. Process left child, then value, then right child
  3. Process left child, then right child, then value
  4. Process value, then right child, then left child
  5. Process right child, then value, then left child
  6. Process right child, then left child, then value


Subtrees are usually traversed from left to right, so the first three possibilities are the most common. Each is given a name that may be more familiar and memorable. The first is called preorder or depth-first traversal; the second in-order or symmetric traversal; and the third post-order traversal. There is also level order or breadth-first traversal, where all the nodes at one level are visited before proceeding to the next.

Java has an Iterator interface in its java.util package that defines the methods that all classes that implement it must provide. The next() returns a generic value. For my binary tree iterator I knew there'd be times when I wanted to get the next Node instead of the value, so I extended the Iterator interface and added a next method that returned a Node<T>:

package tree;

import java.util.Iterator;

public interface BinaryTreeIterator<T extends Comparable<T>> extends Iterator<T>
{
Node<T> nextNode();
}


Then I created an abstract class that provided default behavior for all methods except the ones that provided the next Node value and whether or not the walk through the tree was complete:

package tree;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import java.util.Iterator;

public abstract class AbstractBinaryTreeIterator<T extends Comparable<T>> implements BinaryTreeIterator<T>
{
public static final Log LOGGER = LogFactory.getLog(AbstractBinaryTreeIterator.class);

public void remove()
{
throw new UnsupportedOperationException("cannot remove from a binary tree");
}

public T next()
{
Node<T> nextNode = nextNode();

return nextNode.getValue();
}
}


The post-order traversal examines the left child first, then the right child, then the value in a recursive manner. It uses a last in, first out (LIFO) stack to hold the nodes in the opposite order they are to be visited. The iteration starts by pushing the root node onto the stack and recursing down the tree:

package tree;

import java.util.NoSuchElementException;
import java.util.Stack;

public class PostOrderIterator<T extends Comparable<T>> extends AbstractBinaryTreeIterator<T>
{
protected Stack<Node<T>> stack = new Stack<Node<T>>();

public PostOrderIterator(BinaryTree<T> tree)
{
init(tree.getRoot());
}

public void init(Node<T> root)
{
if (root != null)
{
stack.clear();
stackChildren(root);
}
}

private void stackChildren(Node<T> node)
{
stack.push(node);
Node<T> next = node.getRight();
if (next != null)
{
stackChildren(next);
}
next = node.getLeft();
if (next != null)
{
stackChildren(next);
}
}

public Node<T> nextNode()
{
if (!hasNext())
{
throw new NoSuchElementException();
}

Node<T> x = null;

if (!stack.empty())
{
if (LOGGER.isDebugEnabled())
{
LOGGER.debug(stack);
}

x = stack.pop();
}

return x;
}

public boolean hasNext()
{
return !stack.isEmpty();
}
}


Of course there are unit tests:

package tree;

import org.springframework.test.context.ContextConfiguration;
import static org.testng.Assert.assertFalse;
import static org.testng.AssertJUnit.assertEquals;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

@Test
@ContextConfiguration(locations = "classpath:app-context.xml, classpath:app-context-test.xml")
public class BinaryTreeIteratorTest
{
public void testHasNextEmptyTree()
{
BinaryTree<String> empty = new BinaryTree<String>();
AbstractBinaryTreeIterator<String> iterator = new DepthFirstIterator<String>(empty);

assertFalse(iterator.hasNext());
}

@Test(expectedExceptions = UnsupportedOperationException.class)
public void testRemove()
{
BinaryTree<String> empty = new BinaryTree<String>();
AbstractBinaryTreeIterator<String> iterator = new DepthFirstIterator<String>(empty);
iterator.remove();
}

public void testNextDepthFirst()
{
Integer[] data = {5, 3, 9, 1, 4, 6,};
BinaryTree<Integer> tree = new BinaryTree<Integer>();
tree.insert(data);

String expected = "5,3,1,4,9,6,";
AbstractBinaryTreeIterator<Integer> iterator = new DepthFirstIterator<Integer>(tree);
String actual = createCommaSeparatedString(iterator);

assertEquals(actual.toString(), expected);
}

public void testNextPostOrder()
{
Integer[] data = {5, 3, 9, 1, 4, 6,};
BinaryTree<Integer> tree = new BinaryTree<Integer>();
tree.insert(data);

String expected = "1,4,3,6,9,5,";
AbstractBinaryTreeIterator<Integer> iterator = new PostOrderIterator<Integer>(tree);
String actual = createCommaSeparatedString(iterator);

assertEquals(actual.toString(), expected);
}

private static String createCommaSeparatedString(Iterator iterator)
{
StringBuffer actual = new StringBuffer(1024);
while (iterator.hasNext())
{
actual.append(iterator.next()).append(',');
}

return actual.toString();
}

@DataProvider(name = "emptyTreeIterators")
public Object[][] createEmptyTreeIterators()
{
BinaryTree<String> tree = new BinaryTree<String>();

return new Object[][]{
{new BreadthFirstIterator<String>(tree)},
{new DepthFirstIterator<String>(tree)},
{new InOrderIterator<String>(tree)},
{new PostOrderIterator<String>(tree)},
};
}


@Test(expectedExceptions = NoSuchElementException.class, dataProvider = "emptyTreeIterators")
public void testNextEmptyTree(AbstractBinaryTreeIterator<String> iterator)
{
iterator.next();
}

public void testNext()
{
String[] data = {"F", "B", "A", "D", "C", "E", "G", "I", "H",};
BinaryTree<String> tree = new BinaryTree<String>();
tree.insert(data);

Map<String, String> expected = new HashMap<String, String>();
expected.put("depth-first", "F,B,A,D,C,E,G,I,H,");
expected.put("in-order", "A,B,C,D,E,F,G,H,I,");
expected.put("post-order", "A,C,E,D,B,H,I,G,F,");
expected.put("breadth-first", "F,B,G,A,D,I,C,E,H,");

String name = "depth-first";
AbstractBinaryTreeIterator<String> iterator = new DepthFirstIterator<String>(tree);
AbstractBinaryTreeIterator.LOGGER.debug(name);
assertEquals(name, expected.get(name), createCommaSeparatedString(iterator));

name = "in-order";
iterator = new InOrderIterator<String>(tree);
AbstractBinaryTreeIterator.LOGGER.debug(name);
assertEquals(name, expected.get(name), createCommaSeparatedString(iterator));

name = "post-order";
iterator = new PostOrderIterator<String>(tree);
AbstractBinaryTreeIterator.LOGGER.debug(name);
assertEquals(name, expected.get(name), createCommaSeparatedString(iterator));

name = "breadth-first";
iterator = new BreadthFirstIterator<String>(tree);
AbstractBinaryTreeIterator.LOGGER.debug(name);
assertEquals(name, expected.get(name), createCommaSeparatedString(iterator));
}
}


Tests are great, but being a visual person I like to be able to see a picture. There's a terrific package called graphviz from AT&T that lets you generate graph plots in an elegant way. I wrote a quick program to walk the binary tree { F,B,A,D,C,E,G,I,H } using an iterator and spit out the dot representation, like so:

digraph simple_hierarchy {
F->B [label="L"]
F->G [label="R"]
B->A [label="L"]
B->D [label="R"]
D->C [label="L"]
D->E [label="R"]
G->I [label="R"]
I->H [label="L"]
}


I can use this as the input to dot.exe to render the tree as .png or .svg file:



I'll post the other iterators next time.

All this work just to answer a single interview question! This tells me that doing it justice would be hard in such a short period of time. I would flunk such an interview.

Thanks to my friend Steve Roach, who was the first person to bring graphviz (and so many other things) to my attention. He's one of the most talented developers I've had the pleasure of working with, but it's his teaching ability that's his greatest strength. He's one of those people who makes everyone around him better. Such intellectual generosity is rare indeed.

3 comments:

Pete Kirkham said...

In my experience, the main reason for asking the question is to see whether the candidate understands recursion and the visitor pattern. The candidate is told to assume the tree structure's interface, so you should give the code you wrote in your last post to them, making the exercise shorter.

The candidate should ask whether the tree is balanced, or how deep the tree is if not. The answer to the question is then five lines of code using the visitor pattern, as long as the depth of the tree is low enough to fit within the bounds of the stack. Balanced trees will always fit in stack.

<V> void depthFirstPostOrder ( Node<V> node, Visitor<V> visitor ) {
  if ( node != null ) {
    depthFirst(node.getLeft());
    depthFirst(node.getRight());
    visitor.visit(node);
  }
}

If the tree is large and unbalanced then either an explicit stack or mutation of the tree is required to store the return path. The mutation approach requires no extra storage but is not thread safe, and requires its fixing up to be put in a finally clause in case the visitor throws. Double dispatch may be combined with the visitor pattern to handle heterogeneous trees.

Anonymous said...

Thanks, Pete. Spot on, as always. "Within the bounds of the stack" is an interesting comment. If tree is very large, I suppose the clean recursive calls would have to be unrolled. Is this a case where the lack of tail recursion in Java would be most keenly missed?

Pete Kirkham said...

Not really - the tree can recurse to either branch, so you still consume stack even if one of the calls is in a tail position and eliminated.