Sunday, May 25, 2014

Binary Search Trees (IT Department @ LAQSHYA)

Binary Search Trees
  • Let's look at trees that are (1) binary and (2) ordered.
  • We will use these trees to store some values (in a computer's memory, I assume).
    • Each vertex will contain one of whatever data we're storing.
    • I'm assuming we're organizing our data by one value: its key.
    • The keys must be totally-ordered.
    • Let's assume all the keys are distinct for simplicity.
    • The key might be something like a student number or name: the thing we want to search by.
  • That is, it will look like this:
typedef struct bst_node {
    struct bst_node *leftchild;   /* subtree of smaller keys, or NULL */
    struct bst_node *rightchild;  /* subtree of larger keys, or NULL */
    char *key;                    /* the value we search by */
    char *otherdata;              /* whatever else we store with it */
} bst_node;
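  • Let's also give ourselves a small helper to build nodes; a sketch, assuming POSIX strdup to copy the strings:
#include <stdlib.h>
#include <string.h>

/* Allocate a leaf node holding copies of the key and data strings.
 * Error handling (malloc/strdup failure) is omitted for brevity. */
bst_node *make_node(const char *key, const char *otherdata) {
    bst_node *node = malloc(sizeof(bst_node));
    node->leftchild = NULL;
    node->rightchild = NULL;
    node->key = strdup(key);
    node->otherdata = strdup(otherdata);
    return node;
}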
  • We'll make a few rules about how we arrange the values in the tree:
    • The keys in the left subtree must be less than the key of the root.
    • The keys in the right subtree must be greater than the key of the root.
    • Those must also be true for each subtree.
  • A tree like this is called a binary search tree or BST.
  • For example, this is a binary search tree containing some words (ordered by dictionary ordering):
  • This is another BST containing the same values:
    • We haven't made any assertions about the BST being balanced or full.
  • It is fairly straightforward to insert a new value into a BST and maintain the rules of the tree.
    • The idea: look at the keys to decide if we go to the left or right.
    • If the child spot is free, put it there.
    • If not, recurse down the tree.
procedure bst_insert(bst_root, newnode)
    if newnode.key < bst_root.key  // insert on the left
        if bst_root.leftchild = null
            bst_root.leftchild = newnode
        else
            bst_insert(bst_root.leftchild, newnode)
    else  // insert on the right
        if bst_root.rightchild = null
            bst_root.rightchild = newnode
        else
            bst_insert(bst_root.rightchild, newnode)
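  • The same thing in C, assuming the bst_node struct above and strcmp for the key comparison (an empty tree is handled by the caller making the new node the root):
#include <stddef.h>
#include <string.h>

/* Insert newnode below bst_root, keeping the BST ordering intact.
 * Assumes bst_root is not NULL. */
void bst_insert(bst_node *bst_root, bst_node *newnode) {
    if (strcmp(newnode->key, bst_root->key) < 0) {    /* insert on the left */
        if (bst_root->leftchild == NULL)
            bst_root->leftchild = newnode;
        else
            bst_insert(bst_root->leftchild, newnode);
    } else {                                          /* insert on the right */
        if (bst_root->rightchild == NULL)
            bst_root->rightchild = newnode;
        else
            bst_insert(bst_root->rightchild, newnode);
    }
}
  • Inserting “eggplant” would then be bst_insert(root, make_node("eggplant", "")), which the next example traces.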
  • For example, if we start with the first BST above:
    • … and insert “eggplant”, then we call the function with the root of the tree and…
    • “eggplant” > “date”, and the right child is non-null. Recurse on the right child, “grape”.
    • “eggplant” < “grape”, and the left child is non-null. Recurse on the left child, “fig”.
    • “eggplant” < “fig”, but the left child is null. Insert the new node as the left child.
  • After that, we get this:
    • Still a BST.
    • Still balanced, but that's not always going to be true.
    • If we now insert “elderberry”, it will be unbalanced.
  • The running time of the insertion?
    • We do constant work and recurse (at most) once per level of the tree.
    • Running time is \(O(h)\).
    • So, it would be nice to keep \(h\) under control.
  • We know that if we have a binary tree with \(n\) vertices that is full and balanced, it has height of \(\Theta(\log_2 n)\).
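    • A quick check of that: a perfect binary tree with \(h\) levels has \(2^h-1\) vertices, so \(n = 2^h-1\) gives \(h = \log_2(n+1) = \Theta(\log_2 n)\).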
    • But that insertion algorithm doesn't get us balanced trees.
  • For example, if we start with an empty tree and insert the values in sorted order, we build trees like this:
  • Those are as unbalanced as they can be: \(h=n\).
    • So, further insertions take a long time.
    • Any other algorithms that traverse the height of the tree will be slow too.
  • One solution that will probably work: insert the values in a random order (a sketch of the shuffle follows this list).
    • For example, I randomly permuted the words and got: date, fig, apple, grape, banana, eggplant, cherry, honeydew.
    • Inserting in that order gives us this tree:
    • \(h=3\) is pretty good.
    • Of course, we could have been unlucky and got a large \(h\).
    • But for large \(n\), the probability of getting a really-unbalanced tree is quite small.
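  • A sketch of that idea: Fisher–Yates shuffle the keys, then insert them one at a time (rand is only sketch-quality randomness; seed with srand first):
#include <stdlib.h>

/* Randomly permute the n keys in place (Fisher-Yates shuffle). */
void shuffle(char **keys, int n) {
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);  /* random index in [0, i] */
        char *tmp = keys[i];
        keys[i] = keys[j];
        keys[j] = tmp;
    }
}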
  • Once we have a BST, we can search for values in the collection easily (a C version follows the pseudocode):
procedure bst_search(bst_root, key)
    if bst_root = null  // it's not there
        return null
    else if key = bst_root.key  // found it
        return bst_root
    else if key < bst_root.key  // look on the left
        return bst_search(bst_root.leftchild, key)
    else  // look on the right
        return bst_search(bst_root.rightchild, key)
    • Like insertion, running time is \(O(h)\) for the search.
    • It's looking even more important that we keep \(h\) as close to \(\log_2 n\) as we can.
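  • In C, mirroring the insertion sketch (again assuming the struct above and strcmp for comparisons):
#include <stddef.h>
#include <string.h>

/* Return the node whose key matches, or NULL if the key isn't in the tree. */
bst_node *bst_search(bst_node *bst_root, const char *key) {
    if (bst_root == NULL)
        return NULL;                                   /* it's not there */
    int cmp = strcmp(key, bst_root->key);
    if (cmp == 0)
        return bst_root;                               /* found it */
    else if (cmp < 0)
        return bst_search(bst_root->leftchild, key);   /* look on the left */
    else
        return bst_search(bst_root->rightchild, key);  /* look on the right */
}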
  • Another solution to imbalanced trees: be more clever in the insertion and re-balance the tree if it gets too out-of-balance.
    • But that's tricky.
  • If we have a badly-balanced BST, we can do something to re-balance.
    • We can “pivot” around an edge to make one side taller or shorter than it was (a code sketch follows this list):
    • There, \(A,B,C\) represent subtrees.
    • Both before and after, the tree represents values with \(A<1<B<2<C\).
    • So, if it was a BST before, it will be after.
    • If \(A\) was too short or \(C\) too tall on the left, they will be better on the right.
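  • The pivot in code, using the numbering from the description (a right rotation around the edge between node 2 and its left child, node 1):
/* Pivot around the edge between root "2" and its left child "1":
 * subtree A rises one level, C sinks one level, and B moves across.
 * Returns the new root of this subtree; the caller must re-attach it
 * to the parent (or make it the tree's root). */
bst_node *rotate_right(bst_node *node2) {
    bst_node *node1 = node2->leftchild;
    node2->leftchild = node1->rightchild;  /* B becomes 2's left subtree */
    node1->rightchild = node2;             /* 2 becomes 1's right child */
    return node1;                          /* 1 is the new subtree root */
}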
  • The problem is that we might have to do a lot of these to get the BST back in-balance.
    • … making insertion slow because of the maintenance work.
    • How to overcome that is a problem for another course.
  • But if we do keep our tree mostly-balanced (\(h=O(\log n)\)) then we can search in \(O(\log n)\) time.
    • As long as we don't take too long to insert, it starts to sound better than a sorted list.
    • In a sorted list, we can search in \(O(\log n)\) (binary search) but insertion takes \(O(n)\) since we have to make space in the array.
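  • For contrast, the shifting that makes sorted-array insertion \(O(n)\) (a sketch, assuming the array has spare capacity):
/* Insert value into the sorted array a[0..*n-1], shifting larger
 * elements one slot right; this shifting is the O(n) cost. */
void sorted_insert(int a[], int *n, int value) {
    int i = *n;
    while (i > 0 && a[i - 1] > value) {
        a[i] = a[i - 1];  /* make room */
        i--;
    }
    a[i] = value;
    (*n)++;
}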
  • Remember the problem you're trying to solve with data structures like this: we want to store some values so that we can quickly…
    1. insert a new value into the collection;
    2. look up a value by its key (and get the other data in the structure);
    3. delete values from the collection.
  • A sorted list is fast at 2, but slow at 1 and 3.
  • The BST is usually fast at all three, but slow at all three if the tree is unbalanced.
----------------------------------------------------------------------------
Article By
-----------------
IT Department
LAQSHYA Institute of Technology & Sciences             

