Binary Search Trees
- Let's look at trees that are (1) binary and (2) ordered.
- We will use these trees to store some values (in a
computer's memory, I assume).
- Each vertex will contain one of whatever data we're
storing.
- I'm assuming we're organizing our data by one value:
its key.
- The keys must be totally-ordered.
- Let's assume all the keys are distinct for simplicity.
- The key might be something like a student number or
name: the thing we want to search by.
- That is, it will look like this:
typedef struct bst_node {
    struct bst_node *leftchild;
    struct bst_node *rightchild;
    char *key;
    char *otherdata;
} bst_node;
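- (Not from the original notes, but handy for the sketches below:) a
small helper to allocate a detached leaf node might look like this,
assuming the key and other data are non-null strings we copy with
strdup:
#include <stdlib.h>
#include <string.h>

/* Sketch of a node-allocation helper: builds a leaf node (no children)
   holding copies of the key and other-data strings. */
bst_node *bst_make_node(const char *key, const char *otherdata)
{
    bst_node *node = malloc(sizeof(bst_node));
    if (node == NULL)
        return NULL;
    node->leftchild = NULL;
    node->rightchild = NULL;
    node->key = strdup(key);            /* strdup assumed available (POSIX/C23) */
    node->otherdata = strdup(otherdata);
    return node;
}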
- We'll make a few rules about how we arrange the values
in the tree:
- The keys in the left subtree must be less than the key
of the root.
- The keys in the right subtree must be greater than the
key of the root.
- Those must also be true for each subtree.
- A tree like this is called a binary search tree
or BST.
- For example, this is a binary search tree containing
some words (ordered by dictionary ordering):
- This is another BST containing the same values:
- We haven't made any assertions about the BST being
balanced or full.
- It is fairly straightforward to insert a new value into
a BST and maintain the rules of the tree.
- The idea: look at the keys to decide if we go to the
left or right.
- If the child spot is free, put it there.
- If not, recurse down the tree.
procedure bst_insert(bst_root, newnode)
    if newnode.key < bst_root.key
        // insert on the left
        if bst_root.leftchild = null
            bst_root.leftchild = newnode
        else
            bst_insert(bst_root.leftchild, newnode)
    else
        // insert on the right
        if bst_root.rightchild = null
            bst_root.rightchild = newnode
        else
            bst_insert(bst_root.rightchild, newnode)
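- For concreteness, here is my C translation of that pseudocode (not
code from the notes), assuming the string keys are compared with
strcmp:
#include <string.h>

/* C sketch of bst_insert: walk down, comparing keys, until a free
   child spot is found. Assumes bst_root is non-NULL and newnode is a
   detached leaf (both children NULL). */
void bst_insert(bst_node *bst_root, bst_node *newnode)
{
    if (strcmp(newnode->key, bst_root->key) < 0) {
        /* insert on the left */
        if (bst_root->leftchild == NULL)
            bst_root->leftchild = newnode;
        else
            bst_insert(bst_root->leftchild, newnode);
    } else {
        /* insert on the right */
        if (bst_root->rightchild == NULL)
            bst_root->rightchild = newnode;
        else
            bst_insert(bst_root->rightchild, newnode);
    }
}
- A call would look like bst_insert(root, bst_make_node("eggplant",
"…")) once the tree has a root node.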
- For example, if we start with the first BST above:
- … and insert “eggplant”, then we call the function
with the root of the tree and…
- “eggplant” > “date”, and the right child is
non-null. Recurse on the right child, “grape”.
- “eggplant” < “grape”, and the left child is
non-null. Recurse on the left child, “fig”.
- “eggplant” < “fig”, but the left child is null.
Insert the new node as the left child.
- After that, we get this:
- Still a BST.
- Still balanced, but that's not always going to be
true.
- If we now insert “elderberry”, it will be unbalanced.
- The running time of the insertion?
- We do constant work and recurse (at most) once per
level of the tree.
- Running time is \(O(h)\).
- So, it would be nice to keep \(h\) under control.
- We know that if we have a binary tree with \(n\)
vertices that is full and balanced, it has height of \(\Theta(\log_2 n)\).
- But that insertion algorithm doesn't get us balanced
trees.
- For example, if we start with an empty tree and insert
the values in sorted order, we build the trees like this:
- Those are as unbalanced as they can be: \(h=n\).
- So, further insertions take a long time.
- Any other algorithms that traverse the height of the
tree will be slow too.
- One solution that will probably work: insert the values
in a random order.
- For example, I randomly permuted the words and got:
date, fig, apple, grape, banana, eggplant, cherry, honeydew.
- Inserting in that order gives us this tree:
- \(h=3\) is pretty good.
- Of course, we could have been unlucky and got a large
\(h\).
- But for large \(n\), the probability of getting a
really-unbalanced tree is quite small.
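- A minimal sketch of that idea in C (the helper names and the loop are
mine, not from the notes): shuffle the keys with Fisher–Yates, then
insert them one at a time with the bst_make_node and bst_insert
sketches above:
#include <stdlib.h>

/* Fisher-Yates shuffle of the key array, so the keys go into the BST
   in a random order. Seed with srand(...) before calling in real use. */
void shuffle_keys(char **keys, int n)
{
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);   /* random index in 0..i */
        char *tmp = keys[i];
        keys[i] = keys[j];
        keys[j] = tmp;
    }
}

/* Build a BST by inserting in the shuffled order (assumes n >= 1);
   the first key becomes the root. */
bst_node *build_random_bst(char **keys, int n)
{
    shuffle_keys(keys, n);
    bst_node *root = bst_make_node(keys[0], "");
    for (int i = 1; i < n; i++)
        bst_insert(root, bst_make_node(keys[i], ""));
    return root;
}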
- Once we have a BST, we can search for values in the
collection easily:
procedure bst_search(bst_root, key)
    if bst_root = null
        // it's not there
        return null
    else if key = bst_root.key
        // found it
        return bst_root
    else if key < bst_root.key
        // look on the left
        return bst_search(bst_root.leftchild, key)
    else
        // look on the right
        return bst_search(bst_root.rightchild, key)
- Like insertion, running time is \(O(h)\) for the
search.
- It's looking even more important that we keep \(h\) as
close to \(\log_2 n\) as we can.
- Another solution to imbalanced trees: be more clever in
the insertion and re-balance the tree if it gets too out-of-balance.
- But that's tricky.
- If we have a badly-balanced BST, we can do something to
re-balance.
- We can “pivot” around an edge to make one side
higher/shorter than it was:
- There, \(A,B,C\) represent subtrees.
- Both before and after, the tree represents values with
\(A<1<B<2<C\).
- So, if it was a BST before, it will be after.
- If \(A\) was too short or \(C\) too tall on the left,
they will be better on the right.
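- As a concrete sketch, here is one direction of that pivot in C (a
"rotation"); the mirror image goes the other way. The function name,
and the assumption that node 2 is currently the subtree root with node
1 as its left child, are mine, not from the notes:
/* Pivot around the edge between node 2 and its left child, node 1:
   node 1 is lifted to the root of this subtree, and subtree B (the
   values between 1 and 2) moves across to hang under node 2.
   Returns the new subtree root for the caller to re-attach. */
bst_node *pivot_up_left_child(bst_node *two)
{
    bst_node *one = two->leftchild;
    two->leftchild = one->rightchild;   /* B: keys between 1 and 2 */
    one->rightchild = two;
    return one;
}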
- The problem is that we might have to do a lot of these
to get the BST back in-balance.
- … making insertion slow because of the maintenance
work.
- How to overcome that is a problem for another course.
- But if we do keep our tree mostly-balanced (\(h=O(\log
n)\)) then we can search in \(\log n\) time.
- As long as we don't take too long to insert, it starts
to sound better than a sorted list.
- In a sorted list, we can search in \(O(\log n)\)
(binary search) but insertion takes \(O(n)\) since we have to make space
in the array.
- Remember the problem you're trying to solve with data
structures like this: we want to store some values so that we can quickly…
  1. insert a new value into the collection;
  2. look up a value by its key (and get the other data in the structure);
  3. delete values from the collection.
- A sorted list is fast at 2, but slow at 1 and 3.
- The BST is usually fast at all three, but slow at all three if the
tree is unbalanced.
----------------------------------------------------------------------------
Article By
-----------------
IT Department
LAQSHYA Institute of Technology & Sciences