Binary Search Trees
 Let's look at trees that are (1) binary and (2) ordered.
 We will use these trees to store some values (in a
computer's memory, I assume).
 Each vertex will contain one of whatever data we're
storing.
 I'm assuming we're organizing our data by one value:
its key.
 The keys must be totally ordered.
 Let's assume all the keys are distinct for simplicity.
 The key might be something like a student number or
name: the thing we want to search by.
 That is, it will look like this:
typedef struct bst_node {
    struct bst_node *leftchild;
    struct bst_node *rightchild;
    char *key;
    char *otherdata;
} bst_node;

 We'll make a few rules about how we arrange the values
in the tree:
 The keys in the left subtree must be less than the key
of the root.
 The keys in the right subtree must be greater than the
key of the root.
 Those must also be true for each subtree.
 A tree like this is called a binary search tree
or BST.
 For example, this is a binary search tree containing
some words (ordered by dictionary ordering):
 This is another BST containing the same values:
 We haven't made any assertions about the BST being
balanced or full.
 It is fairly straightforward to insert a new value into
a BST and maintain the rules of the tree.
 The idea: look at the keys to decide if we go to the
left or right.
 If the child spot is free, put it there.
 If not, recurse down the tree.
procedure bst_insert(bst_root, newnode)
    if newnode.key < bst_root.key  // insert on the left
        if bst_root.leftchild = null
            bst_root.leftchild = newnode
        else
            bst_insert(bst_root.leftchild, newnode)
    else  // insert on the right
        if bst_root.rightchild = null
            bst_root.rightchild = newnode
        else
            bst_insert(bst_root.rightchild, newnode)
 For example, if we start with the first BST above:
 … and insert “eggplant”, then we call the function
with the root of the tree and…
 “eggplant” > “date”, and the right child is non-null. Recurse on the right child, “grape”.
 “eggplant” < “grape”, and the left child is non-null. Recurse on the left child, “fig”.
 “eggplant” < “fig”, and the left child is null. Insert the new node as the left child.
 After that, we get this:
 Still a BST.
 Still balanced, but that's not always going to be
true.
 If we now insert “elderberry”, the tree will be unbalanced.
 The running time of the insertion?
 We do constant work and recurse (at most) once per
level of the tree.
 Running time is \(O(h)\).
 So, it would be nice to keep \(h\) under control.
 We know that if we have a binary tree with \(n\)
vertices that is full and balanced, it has height of \(\Theta(\log_2 n)\).
 But that insertion algorithm doesn't get us balanced
trees.
 For example, if we start with an empty tree and insert
the values in sorted order, we build the trees like this:
 Those are as unbalanced as they can be: \(h=n\).
 So, further insertions take a long time.
 Any other algorithms that traverse the height of the
tree will be slow too.
 One solution that will probably work: insert the values
in a random order.
 For example, I randomly permuted the words and got:
date, fig, apple, grape, banana, eggplant, cherry, honeydew.
 Inserting in that order gives us this tree:
 \(h=3\) is pretty good.
 Of course, we could have been unlucky and got a large
\(h\).
 But for large \(n\), the probability of getting a
really unbalanced tree is quite small.
 Once we have a BST, we can search for values in the
collection easily:
procedure bst_search(bst_root, key)
    if bst_root = null  // it's not there
        return null
    else if key = bst_root.key  // found it
        return bst_root
    else if key < bst_root.key  // look on the left
        return bst_search(bst_root.leftchild, key)
    else  // look on the right
        return bst_search(bst_root.rightchild, key)
 Like insertion, running time is \(O(h)\) for the
search.
 It's looking even more important that we keep \(h\) as
close to \(\log_2 n\) as we can.
 Another solution to imbalanced trees: be more clever in the insertion and rebalance the tree if it gets too out of balance.
 But that's tricky.
 If we have a badly-balanced BST, we can do something to rebalance.
 We can “pivot” around an edge to make one side
higher/shorter than it was:
 There, \(A,B,C\) represent subtrees.
 Both before and after, the tree represents values with
\(A<1<B<2<C\).
 So, if it was a BST before, it will be after.
 If \(A\) was too short or \(C\) too tall on the left,
they will be better on the right.
 The problem is that we might have to do a lot of these to get the BST back in balance.
 … making insertion slow because of the maintenance
work.
 How to overcome that is a problem for another course.
 But if we do keep our tree mostly balanced (\(h = O(\log n)\)), then we can search in \(O(\log n)\) time.
 As long as we don't take too long to insert, it starts
to sound better than a sorted list.
 In a sorted list, we can search in \(O(\log n)\)
(binary search) but insertion takes \(O(n)\) since we have to make space
in the array.
 Remember the problem you're trying to solve with data
structures like this: we want to store some values so that we can quickly…
1. insert a new value into the collection;
2. look up a value by its key (and get the other data in the structure);
3. delete values from the collection.
A sorted list is fast at 2, but slow at 1 and 3.
The BST is usually fast at all
three, but slow at all three if the tree is unbalanced.

Article By
IT Department
LAQSHYA Institute of Technology & Sciences