Design and Analysis of Algorithms, Topic 26 of 53

Tree-Based Algorithms and Hashing

Both tree-based algorithms and hashing are fundamental techniques in algorithm design and data structures. They are used in various applications, ranging from sorting and searching to storing and retrieving data efficiently. Let's explore these concepts in detail.

Tree-Based Algorithms

A tree is a hierarchical data structure composed of nodes connected by parent-child links: every node except the root has exactly one parent, and a node may have zero or more children. Trees are widely used in computer science to represent hierarchical relationships.

There are various types of trees (e.g., binary trees, binary search trees, AVL trees, B-trees, heaps), and different algorithms exist for efficiently manipulating these structures. Let's discuss some key tree-based algorithms:

1. Binary Search Tree (BST) Algorithms

A Binary Search Tree (BST) is a binary tree in which every node satisfies the following property:

All values in the node's left subtree are smaller than the node's value.
All values in the node's right subtree are greater than the node's value.

Common Operations on a BST:

Search Operation:
- The search operation in a BST starts at the root and proceeds down the tree.
- At each node, we compare the target value with the node's value. If the target is smaller, we move to the left child; if it's larger, we move to the right child.
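- This process continues recursively or iteratively until the value is found or we reach an empty (null) child position, which means the value is not in the tree.
- Time Complexity: O(h), where h is the height of the tree. This is O(log n) for a balanced tree and O(n) in the worst case (a degenerate tree that resembles a linked list).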
Insertion Operation:
- To insert a new node, we traverse the tree from the root, comparing the value to be inserted with the current node's value.
- If the value is smaller, we go to the left child; if it's larger, we go to the right child. When we reach an empty (null) child position, we attach the new node there, so it becomes a leaf. Duplicate values need a fixed policy, such as ignoring them.
- Time Complexity: O(log n) on average, O(n) in the worst case (unbalanced tree).
Deletion Operation:
- To delete a node, three cases arise:
  1. Node has no children (a leaf node): Simply remove it.
  2. Node has one child: Remove the node and link its parent directly to the child.
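  3. Node has two children: Find the in-order successor (the smallest node in the right subtree) or the in-order predecessor (the largest node in the left subtree), copy its value into the node, and then delete the successor (or predecessor) from its subtree. That node has at most one child, so its removal falls under case 1 or 2.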
- Time Complexity: O(log n) for balanced trees, O(n) for unbalanced trees.
Traversal Algorithms:
- In-order Traversal: Traverse the left subtree, visit the node, and traverse the right subtree. This gives nodes in ascending order.
- Pre-order Traversal: Visit the node first, then traverse the left and right subtrees.
- Post-order Traversal: Traverse the left and right subtrees first, then visit the node.
- Time Complexity for all traversals: O(n).
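
The operations above can be sketched in C as follows. This is a minimal illustration, not a definitive implementation: the `Node` type and the `bst_*` function names are assumptions made for this example, duplicate keys are ignored, and the two-children case of deletion uses the in-order successor.

```c
#include <stdlib.h>

/* Hypothetical node type for this sketch; duplicate keys are ignored. */
typedef struct Node {
    int key;
    struct Node *left, *right;
} Node;

/* Insert key and return the (possibly new) root of the subtree. */
Node *bst_insert(Node *root, int key) {
    if (root == NULL) {                      /* empty position: new leaf */
        Node *n = malloc(sizeof *n);
        if (n == NULL) exit(EXIT_FAILURE);   /* out of memory */
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = bst_insert(root->left, key);
    else if (key > root->key)
        root->right = bst_insert(root->right, key);
    return root;                             /* equal key: tree unchanged */
}

/* Iterative search: returns the node holding key, or NULL if absent. */
Node *bst_search(Node *root, int key) {
    while (root != NULL && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root;
}

/* Delete key (if present) and return the new root of the subtree. */
Node *bst_delete(Node *root, int key) {
    if (root == NULL) return NULL;
    if (key < root->key) {
        root->left = bst_delete(root->left, key);
    } else if (key > root->key) {
        root->right = bst_delete(root->right, key);
    } else if (root->left == NULL || root->right == NULL) {
        /* Cases 1 and 2: zero or one child, so splice the node out. */
        Node *child = root->left ? root->left : root->right;
        free(root);
        return child;
    } else {
        /* Case 3: copy the in-order successor's key, then delete it. */
        Node *succ = root->right;
        while (succ->left != NULL) succ = succ->left;
        root->key = succ->key;
        root->right = bst_delete(root->right, succ->key);
    }
    return root;
}

/* In-order traversal: stores keys in ascending order in out[] starting
   at index count, and returns the updated count. */
int bst_inorder(const Node *root, int *out, int count) {
    if (root == NULL) return count;
    count = bst_inorder(root->left, out, count);
    out[count++] = root->key;
    return bst_inorder(root->right, out, count);
}
```

Each function returns the subtree root so that the caller can write `root = bst_insert(root, key)` without special-casing an empty tree. The caller is assumed to supply an `out` array large enough for all keys.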

Example of Binary Search Tree Insertion:

Consider inserting elements into an initially empty BST:

Insert 10:
   10

Insert 5:
   10
  /
 5

Insert 15:
   10
  /  \
 5   15

Insert 12:
   10
  /  \
 5   15
    /
   12

2. Balanced Trees: AVL Trees and Red-Black Trees

An AVL Tree is a self-balancing binary search tree where the difference between the heights of the left and right subtrees of any node is at most 1. Operations like insertions and deletions maintain this balance by performing rotations.

Rotations: A rotation is a tree transformation used to restore balance in an AVL tree after an insertion or deletion.
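- Left Rotation: If a subtree is right-heavy, a left rotation makes the right child the new root of that subtree.
- Right Rotation: If a subtree is left-heavy, a right rotation makes the left child the new root of that subtree.
- Double Rotation: When the imbalance forms a zig-zag (left-right or right-left), a single rotation is not enough; the child is rotated first, then the unbalanced node is rotated in the opposite direction.
Time Complexity: O(log n) for search, insertion, and deletion, since rotations keep the height of the tree at O(log n).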
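
The two single rotations can be sketched in C as below. This is an illustrative sketch only: the `AvlNode` type, the cached `height` field (with a leaf counted as height 1), and the function names are assumptions for this example, and the full AVL insertion and deletion logic that decides when to rotate is omitted.

```c
#include <stddef.h>

/* Hypothetical AVL node for this sketch: each node caches its height. */
typedef struct AvlNode {
    int key;
    int height;                        /* a leaf has height 1 */
    struct AvlNode *left, *right;
} AvlNode;

static int height(const AvlNode *n) { return n ? n->height : 0; }

static void update_height(AvlNode *n) {
    int hl = height(n->left), hr = height(n->right);
    n->height = 1 + (hl > hr ? hl : hr);
}

/* Balance factor: positive means left-heavy, negative means right-heavy. */
int balance_factor(const AvlNode *n) {
    return n ? height(n->left) - height(n->right) : 0;
}

/* Left rotation: the right child y becomes the new subtree root. */
AvlNode *rotate_left(AvlNode *x) {
    AvlNode *y = x->right;
    x->right = y->left;    /* y's left subtree moves under x */
    y->left = x;
    update_height(x);      /* x is now below y, so update it first */
    update_height(y);
    return y;
}

/* Right rotation: the left child x becomes the new subtree root. */
AvlNode *rotate_right(AvlNode *y) {
    AvlNode *x = y->left;
    y->left = x->right;    /* x's right subtree moves under y */
    x->right = y;
    update_height(y);
    update_height(x);
    return x;
}
```

For example, the right-leaning chain 1 -> 2 -> 3 has a balance factor of -2 at its root; `rotate_left` on that root makes 2 the new root with children 1 and 3.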

A Red-Black Tree is another type of balanced binary search tree where nodes are colored either red or black. The tree maintains a set of coloring properties to ensure that it remains approximately balanced. It allows for efficient insertion and deletion, guaranteeing O(log n) time for search, insertion, and deletion.

3. B-Trees and B+ Trees

B-Trees are used in databases and file systems where large amounts of data must be stored efficiently and accessed quickly. A B-tree is a balanced multiway search tree: each node holds several sorted keys and has several children, which keeps the tree shallow and reduces the number of disk reads. Searches, insertions, and deletions take O(log n) time.

A B+ tree is a variant of the B-tree where all values are stored in the leaf nodes, and internal nodes store only keys to guide the search process.
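
Searching a B-tree can be sketched in C as follows. This is a simplified illustration of a plain B-tree (not a B+ tree): the `BTreeNode` layout, the small `MAX_KEYS` value, and the function name are assumptions for this example, and insertion with node splitting is left out.

```c
#include <stddef.h>
#include <stdbool.h>

#define MAX_KEYS 3   /* assumed small node capacity, for illustration only */

/* Hypothetical B-tree node: sorted keys with one more child than keys. */
typedef struct BTreeNode {
    int num_keys;
    int keys[MAX_KEYS];                        /* sorted ascending */
    struct BTreeNode *children[MAX_KEYS + 1];  /* unused in leaves */
    bool is_leaf;
} BTreeNode;

/* Returns true if key is stored somewhere in the tree rooted at node. */
bool btree_search(const BTreeNode *node, int key) {
    while (node != NULL) {
        int i = 0;
        /* Find the first key in this node that is >= the target. */
        while (i < node->num_keys && key > node->keys[i]) i++;
        if (i < node->num_keys && node->keys[i] == key) return true;
        if (node->is_leaf) return false;       /* nowhere left to descend */
        node = node->children[i];              /* child between keys i-1 and i */
    }
    return false;
}
```

Each step examines one node and then descends one level, so the number of nodes visited is bounded by the height of the tree.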

Hashing

Hashing is a technique used to map data of arbitrary size (such as strings or integers) to fixed-size values (called hash codes). Hashing is commonly used in data structures like hash tables for efficient data storage and retrieval.

1. Hash Functions

A hash function takes an input (or key) and produces a hash value (or index) used to store the corresponding data in a hash table.

Good Hash Function: A good hash function should have the following properties:
1. Deterministic: The same input should always produce the same output.
2. Uniform Distribution: Hash values should be evenly distributed to avoid clustering.
3. Efficient: The hash function should be fast to compute.

Example of a simple hash function:

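int table_size = 11;  /* number of slots; 11 is an assumed example value */

int hash(int key) {
    int r = key % table_size;           /* may be negative when key < 0 */
    return r < 0 ? r + table_size : r;  /* keep the index in 0..table_size-1 */
}

This function returns the remainder when the key is divided by the table size, adjusted so that negative keys still map to a valid index in the range 0 to table_size - 1.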
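
The same idea extends to string keys. The sketch below uses the well-known djb2 scheme (attributed to Daniel J. Bernstein) as one possible choice; the function name `hash_string` and its `table_size` parameter are assumptions for this example, and other string hash functions would work equally well.

```c
#include <stddef.h>

/* djb2-style string hash, reduced modulo the table size.
   table_size is assumed to be greater than 0. */
size_t hash_string(const char *s, size_t table_size) {
    unsigned long h = 5381;
    for (; *s != '\0'; s++)
        h = h * 33 + (unsigned char)*s;   /* unsigned arithmetic wraps safely */
    return (size_t)(h % table_size);
}
```

Because every character feeds into the running value, strings that differ in a single character usually land in different slots, which helps with the uniform distribution property listed above.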

2. Collisions in Hashing

A collision occurs when two different keys produce the same hash value. There are two common methods for handling collisions:

Chaining: Each table entry points to a linked list of keys that hash to the same index. When a collision occurs, the key is added to the linked list at that index.
- Time Complexity for insertion and search: O(1) on average if the hash function spreads keys evenly and the load factor stays bounded, but it can degrade to O(n) if many keys collide in the same chain.
Open Addressing: Instead of using linked lists, open addressing searches for the next available slot in the hash table when a collision occurs.
- Linear Probing: If a collision occurs at index i, check i+1, i+2, and so on (wrapping around modulo the table size) until an empty slot is found.
- Quadratic Probing: Similar to linear probing, but the offset grows quadratically: check i+1^2, i+2^2, i+3^2, and so on, again modulo the table size.
- Double Hashing: Uses a second hash function to calculate the next probe position.
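
The three probing strategies differ only in how the j-th probe position is computed. The C sketch below illustrates this, together with insertion and search using linear probing. The table size of 11, the `EMPTY` sentinel, the restriction to non-negative keys, and the particular second hash function are all assumptions for this example; deletion (which typically needs tombstone markers) is omitted.

```c
#include <stdbool.h>

#define TABLE_SIZE 11   /* assumed table size (a prime) */
#define EMPTY (-1)      /* assumed sentinel: keys must be non-negative */

/* Slot for attempt j (j = 0, 1, 2, ...) starting from home slot i. */
int linear_probe(int i, int j)    { return (i + j) % TABLE_SIZE; }
int quadratic_probe(int i, int j) { return (i + j * j) % TABLE_SIZE; }

/* Double hashing: the step size comes from a second hash and is never 0. */
int double_hash_probe(int key, int j) {
    int h1 = key % TABLE_SIZE;
    int h2 = 1 + key % (TABLE_SIZE - 1);
    return (h1 + j * h2) % TABLE_SIZE;
}

/* Insert with linear probing; returns false if the table is full. */
bool lp_insert(int table[TABLE_SIZE], int key) {
    int home = key % TABLE_SIZE;
    for (int j = 0; j < TABLE_SIZE; j++) {
        int slot = linear_probe(home, j);
        if (table[slot] == EMPTY || table[slot] == key) {
            table[slot] = key;
            return true;
        }
    }
    return false;
}

/* Search with linear probing; returns the slot index, or -1 if absent. */
int lp_search(const int table[TABLE_SIZE], int key) {
    int home = key % TABLE_SIZE;
    for (int j = 0; j < TABLE_SIZE; j++) {
        int slot = linear_probe(home, j);
        if (table[slot] == EMPTY) return -1;   /* reached a gap: key absent */
        if (table[slot] == key) return slot;
    }
    return -1;
}
```

With every slot initialized to `EMPTY`, inserting 3, 14, and 25 (which all hash to slot 3) places them in slots 3, 4, and 5 respectively.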

3. Hash Tables

A hash table is a data structure that uses a hash function to map each key to a slot where its value is stored. It offers O(1) average time complexity for insertion, deletion, and search operations.
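
A small hash table that resolves collisions by chaining might look like the C sketch below. The fixed bucket count, the integer keys and values, and the `ct_*` names are assumptions made for this example; a production table would also resize itself.

```c
#include <stdlib.h>
#include <stdbool.h>

#define NUM_BUCKETS 11   /* assumed fixed number of buckets */

/* One key-value pair in a bucket's linked list. */
typedef struct Entry {
    int key;
    int value;
    struct Entry *next;
} Entry;

typedef struct {
    Entry *buckets[NUM_BUCKETS];   /* must start out as all NULL */
} ChainedTable;

static unsigned bucket_of(int key) {
    return (unsigned)key % NUM_BUCKETS;   /* unsigned keeps the index valid */
}

/* Insert a new pair, or update the value if the key already exists. */
void ct_put(ChainedTable *t, int key, int value) {
    unsigned b = bucket_of(key);
    for (Entry *e = t->buckets[b]; e != NULL; e = e->next) {
        if (e->key == key) { e->value = value; return; }
    }
    Entry *e = malloc(sizeof *e);
    if (e == NULL) exit(EXIT_FAILURE);    /* out of memory */
    e->key = key;
    e->value = value;
    e->next = t->buckets[b];              /* push onto the front of the chain */
    t->buckets[b] = e;
}

/* Look up key; on success stores the value in *value_out. */
bool ct_get(const ChainedTable *t, int key, int *value_out) {
    for (const Entry *e = t->buckets[bucket_of(key)]; e != NULL; e = e->next) {
        if (e->key == key) { *value_out = e->value; return true; }
    }
    return false;
}

/* Remove key; returns true if it was present. */
bool ct_remove(ChainedTable *t, int key) {
    Entry **link = &t->buckets[bucket_of(key)];
    while (*link != NULL) {
        if ((*link)->key == key) {
            Entry *dead = *link;
            *link = dead->next;           /* unlink from the chain */
            free(dead);
            return true;
        }
        link = &(*link)->next;
    }
    return false;
}
```

Each operation hashes the key once and then walks a single chain, which is why the cost stays close to constant as long as chains remain short.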

Load Factor: The load factor of a hash table is the ratio of the number of stored elements to the number of slots in the table. As the load factor increases, collisions become more frequent and performance degrades, so the table eventually needs resizing.
Resizing: To maintain efficient operations, hash tables resize when the load factor exceeds a threshold. Typically, the table size is doubled and every existing key is rehashed into the new table, because slot indices depend on the table size.
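
Load-factor-driven resizing can be sketched in C as below, using a set of integers stored with linear probing. The 0.75 threshold, the `EMPTY_SLOT` sentinel, the restriction to non-negative keys, and the `IntSet` and `set_*` names are assumptions for this example rather than fixed conventions.

```c
#include <stdlib.h>

#define EMPTY_SLOT (-1)   /* assumed sentinel: keys must be non-negative */
#define MAX_LOAD   0.75   /* assumed load factor threshold */

typedef struct {
    int *slots;
    size_t capacity;      /* number of slots */
    size_t count;         /* number of stored keys */
} IntSet;

static int *alloc_slots(size_t capacity) {
    int *s = malloc(capacity * sizeof *s);
    if (s == NULL) exit(EXIT_FAILURE);    /* out of memory */
    for (size_t i = 0; i < capacity; i++) s[i] = EMPTY_SLOT;
    return s;
}

/* capacity is assumed to be greater than 0. */
void set_init(IntSet *s, size_t capacity) {
    s->slots = alloc_slots(capacity);
    s->capacity = capacity;
    s->count = 0;
}

double set_load_factor(const IntSet *s) {
    return (double)s->count / (double)s->capacity;
}

/* Place key using linear probing; assumes at least one empty slot.
   Returns 1 if the key was new, 0 if it was already present. */
static int place(int *slots, size_t capacity, int key) {
    size_t i = (size_t)key % capacity;
    while (slots[i] != EMPTY_SLOT) {
        if (slots[i] == key) return 0;
        i = (i + 1) % capacity;
    }
    slots[i] = key;
    return 1;
}

/* Double the capacity and rehash every key into the new array. */
static void set_grow(IntSet *s) {
    size_t new_cap = s->capacity * 2;
    int *new_slots = alloc_slots(new_cap);
    for (size_t i = 0; i < s->capacity; i++)
        if (s->slots[i] != EMPTY_SLOT)
            place(new_slots, new_cap, s->slots[i]);
    free(s->slots);
    s->slots = new_slots;
    s->capacity = new_cap;
}

/* Grow first if adding one more key would exceed the threshold. */
void set_insert(IntSet *s, int key) {
    if ((double)(s->count + 1) / (double)s->capacity > MAX_LOAD)
        set_grow(s);
    s->count += (size_t)place(s->slots, s->capacity, key);
}
```

Starting from 4 slots, the first three insertions fit within the threshold; the fourth would push the load factor to 1.0, so the table grows to 8 slots first and ends with a load factor of 0.5. Rehashing costs O(n), but because the capacity doubles each time, the cost averages out to O(1) per insertion.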

4. Applications of Hashing

Database Indexing: Hashing is used in indexing data in databases for quick access.
Caches: Hashing can be used to store frequently accessed data in caches.
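Cryptography: Cryptographic hash functions such as the SHA-2 family are used in digital signatures, message integrity, and data verification. Older functions such as MD5 and SHA-1 are no longer considered collision-resistant and should be avoided for security purposes.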

Summary of Tree-Based Algorithms and Hashing

Tree-Based Algorithms:
- Binary Search Trees (BST): Efficient for ordered data, with O(log n) time for search, insert, and delete operations when the tree is balanced, and O(n) in the worst case when it is not.
- AVL Trees and Red-Black Trees: Self-balancing trees ensuring O(log n) time complexity for operations.
- B-Trees and B+ Trees: Balanced multiway trees used in databases and file systems for efficient searching and for keeping data in sorted order.
Hashing:
- Hash Functions map keys to hash values for quick access.
- Collisions can be handled by chaining or open addressing.
- Hash Tables offer O(1) average time for insertion, search, and deletion, making them useful in applications like databases, caches, and indexing.

Both tree-based algorithms and hashing techniques are foundational for efficient searching, sorting, and storing data in computer science, and they are widely applied in databases, memory management, and various real-time systems.
