Database SystemsTopic 18 of 22

B-Trees

8 minread

1,305words

Intermediatelevel

B-Trees: Overview, Structure, and Operations

A B-tree (short for Balanced Tree) is a self-balancing search tree that maintains sorted data and allows for efficient insertion, deletion, and search operations. B-trees are commonly used in databases and file systems because of their ability to keep data sorted while minimizing disk I/O operations.

B-trees are widely used for indexing because they provide logarithmic time complexity for search, insertion, and deletion, which is essential for handling large datasets efficiently.

1. B-Tree Structure

A B-tree is a type of multi-way tree where each node can have more than two children. It is balanced, meaning all leaf nodes are at the same level, ensuring that the path from the root to any leaf node has the same length.

Key properties of a B-tree:

Nodes: Each node in a B-tree contains keys (data values) and pointers (to child nodes or actual data records). The keys are arranged in sorted order within each node.
Order of the Tree (m): The order of the B-tree determines the maximum number of children a node can have. A B-tree of order m has the following properties:
- Each internal node can have up to m children.
- Each internal node must have at least ⌈m/2⌉ children (except for the root node).
- Each node (except the root) must have at least ⌈m/2⌉ - 1 keys.
- The root node must have at least 1 key if it is not a leaf node.
Leaf Nodes: Leaf nodes are the nodes that do not have any children, and they contain actual data or pointers to the data.

2. B-Tree Insertion

Insertion in a B-tree is designed to maintain the properties of the tree while ensuring that the data is sorted and balanced. The general steps for insertion are:

Start at the root node: Compare the key to be inserted with the keys in the current node. Depending on the comparison, move to the appropriate child node.
Traverse down the tree: Continue comparing and moving down the tree until you reach a leaf node.
Insert the key: Insert the key into the leaf node in sorted order.
- If the leaf node has enough space, simply add the key.
- If the leaf node is full (i.e., it has m-1 keys), it must be split.
Node Split:
- If the node is full, split it into two nodes, with the median key being promoted to the parent node.
- This split might propagate up the tree if the parent node also becomes full.
Repeat the process: If the split occurs at the root node, a new root node is created, which increases the height of the tree.

3. B-Tree Deletion

Deletion from a B-tree ensures that the tree remains balanced after the key is removed. The general steps for deletion are:

Find the key: Start at the root and search for the key to be deleted, just as you would for an insertion.
Leaf node deletion:
- If the key is in a leaf node, simply remove the key.
- If the key is in an internal node, find its predecessor (the largest key in the left subtree) or successor (the smallest key in the right subtree) and replace it with the key to be deleted. Then, delete the predecessor or successor key from the leaf node.
Rebalance the tree:
- If a node (either internal or leaf) becomes underfull (i.e., it has fewer than ⌈m/2⌉ keys), rebalancing must occur. This can be done in one of three ways:
  - Borrowing: If a sibling node has more than the minimum number of keys, a key can be borrowed from the sibling.
  - Merging: If neither sibling has extra keys to borrow, merge the node with a sibling and push a key down from the parent to the merged node.
- If the root node becomes underfull and has only one child, the root is removed, and its child becomes the new root, decreasing the height of the tree.

4. B-Tree Search

Searching for a key in a B-tree is very efficient because the tree is balanced and the keys within each node are sorted. The search process works as follows:

Start at the root: Begin at the root node and compare the search key to the keys in the node.
Traverse to the correct child: Depending on the comparison result:
- If the search key is less than the first key in the node, follow the first child pointer.
- If the search key is between two keys, follow the child pointer corresponding to that range of keys.
- Continue this process recursively down the tree, moving through child nodes until the search reaches a leaf node.
Key found or not: If the key is found in a node, return the associated data or pointer. If the key is not found by the time the search reaches a leaf node, the key is not present in the tree.

5. B-Tree Example

Let’s take an example of a B-tree of order 3 (m = 3), where each node can have a maximum of 2 keys and 3 children.

Initial Tree (Empty tree):
```
[ ]
```
Insert 10:
```
[10]
```
Insert 20:
```
[10, 20]
```
Insert 5 (no split needed):
```
[5, 10, 20]
```

Insert 30 (causes split):

      [10]
     /    \
 [5]       [20, 30]

Insert 15 (insert into right child):

      [10]
     /    \
 [5]     [15, 20, 30]

After splitting the right child:

      [10, 20]
     /   |   \
 [5]   [15]   [30]

6. Advantages of B-Trees

Efficient Searching: B-trees allow searching for a key in O(log n) time, which is significantly faster than a linear search on unsorted data.
Balanced: B-trees automatically maintain balance during insertions and deletions, ensuring that the height of the tree remains low, even with large amounts of data.
Optimal for Disk Storage: B-trees are designed to minimize disk I/O operations by ensuring that nodes are large enough to fill entire disk blocks. This reduces the number of disk accesses required for operations.
Support for Range Queries: Since B-trees store keys in sorted order, they can efficiently support range queries (e.g., finding all records between two key values).

7. Disadvantages of B-Trees

Complexity: The algorithms for insertion, deletion, and rebalancing are more complex than simpler data structures like binary search trees (BSTs).
Storage Overhead: B-trees require more memory for storing pointers and maintaining balanced nodes, which could lead to increased storage requirements, especially for large datasets.

8. B-Tree Variants

B+ Tree: In a B+ tree, all data records are stored at the leaf level, and internal nodes only store keys for navigation. This allows for more efficient range queries and better use of caching. Most modern database systems use B+ trees for indexing.
B Tree*: A B* tree is a variant of the B-tree where nodes are more densely packed by redistributing keys between neighboring nodes. It reduces the number of splits during insertions and increases the utilization of nodes.

9. Use Cases of B-Trees

Database Indexing: B-trees are commonly used for indexing in relational databases. They support fast search, insert, delete, and range queries, making them ideal for database management systems.
File Systems: Many file systems use B-trees to index file names and directories for quick access to files.
Data Warehousing: B-trees are used in OLAP (Online Analytical Processing) systems where fast range queries and efficient indexing are crucial.

Conclusion

B-trees are powerful and efficient data structures used in database systems for indexing purposes. Their ability to maintain balanced and sorted data while supporting fast search, insert, and delete operations makes them ideal for applications that require large-scale data management. By using B-trees (or their variants, like B+ trees), database systems can efficiently store and retrieve large datasets, ensuring high performance in search and range query operations.

Previous topic 17

Indexed Files

Next topic 19

Files with Dense Index

Past Papers

Open this section to load past papers

Click on Show Past Papers to see past papers.

CSI-308›B-Trees

Database SystemsTopic 18 of 22

B-Trees

8 minread

1,305words

Intermediatelevel

B-Trees: Overview, Structure, and Operations

B-trees are widely used for indexing because they provide logarithmic time complexity for search, insertion, and deletion, which is essential for handling large datasets efficiently.

1. B-Tree Structure

Key properties of a B-tree:

Nodes: Each node in a B-tree contains keys (data values) and pointers (to child nodes or actual data records). The keys are arranged in sorted order within each node.
Order of the Tree (m): The order of the B-tree determines the maximum number of children a node can have. A B-tree of order m has the following properties:
- Each internal node can have up to m children.
- Each internal node must have at least ⌈m/2⌉ children (except for the root node).
- Each node (except the root) must have at least ⌈m/2⌉ - 1 keys.
- The root node must have at least 1 key if it is not a leaf node.
Leaf Nodes: Leaf nodes are the nodes that do not have any children, and they contain actual data or pointers to the data.

2. B-Tree Insertion

Insertion in a B-tree is designed to maintain the properties of the tree while ensuring that the data is sorted and balanced. The general steps for insertion are:

Start at the root node: Compare the key to be inserted with the keys in the current node. Depending on the comparison, move to the appropriate child node.
Traverse down the tree: Continue comparing and moving down the tree until you reach a leaf node.
Insert the key: Insert the key into the leaf node in sorted order.
- If the leaf node has enough space, simply add the key.
- If the leaf node is full (i.e., it has m-1 keys), it must be split.
Node Split:
- If the node is full, split it into two nodes, with the median key being promoted to the parent node.
- This split might propagate up the tree if the parent node also becomes full.
Repeat the process: If the split occurs at the root node, a new root node is created, which increases the height of the tree.

3. B-Tree Deletion

Deletion from a B-tree ensures that the tree remains balanced after the key is removed. The general steps for deletion are:

Find the key: Start at the root and search for the key to be deleted, just as you would for an insertion.
Leaf node deletion:
- If the key is in a leaf node, simply remove the key.
- If the key is in an internal node, find its predecessor (the largest key in the left subtree) or successor (the smallest key in the right subtree) and replace it with the key to be deleted. Then, delete the predecessor or successor key from the leaf node.
Rebalance the tree:
- If a node (either internal or leaf) becomes underfull (i.e., it has fewer than ⌈m/2⌉ keys), rebalancing must occur. This can be done in one of three ways:
  - Borrowing: If a sibling node has more than the minimum number of keys, a key can be borrowed from the sibling.
  - Merging: If neither sibling has extra keys to borrow, merge the node with a sibling and push a key down from the parent to the merged node.
- If the root node becomes underfull and has only one child, the root is removed, and its child becomes the new root, decreasing the height of the tree.

4. B-Tree Search

Searching for a key in a B-tree is very efficient because the tree is balanced and the keys within each node are sorted. The search process works as follows:

Start at the root: Begin at the root node and compare the search key to the keys in the node.
Traverse to the correct child: Depending on the comparison result:
- If the search key is less than the first key in the node, follow the first child pointer.
- If the search key is between two keys, follow the child pointer corresponding to that range of keys.
- Continue this process recursively down the tree, moving through child nodes until the search reaches a leaf node.
Key found or not: If the key is found in a node, return the associated data or pointer. If the key is not found by the time the search reaches a leaf node, the key is not present in the tree.

5. B-Tree Example

Let’s take an example of a B-tree of order 3 (m = 3), where each node can have a maximum of 2 keys and 3 children.

Initial Tree (Empty tree):
```
[ ]
```
Insert 10:
```
[10]
```
Insert 20:
```
[10, 20]
```
Insert 5 (no split needed):
```
[5, 10, 20]
```

Insert 30 (causes split):

      [10]
     /    \
 [5]       [20, 30]

Insert 15 (insert into right child):

      [10]
     /    \
 [5]     [15, 20, 30]

After splitting the right child:

      [10, 20]
     /   |   \
 [5]   [15]   [30]

6. Advantages of B-Trees

Efficient Searching: B-trees allow searching for a key in O(log n) time, which is significantly faster than a linear search on unsorted data.
Balanced: B-trees automatically maintain balance during insertions and deletions, ensuring that the height of the tree remains low, even with large amounts of data.
Optimal for Disk Storage: B-trees are designed to minimize disk I/O operations by ensuring that nodes are large enough to fill entire disk blocks. This reduces the number of disk accesses required for operations.
Support for Range Queries: Since B-trees store keys in sorted order, they can efficiently support range queries (e.g., finding all records between two key values).

7. Disadvantages of B-Trees

Complexity: The algorithms for insertion, deletion, and rebalancing are more complex than simpler data structures like binary search trees (BSTs).
Storage Overhead: B-trees require more memory for storing pointers and maintaining balanced nodes, which could lead to increased storage requirements, especially for large datasets.

8. B-Tree Variants

B+ Tree: In a B+ tree, all data records are stored at the leaf level, and internal nodes only store keys for navigation. This allows for more efficient range queries and better use of caching. Most modern database systems use B+ trees for indexing.
B Tree*: A B* tree is a variant of the B-tree where nodes are more densely packed by redistributing keys between neighboring nodes. It reduces the number of splits during insertions and increases the utilization of nodes.

9. Use Cases of B-Trees

Database Indexing: B-trees are commonly used for indexing in relational databases. They support fast search, insert, delete, and range queries, making them ideal for database management systems.
File Systems: Many file systems use B-trees to index file names and directories for quick access to files.
Data Warehousing: B-trees are used in OLAP (Online Analytical Processing) systems where fast range queries and efficient indexing are crucial.

Conclusion

Previous topic 17

Indexed Files

Next topic 19

Files with Dense Index

Past Papers

Open this section to load past papers

Click on Show Past Papers to see past papers.