CSI-308›Indexed Files

Database SystemsTopic 17 of 22

Indexed Files

8 minread

1,299words

Intermediatelevel

Indexed Files in Databases

Indexed files refer to the method of organizing files in a way that allows for faster retrieval of data. An index is a data structure that improves the speed of data access by creating a separate structure that maps key values to the locations of corresponding data in a file.

Indexed files are essential for improving query performance, especially when dealing with large databases where searching through unindexed data could be slow and inefficient. Indexed files allow the DBMS to locate records directly without scanning every entry in the database.

1. Basics of Indexed Files

An index works like a look-up table, where:

Keys (e.g., an attribute like EmployeeID or BookTitle) are mapped to locations where corresponding data is stored in the file.
The index allows for fast lookups, inserts, updates, and deletes by maintaining a separate, optimized structure for searching the file.

An index is typically stored in a separate file from the data file. When a query is executed, the DBMS first searches the index file, and then retrieves the actual data from the data file based on the index.

2. Structure of Indexed Files

Indexed files are generally structured in one of the following ways:

Primary Index: The primary index is based on the primary key of the file. It uniquely identifies each record in the database and organizes them in sorted order.
Secondary Index: A secondary index is created on a non-primary attribute (e.g., a column like LastName or Salary). This allows for efficient retrieval of data based on attributes other than the primary key.
Clustered Index: The records in the data file are stored physically in the order of the indexed attribute. A clustered index makes access to data more efficient by reducing the number of disk I/O operations required.
Non-clustered Index: The index stores pointers to the data location (disk block) but does not rearrange the physical order of the data. Non-clustered indexes are typically faster to create, but they might require more disk I/O compared to clustered indexes.

3. Types of Indexed Files

There are several types of indexing techniques, each with specific use cases for optimization. Some common types of indexed files include:

Single-Level Indexes:
- A single-level index contains a list of keys from the data file along with their corresponding pointers to the data locations.
- It is often implemented as a sorted list (like a B-tree or an array), where each index entry points to a corresponding record in the data file.
- Example: A student database with a primary index on StudentID. The index would have pairs like (StudentID, pointer to record).
Multi-Level Indexes:
- A multi-level index is an extension of the single-level index, where an index itself is indexed. This structure allows for a more compact representation of indexes and speeds up searches in large datasets.
- In this type, an additional level of indexing is added to avoid large, unmanageable single-level indexes.
- Multi-level indexes help when the size of a single-level index becomes too large to fit in memory, allowing for better scalability.
B-Tree Index:
- A B-tree (Balanced tree) index is one of the most widely used indexing structures. It is a self-balancing tree structure where each node contains multiple keys and pointers to other nodes, which makes searching, insertion, and deletion operations efficient (O(log n) time complexity).
- B+ Tree is an extension of the B-tree where all data (or record pointers) are stored in the leaf nodes, while the internal nodes only store keys for navigation.
- Usage: B-trees are ideal for situations where records are frequently added and deleted, and there is a need for efficient range queries (e.g., retrieving records between two key values).
Hash Index:
- A hash index uses a hash function to map a key to a location in the file. The hash function computes a hash value for a key, which determines the disk block where the corresponding record will be stored.
- Hashing allows for constant-time complexity (O(1)) for lookups but is typically not suitable for range queries, as the keys are not ordered.
- Usage: Best suited for cases where exact match lookups are required, but range-based searches are not needed.
Bitmap Index:
- A bitmap index represents the presence or absence of a value for a specific attribute with a bitmap (a series of 0s and 1s). Each bit in the bitmap corresponds to a record, and the value of the attribute determines whether the bit is set to 1 or 0.
- Bitmap indexes are particularly efficient for attributes with a low cardinality (few distinct values), such as Boolean or categorical data (e.g., gender, marital status).
- Usage: Ideal for columns with a small number of distinct values, such as gender (Male/Female) or status (Active/Inactive).

4. Working of Indexed Files

Index Creation: The DBMS builds an index over one or more attributes of the data. The index consists of key-value pairs where the key is the indexed attribute (e.g., EmployeeID), and the value is a pointer to the record in the data file.
Index Search: When a query is issued, the DBMS first searches the index to locate the pointers to the data records. After finding the relevant pointers, the DBMS retrieves the corresponding records from the data file.
Insertions and Deletions: When a record is inserted or deleted, the index must be updated. Insertion usually requires adding a new key and pointer, while deletion involves removing the key-pointer pair. Depending on the index type (e.g., B-tree or hash), these operations may involve rebalancing or rehashing.
Example of an Index Search: Suppose we have an index on the EmployeeID attribute in a table of employees:
- If a query requests the employee with EmployeeID = 1024, the DBMS will look up the EmployeeID in the index, find the corresponding pointer to the employee’s data record, and fetch the record.

5. Advantages of Indexed Files

Faster Query Performance: Indexes speed up retrieval operations, particularly for searches, range queries, and exact match lookups.
Efficient Sorting: In certain index types, such as B-trees, sorting operations can be executed efficiently because the records are stored in a sorted order.
Efficient Updates: For certain types of indexes (e.g., B-tree), updates can be done in logarithmic time, which is faster than scanning the entire table for each update.

6. Disadvantages of Indexed Files

Storage Overhead: Indexes consume additional disk space. Each index requires storing key-value pairs, which can lead to increased storage requirements, especially for large databases.
Maintenance Cost: When records are added, updated, or deleted, the indexes must be maintained, which incurs overhead. For example, B-trees need to be rebalanced during insertions and deletions.
Complexity: Managing multiple indexes (primary, secondary, composite) increases the complexity of the database system, especially when choosing the right indexing strategy for different types of queries.

7. Example of SQL Syntax for Creating Indexes

Create a Primary Index on a table's primary key:

CREATE INDEX idx_employee_id
ON Employees (EmployeeID);

Create a Secondary Index on a non-primary attribute:

CREATE INDEX idx_employee_name
ON Employees (LastName, FirstName);

Create a Bitmap Index:

CREATE BITMAP INDEX idx_gender
ON Employees (Gender);

Conclusion

Indexed files are a fundamental part of database systems that significantly improve the performance of queries by providing efficient data retrieval paths. They allow databases to handle large amounts of data and still perform complex operations quickly, such as searching, sorting, and range querying. Different types of indexing techniques (e.g., B-trees, hash indexing, bitmap indexes) are chosen based on the characteristics of the data and the types of queries being executed. While indexes provide significant performance benefits, they also come with storage and maintenance costs, so careful consideration is required in their implementation.

Previous topic 16

Storage and File Structure

Next topic 18

B-Trees

Past Papers

Open this section to load past papers

Click on Show Past Papers to see past papers.

CSI-308›Indexed Files

Database SystemsTopic 17 of 22

Indexed Files

8 minread

1,299words

Intermediatelevel

Indexed Files in Databases

1. Basics of Indexed Files

An index works like a look-up table, where:

Keys (e.g., an attribute like EmployeeID or BookTitle) are mapped to locations where corresponding data is stored in the file.
The index allows for fast lookups, inserts, updates, and deletes by maintaining a separate, optimized structure for searching the file.

2. Structure of Indexed Files

Indexed files are generally structured in one of the following ways:

Primary Index: The primary index is based on the primary key of the file. It uniquely identifies each record in the database and organizes them in sorted order.
Secondary Index: A secondary index is created on a non-primary attribute (e.g., a column like LastName or Salary). This allows for efficient retrieval of data based on attributes other than the primary key.
Clustered Index: The records in the data file are stored physically in the order of the indexed attribute. A clustered index makes access to data more efficient by reducing the number of disk I/O operations required.
Non-clustered Index: The index stores pointers to the data location (disk block) but does not rearrange the physical order of the data. Non-clustered indexes are typically faster to create, but they might require more disk I/O compared to clustered indexes.

3. Types of Indexed Files

There are several types of indexing techniques, each with specific use cases for optimization. Some common types of indexed files include:

Single-Level Indexes:
- A single-level index contains a list of keys from the data file along with their corresponding pointers to the data locations.
- It is often implemented as a sorted list (like a B-tree or an array), where each index entry points to a corresponding record in the data file.
- Example: A student database with a primary index on StudentID. The index would have pairs like (StudentID, pointer to record).
Multi-Level Indexes:
- A multi-level index is an extension of the single-level index, where an index itself is indexed. This structure allows for a more compact representation of indexes and speeds up searches in large datasets.
- In this type, an additional level of indexing is added to avoid large, unmanageable single-level indexes.
- Multi-level indexes help when the size of a single-level index becomes too large to fit in memory, allowing for better scalability.
B-Tree Index:
- A B-tree (Balanced tree) index is one of the most widely used indexing structures. It is a self-balancing tree structure where each node contains multiple keys and pointers to other nodes, which makes searching, insertion, and deletion operations efficient (O(log n) time complexity).
- B+ Tree is an extension of the B-tree where all data (or record pointers) are stored in the leaf nodes, while the internal nodes only store keys for navigation.
- Usage: B-trees are ideal for situations where records are frequently added and deleted, and there is a need for efficient range queries (e.g., retrieving records between two key values).
Hash Index:
- A hash index uses a hash function to map a key to a location in the file. The hash function computes a hash value for a key, which determines the disk block where the corresponding record will be stored.
- Hashing allows for constant-time complexity (O(1)) for lookups but is typically not suitable for range queries, as the keys are not ordered.
- Usage: Best suited for cases where exact match lookups are required, but range-based searches are not needed.
Bitmap Index:
- A bitmap index represents the presence or absence of a value for a specific attribute with a bitmap (a series of 0s and 1s). Each bit in the bitmap corresponds to a record, and the value of the attribute determines whether the bit is set to 1 or 0.
- Bitmap indexes are particularly efficient for attributes with a low cardinality (few distinct values), such as Boolean or categorical data (e.g., gender, marital status).
- Usage: Ideal for columns with a small number of distinct values, such as gender (Male/Female) or status (Active/Inactive).

4. Working of Indexed Files

Index Creation: The DBMS builds an index over one or more attributes of the data. The index consists of key-value pairs where the key is the indexed attribute (e.g., EmployeeID), and the value is a pointer to the record in the data file.
Index Search: When a query is issued, the DBMS first searches the index to locate the pointers to the data records. After finding the relevant pointers, the DBMS retrieves the corresponding records from the data file.
Insertions and Deletions: When a record is inserted or deleted, the index must be updated. Insertion usually requires adding a new key and pointer, while deletion involves removing the key-pointer pair. Depending on the index type (e.g., B-tree or hash), these operations may involve rebalancing or rehashing.
Example of an Index Search: Suppose we have an index on the EmployeeID attribute in a table of employees:
- If a query requests the employee with EmployeeID = 1024, the DBMS will look up the EmployeeID in the index, find the corresponding pointer to the employee’s data record, and fetch the record.

5. Advantages of Indexed Files

Faster Query Performance: Indexes speed up retrieval operations, particularly for searches, range queries, and exact match lookups.
Efficient Sorting: In certain index types, such as B-trees, sorting operations can be executed efficiently because the records are stored in a sorted order.
Efficient Updates: For certain types of indexes (e.g., B-tree), updates can be done in logarithmic time, which is faster than scanning the entire table for each update.

6. Disadvantages of Indexed Files

Storage Overhead: Indexes consume additional disk space. Each index requires storing key-value pairs, which can lead to increased storage requirements, especially for large databases.
Maintenance Cost: When records are added, updated, or deleted, the indexes must be maintained, which incurs overhead. For example, B-trees need to be rebalanced during insertions and deletions.
Complexity: Managing multiple indexes (primary, secondary, composite) increases the complexity of the database system, especially when choosing the right indexing strategy for different types of queries.

7. Example of SQL Syntax for Creating Indexes

Create a Primary Index on a table's primary key:

CREATE INDEX idx_employee_id
ON Employees (EmployeeID);

Create a Secondary Index on a non-primary attribute:

CREATE INDEX idx_employee_name
ON Employees (LastName, FirstName);

Create a Bitmap Index:

CREATE BITMAP INDEX idx_gender
ON Employees (Gender);

Conclusion

Previous topic 16

Storage and File Structure

Next topic 18

B-Trees

Past Papers

Open this section to load past papers

Click on Show Past Papers to see past papers.