Indexed files refer to the method of organizing files in a way that allows for faster retrieval of data. An index is a data structure that improves the speed of data access by creating a separate structure that maps key values to the locations of corresponding data in a file.
Indexed files are essential for improving query performance, especially when dealing with large databases where searching through unindexed data could be slow and inefficient. Indexed files allow the DBMS to locate records directly without scanning every entry in the database.
An index works like a look-up table, where:
EmployeeID or BookTitle) are mapped to locations where corresponding data is stored in the file.An index is typically stored in a separate file from the data file. When a query is executed, the DBMS first searches the index file, and then retrieves the actual data from the data file based on the index.
Indexed files are generally structured in one of the following ways:
Primary Index: The primary index is based on the primary key of the file. It uniquely identifies each record in the database and organizes them in sorted order.
Secondary Index: A secondary index is created on a non-primary attribute (e.g., a column like LastName or Salary). This allows for efficient retrieval of data based on attributes other than the primary key.
Clustered Index: The records in the data file are stored physically in the order of the indexed attribute. A clustered index makes access to data more efficient by reducing the number of disk I/O operations required.
Non-clustered Index: The index stores pointers to the data location (disk block) but does not rearrange the physical order of the data. Non-clustered indexes are typically faster to create, but they might require more disk I/O compared to clustered indexes.
There are several types of indexing techniques, each with specific use cases for optimization. Some common types of indexed files include:
Single-Level Indexes:
StudentID. The index would have pairs like (StudentID, pointer to record).Multi-Level Indexes:
B-Tree Index:
Hash Index:
Bitmap Index:
gender (Male/Female) or status (Active/Inactive).Index Creation: The DBMS builds an index over one or more attributes of the data. The index consists of key-value pairs where the key is the indexed attribute (e.g., EmployeeID), and the value is a pointer to the record in the data file.
Index Search: When a query is issued, the DBMS first searches the index to locate the pointers to the data records. After finding the relevant pointers, the DBMS retrieves the corresponding records from the data file.
Insertions and Deletions: When a record is inserted or deleted, the index must be updated. Insertion usually requires adding a new key and pointer, while deletion involves removing the key-pointer pair. Depending on the index type (e.g., B-tree or hash), these operations may involve rebalancing or rehashing.
Example of an Index Search:
Suppose we have an index on the EmployeeID attribute in a table of employees:
EmployeeID = 1024, the DBMS will look up the EmployeeID in the index, find the corresponding pointer to the employee’s data record, and fetch the record.Create a Primary Index on a table's primary key:
CREATE INDEX idx_employee_id
ON Employees (EmployeeID);
Create a Secondary Index on a non-primary attribute:
CREATE INDEX idx_employee_name
ON Employees (LastName, FirstName);
Create a Bitmap Index:
CREATE BITMAP INDEX idx_gender
ON Employees (Gender);
Indexed files are a fundamental part of database systems that significantly improve the performance of queries by providing efficient data retrieval paths. They allow databases to handle large amounts of data and still perform complex operations quickly, such as searching, sorting, and range querying. Different types of indexing techniques (e.g., B-trees, hash indexing, bitmap indexes) are chosen based on the characteristics of the data and the types of queries being executed. While indexes provide significant performance benefits, they also come with storage and maintenance costs, so careful consideration is required in their implementation.
Open this section to load past papers