In database systems, storage and file structure refer to how data is physically stored on disk and how it is organized for efficient access, retrieval, and modification. The design of these structures plays a critical role in ensuring that the database performs efficiently, scales well, and maintains data integrity.
The sections below cover file organization, data access paths, data structures, disk storage and buffer management, and file structures.
When a DBMS stores data, it does so in files that reside on disk. These files are managed by the operating system and the DBMS together. The DBMS abstracts the physical storage of data from users, presenting a logical view through tables, views, and indexes.
Key Concepts:
File organization refers to how data is stored in files on disk. Efficient file organization is essential for optimal performance, as it impacts how quickly data can be retrieved, inserted, updated, or deleted.
Types of File Organization:
Heap File Organization: This is the simplest type of file organization, where records are placed wherever free space is available (often at the end of the file), in no particular order. Insertion is fast, but without an index, finding a specific record requires scanning the entire file, which becomes expensive for large tables.
Sequential File Organization: In this organization, records are stored in sorted order based on a key attribute. It allows for efficient range queries but is slow for insertions and deletions since records must be shifted to maintain order.
Hashed File Organization: This organization uses a hash function to map a key value to a bucket (a block or group of blocks) in the file. It allows fast equality lookups on the key but is inefficient for range queries, because hashing does not preserve key order.
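The trade-offs between the three organizations can be illustrated with a small in-memory sketch. This is not how a real DBMS lays out blocks on disk; the records, the `NUM_BUCKETS` value, and all function names are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left

# Hypothetical records: (student_id, name). Not taken from any real schema.
records = [(17, "Ana"), (3, "Ben"), (42, "Chen"), (8, "Dara"), (25, "Eli")]

# Heap file: records kept in arrival order; a lookup scans every record.
heap_file = list(records)

def heap_lookup(key):
    for rec in heap_file:              # full scan, O(n)
        if rec[0] == key:
            return rec
    return None

# Sequential file: records kept sorted by key, so binary search works.
sequential_file = sorted(records)
sorted_keys = [rec[0] for rec in sequential_file]

def sequential_lookup(key):
    i = bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return sequential_file[i]
    return None

def sequential_range(lo, hi):
    """Return records with lo <= key <= hi from one contiguous run."""
    start = bisect_left(sorted_keys, lo)
    result = []
    for rec in sequential_file[start:]:
        if rec[0] > hi:
            break                      # sorted order lets us stop early
        result.append(rec)
    return result

# Hashed file: a hash function (here, key modulo bucket count) picks a bucket.
NUM_BUCKETS = 4                        # assumed value for the sketch
buckets = [[] for _ in range(NUM_BUCKETS)]
for rec in records:
    buckets[rec[0] % NUM_BUCKETS].append(rec)

def hashed_lookup(key):
    for rec in buckets[key % NUM_BUCKETS]:   # only one bucket is examined
        if rec[0] == key:
            return rec
    return None
```

Note that there is no `hashed_range` function: answering a range query on the hashed layout would mean visiting every bucket, which is the weakness described above.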
A data access path determines how the DBMS locates and retrieves data from storage. Access paths are crucial for query performance because they dictate the efficiency of retrieving records based on specific conditions (like searching by a key or sorting by an attribute).
Common Data Access Paths:
Primary Access Path: This refers to the main path for accessing data, usually based on the primary key, and it is typically the fastest route for lookups on that key. For example, records can be accessed by their unique identifier (e.g., student ID).
Secondary Access Path: This is used for non-primary key attributes, such as searching by last name or by the date of birth in the case of a "Students" table. These access paths are typically implemented using indexes.
Clustered Access Path: Here records are physically grouped, or "clustered", together based on a certain criterion (e.g., storing all records of the same department in a company in adjacent blocks). It improves the performance of range queries, because related records can be read with fewer block accesses.
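The difference between a primary and a secondary access path can be sketched with two dictionaries. The rows, field names, and function names below are hypothetical, and a real DBMS would use on-disk index structures rather than Python dicts.

```python
from collections import defaultdict

# Hypothetical "Students" rows; the fields are illustrative only.
students = [
    {"student_id": 101, "last_name": "Khan", "dept": "CS"},
    {"student_id": 102, "last_name": "Lopez", "dept": "Math"},
    {"student_id": 103, "last_name": "Khan", "dept": "Math"},
]

# Primary access path: unique key -> row.
primary_index = {row["student_id"]: row for row in students}

# Secondary access path: non-key attribute -> list of primary keys.
last_name_index = defaultdict(list)
for row in students:
    last_name_index[row["last_name"]].append(row["student_id"])

def find_by_id(student_id):
    return primary_index.get(student_id)

def find_by_last_name(last_name):
    # Two steps: the secondary index yields keys, the primary index yields rows.
    ids = last_name_index.get(last_name, [])
    return [primary_index[sid] for sid in ids]
```

A lookup by last name may return several rows because the attribute is not unique, which is why the secondary index stores a list of keys per value.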
Data structures help organize and store data in a way that facilitates quick access, insertion, updating, and deletion. The choice of data structure depends on the types of queries and the size of the data.
Key Data Structures:
B-trees (balanced search trees):
Example: An index on the BookID field of the Books table might use a B-tree to allow fast lookup of books by ID.
Hash Tables:
Bitmap Indexes:
gender = 'Male'). Bitmap indexes are especially useful for columns with low cardinality (few distinct values).
Quadtrees / R-trees:
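Returning to the bitmap indexes described above, a minimal sketch may help show why they suit low-cardinality columns: each distinct value gets one bitmap, and combining predicates becomes a bitwise operation. The `status` column, the sample rows, and the function names are assumptions added for illustration, and Python integers stand in for the compressed bit vectors a real DBMS would use.

```python
# Hypothetical column values, one per row; row i sits at list position i.
gender = ["Male", "Female", "Male", "Male", "Female"]
status = ["active", "active", "inactive", "active", "inactive"]

def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i has it."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def matching_rows(bitmap):
    """Decode a bitmap back into a list of row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

gender_index = build_bitmap_index(gender)
status_index = build_bitmap_index(status)

# gender = 'Male' AND status = 'active' becomes a single bitwise AND.
male_and_active = gender_index["Male"] & status_index["active"]
```

With only two distinct values per column, each index needs just two bitmaps, regardless of how many rows the table has.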
Data is stored on disk, but disk access is orders of magnitude slower than memory access. To reduce disk I/O, the DBMS uses buffer management techniques that keep frequently used pages in memory and minimize the number of disk accesses.
Buffer Pool: A buffer pool is an area of memory (RAM) where the DBMS temporarily stores data pages that are frequently accessed. When a query needs data, the DBMS first checks the buffer pool to see if the data is already loaded. If not, the data is fetched from disk and placed in the buffer.
Page Replacement: When the buffer pool is full and a new page needs to be loaded, the DBMS must choose an old page to evict. Common algorithms include LRU (Least Recently Used), FIFO (First In, First Out), and Clock.
Write-Ahead Log (WAL): This technique requires that the log record describing a change be written to stable storage before the modified data page itself is written to disk. After a crash, the DBMS can use the log to redo or undo changes, which supports atomicity and durability, two of the ACID properties (Atomicity, Consistency, Isolation, Durability).
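The buffer pool and LRU page replacement described above can be sketched as follows. This is a simplified illustration, not a real DBMS component: the "disk" is a plain dict, the class and method names are hypothetical, and write-ahead logging is left out, so a dirty page is simply written back when it is evicted.

```python
from collections import OrderedDict

class BufferPool:
    """A tiny LRU buffer pool over a simulated disk (dict of page_id -> data)."""

    def __init__(self, disk, capacity):
        self.disk = disk
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> data, least recently used first
        self.dirty = set()           # pages modified since they were loaded
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # hit: now most recently used
            return self.pages[page_id]
        if len(self.pages) >= self.capacity:
            self._evict()
        self.disk_reads += 1                  # miss: fetch from "disk"
        self.pages[page_id] = self.disk[page_id]
        return self.pages[page_id]

    def write_page(self, page_id, data):
        self.get_page(page_id)                # make sure the page is resident
        self.pages[page_id] = data            # update in memory only
        self.dirty.add(page_id)

    def _evict(self):
        victim, data = self.pages.popitem(last=False)   # least recently used
        if victim in self.dirty:
            self.disk[victim] = data          # write back before dropping
            self.dirty.discard(victim)

disk = {1: "A", 2: "B", 3: "C"}
pool = BufferPool(disk, capacity=2)
pool.get_page(1)
pool.get_page(2)
pool.get_page(1)     # hit, so page 2 becomes the least recently used
pool.get_page(3)     # pool is full: page 2 is evicted
```

After this sequence the pool holds pages 1 and 3, and only three disk reads were needed for four requests.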
File structures refer to how data is physically stored and organized in files. The choice of file structure affects the efficiency of accessing, inserting, and deleting data.
Common File Structures:
Heap Files:
Sequential Files:
Hashed Files:
Clustered Files:
The storage and file structure of a database plays a crucial role in determining its efficiency and performance. By choosing the right file organization techniques, access paths, data structures, and disk management strategies, a DBMS can optimize data retrieval, insertion, updating, and deletion. Understanding these concepts is key to designing high-performance, scalable databases that meet the needs of various applications.