ScholarQuill logoScholarQuillUniversity Notes
  • Notes
  • Past Papers
  • Blogs
  • Todo
Login
ScholarQuill logoScholarQuillUniversity Notes
Login
NotesPast PapersBlogsTodo
More
SubjectsDiscussionCGPA CalculatorGPA CalculatorStudent PortalCourse Outline
About
About usPrivacy PolicyReportContact
Notes
Past Papers
Blogs
Todo
Analytics
    Current Subject
    🧩
    Database Systems
    CSI-308
    Progress0 / 22 topics
    Topics
    1. Basic Database Concepts2. Entity Relationship Modelling3. Relational Data Model and Algebra4. Structured Query Language (SQL)5. RDBMS6. Database Design7. Functional Dependencies8. Normal Forms9. Transaction Processing10. Optimization Concepts11. Concurrency Control12. Recovery Techniques13. Database Security and Authorization14. Small Group Project Implementing a Database15. Physical Database Design16. Storage and File Structure17. Indexed Files18. B-Trees19. Files with Dense Index20. Files with Variable Length Records21. Database Efficiency22. Database Tuning
    CSI-308›Storage and File Structure
    Database SystemsTopic 16 of 22

    Storage and File Structure

    8 minread
    1,286words
    Intermediatelevel

    Storage and File Structure in Databases

    In database systems, storage and file structure refer to how data is physically stored on disk and how it is organized for efficient access, retrieval, and modification. The design of these structures plays a critical role in ensuring that the database performs efficiently, scales well, and maintains data integrity.

    In this context, we will cover the following concepts:

    1. Data Storage Basics
    2. File Organization Techniques
    3. Data Access Paths
    4. Data Structures for Efficient Storage
    5. Disk Storage and Buffer Management
    6. Types of File Structures

    1. Data Storage Basics

    When a DBMS stores data, it does so in files that reside on disk. These files are managed by the operating system and the DBMS together. The DBMS abstracts the physical storage of data from users, presenting a logical view through tables, views, and indexes.

    Key Concepts:

    • Storage Unit: The smallest unit of storage is typically called a block or page. This unit typically holds one or more rows of a table or index.
    • Disk: Data is stored on physical disks (HDDs or SSDs) managed by the operating system. The DBMS manages how the data is distributed across multiple disks or file systems.
    • File System: The underlying file system (e.g., NTFS, ext4, or ZFS) is used by the OS to handle files. The DBMS uses this file system to store data in a specific organization.

    2. File Organization Techniques

    File organization refers to how data is stored in files on disk. Efficient file organization is essential for optimal performance, as it impacts how quickly data can be retrieved, inserted, updated, or deleted.

    Types of File Organization:

    • Heap File Organization: This is the simplest type of file organization, where records are inserted randomly in the file without any particular order. It is inefficient for large databases because searching and updating records requires a full table scan.

      • Example: A table of students where each new student record is added at the end of the file.
    • Sequential File Organization: In this organization, records are stored in sorted order based on a key attribute. It allows for efficient range queries but is slow for insertions and deletions since records must be shifted to maintain order.

      • Example: Storing employee records sorted by employee ID.
    • Hashed File Organization: This organization uses a hash function to map a key value to a specific file location. It allows fast retrieval based on the key but is inefficient for range queries, as there is no ordering.

      • Example: A hash table to store user records where the key is the user ID.

    3. Data Access Paths

    A data access path determines how the DBMS locates and retrieves data from storage. Access paths are crucial for query performance because they dictate the efficiency of retrieving records based on specific conditions (like searching by a key or sorting by an attribute).

    Common Data Access Paths:

    • Primary Access Path: This refers to the fastest path to access data, usually based on the primary key. For example, accessing records by their unique identifier (e.g., student ID).

    • Secondary Access Path: This is used for non-primary key attributes, such as searching by last name or by the date of birth in the case of a "Students" table. These access paths are typically implemented using indexes.

    • Clustered Access Path: This refers to when records are physically grouped or "clustered" together based on a certain criterion (e.g., grouping all records of the same department in a company in adjacent blocks). It improves the performance of range queries.


    4. Data Structures for Efficient Storage

    Data structures help organize and store data in a way that facilitates quick access, insertion, updating, and deletion. The choice of data structure depends on the types of queries and the size of the data.

    Key Data Structures:

    • B-trees (Balanced Trees):

      • B-trees are commonly used for indexing in relational databases. They allow fast searching, insertion, and deletion. B-trees are balanced trees, meaning the depth of the tree remains constant, ensuring predictable access times.
      • B-trees store data in a sorted order, which makes range queries (finding all records between two values) efficient.

      Example: An index on the BookID field of the Books table might use a B-tree to allow fast lookup of books by ID.

    • Hash Tables:

      • A hash table is used to map keys to specific values (e.g., mapping employee ID to employee records). A hash function is used to map the key to a disk location, allowing constant-time access on average.
    • Bitmap Indexes:

      • A bitmap index is a compact data structure used to store the existence of records that satisfy a particular condition (e.g., gender = 'Male'). Bitmap indexes are especially useful for columns with low cardinality (few distinct values).
    • Quadtrees / R-trees:

      • These are used for spatial databases, storing multi-dimensional data (e.g., geographic coordinates, image data, or objects in 3D space). They partition the space into regions to allow efficient range queries.

    5. Disk Storage and Buffer Management

    Data is stored on disk, but accessing data from disk is slower than from memory. To optimize disk I/O, the DBMS uses buffer management techniques to minimize the number of disk accesses.

    • Buffer Pool: A buffer pool is an area of memory (RAM) where the DBMS temporarily stores data pages that are frequently accessed. When a query needs data, the DBMS first checks the buffer pool to see if the data is already loaded. If not, the data is fetched from disk and placed in the buffer.

    • Page Replacement: When the buffer pool is full and a new page needs to be loaded, the DBMS must choose an old page to evict. Common algorithms include LRU (Least Recently Used), FIFO (First In, First Out), and Clock.

    • Write-Ahead Log (WAL): This technique ensures that changes made to data are logged before they are written to disk. This provides durability and is a key part of maintaining ACID properties (Atomicity, Consistency, Isolation, Durability).


    6. Types of File Structures

    File structures refer to how data is physically stored and organized in files. The choice of file structure affects the efficiency of accessing, inserting, and deleting data.

    Common File Structures:

    1. Heap Files:

      • Heap file is an unordered collection of records. In a heap file, new records are added at the end of the file, and there is no specific order in which records are stored. This is simple but inefficient for large databases, especially if the data needs to be searched often.
    2. Sequential Files:

      • Records in sequential files are stored in sorted order, typically based on a key. This allows efficient searching for records in a specific range but can make inserts and deletions more expensive because the file must be reorganized to maintain order.
    3. Hashed Files:

      • Hash-based file structures store data in buckets, and a hash function is used to determine the bucket in which a record is stored. This structure allows fast lookups for specific keys but is not efficient for range queries or sorting.
    4. Clustered Files:

      • In a clustered file structure, related records from different tables are stored together in the same physical block. This reduces I/O costs for certain types of queries (e.g., joining tables) by ensuring that related data is stored close together on disk.

    Conclusion

    The storage and file structure of a database plays a crucial role in determining its efficiency and performance. By choosing the right file organization techniques, access paths, data structures, and disk management strategies, a DBMS can optimize data retrieval, insertion, updating, and deletion. Understanding these concepts is key to designing high-performance, scalable databases that meet the needs of various applications.

    Previous topic 15
    Physical Database Design
    Next topic 17
    Indexed Files

    Past Papers

    Open this section to load past papers

    Click on Show Past Papers to see past papers.
    On This Page
      Reading Stats
      Est. reading time8 min
      Word count1,286
      Code examples0
      DifficultyIntermediate