Hashing and Indexing

4 minread

647words

Beginnerlevel

Hashing and Indexing

Hashing and indexing are crucial concepts in data management that help improve the efficiency of data retrieval and storage.

Hashing

Definition:
Hashing is a technique used to convert input data of arbitrary size into a fixed-size string of characters, typically for fast data retrieval. It uses a hash function to map data to a specific location (or bucket) in a hash table.

Key Components:

Hash Function: A function that takes an input and produces a fixed-size hash value. Good hash functions minimize collisions and distribute data evenly.
Hash Table: An array or a list where the hash values are stored. Each index corresponds to a unique hash value.

Common Operations:

Insertion: Compute the hash value for the data and place it in the corresponding index.
Searching: Compute the hash value for the query and check the corresponding index.
Deletion: Compute the hash value for the data and remove it from the corresponding index.

Collision Resolution: Collisions occur when two inputs hash to the same index. Common strategies to handle collisions include:

Chaining: Store multiple elements at the same index using a linked list.
Open Addressing: Find another empty index within the hash table using techniques like linear probing, quadratic probing, or double hashing.

Example Implementation: Here’s a simple hash table implementation in C++ using separate chaining:

#include <iostream>
#include <list>
#include <vector>
using namespace std;

class HashTable {
private:
    vector<list<int>> table;
    int size;

public:
    HashTable(int s) : size(s) {
        table.resize(size);
    }

    // Hash function
    int hash(int key) {
        return key % size;
    }

    // Insert an element
    void insert(int key) {
        int index = hash(key);
        table[index].push_back(key);
    }

    // Search for an element
    bool search(int key) {
        int index = hash(key);
        for (int x : table[index]) {
            if (x == key) {
                return true;
            }
        }
        return false;
    }

    // Remove an element
    void remove(int key) {
        int index = hash(key);
        table[index].remove(key);
    }
};

int main() {
    HashTable ht(10);
    ht.insert(5);
    ht.insert(15);
    
    cout << "Search 5: " << ht.search(5) << endl; // Outputs: 1 (true)
    cout << "Search 10: " << ht.search(10) << endl; // Outputs: 0 (false)

    ht.remove(5);
    cout << "Search 5 after removal: " << ht.search(5) << endl; // Outputs: 0 (false)

    return 0;
}

Indexing

Definition:
Indexing is a data structure technique used to quickly locate and access the data in a database or data structure. An index is a separate data structure that maintains a reference to the actual data and allows for faster searches.

Key Features:

Types of Indexes:
- Primary Index: Built on the primary key of a table.
- Secondary Index: Additional indexes to improve query performance on non-primary key columns.
- Unique Index: Ensures that no duplicate values are present in the indexed column.
B-Trees and B+ Trees: Common data structures used for indexing in databases, providing balanced search times and efficient insertions/deletions.

Benefits of Indexing:

Faster Search Operations: By avoiding full table scans, indexing speeds up query execution.
Efficient Sorting: Indexes can also facilitate quick sorting of records.

Example Use Case: When searching for a record in a database, using an index can significantly reduce the number of records that need to be examined. Instead of checking each record one by one, the index allows direct access to the records of interest.

Summary

Hashing and indexing are powerful techniques for optimizing data retrieval. Hashing provides an efficient way to store and access data with constant time complexity on average, while indexing enhances search performance in databases. Understanding these concepts is crucial for developing efficient applications and managing large datasets effectively.

Previous topic 23

Binary Search for Sorted Arrays

Next topic 25

Open Addressing and Chaining

Past Papers

Open this section to load past papers

Click on Show Past Papers to see past papers.

#include <iostream> #include <list> #include <vector> using namespace std; class HashTable { private: vector<list<int>> table; int size; public: HashTable(int s) : size(s) { table.resize(size); } // Hash function int hash(int key) { return key % size; } // Insert an element void insert(int key) { int index = hash(key); table[index].push_back(key); } // Search for an element bool search(int key) { int index = hash(key); for (int x : table[index]) { if (x == key) { return true; } } return false; } // Remove an element void remove(int key) { int index = hash(key); table[index].remove(key); } }; int main() { HashTable ht(10); ht.insert(5); ht.insert(15); cout << "Search 5: " << ht.search(5) << endl; // Outputs: 1 (true) cout << "Search 10: " << ht.search(10) << endl; // Outputs: 0 (false) ht.remove(5); cout << "Search 5 after removal: " << ht.search(5) << endl; // Outputs: 0 (false) return 0; }