Hashing is a powerful technique used to store and retrieve data efficiently, and it forms the foundation of hash tables and hash maps. The idea is to map data to a fixed-size value (called a hash code) using a hash function. This hash value allows for fast data access, providing constant time complexity for many operations like search, insertion, and deletion.
Here’s an overview of the key concepts of hashing, including the components involved and the common methods used.
A hash function is an algorithm that takes an input (often called the key) and maps it to a fixed-size value, usually an integer, called a hash code. The goal of a hash function is to produce a uniform distribution of hash values for different inputs, minimizing collisions (when two keys produce the same hash value).
For an integer key, a simple hash function might be the modulo operation:
hash(key) = key % table_size
For a string key, a more sophisticated approach is often used, such as:
hash(key) = sum of ASCII values of characters % table_size
This is a very basic example. In practice, hash functions are more complex to avoid collisions and ensure efficient distribution.
A hash table (or hash map) is a data structure that uses a hash function to map keys to values. The key is hashed to an index in an array (called a bucket or slot), and the value is stored at that index.
Suppose you want to store the following pairs in a hash table:
Let's assume a hash table with 5 slots and a hash function that returns the length of the key modulo 5. The hash function would look like this:
hash(key) = len(key) % 5
For each key:
In this case, "banana" and "cherry" will both collide and be placed in the same slot (bucket 1).
A collision occurs when two keys produce the same hash value and thus are mapped to the same bucket. Collisions are inevitable because the number of possible keys is generally much larger than the number of available buckets.
Chaining (Separate Chaining):
Example:
bucket[1] → ("banana", 10) → ("cherry", 15)
Open Addressing:
Example (Linear Probing):
The load factor of a hash table is a measure of how full the table is. It is defined as:
load factor = (number of elements in the table) / (size of the table)
A high load factor (i.e., a lot of keys in relatively few buckets) increases the likelihood of collisions, which can degrade performance.
When the load factor exceeds a certain threshold (often around 0.7 or 70%), the hash table may need to be resized (doubled in size, for example) to maintain efficient operations. Resizing involves:
Resizing and rehashing ensure that the performance of the hash table remains efficient, with operations staying close to O(1) on average.
Search: O(1) on average, assuming a good hash function and low collision rates. In the worst case (with many collisions), the time complexity can degrade to O(n) (e.g., if using chaining and all keys hash to the same bucket).
Insert: O(1) on average, as inserting into the correct bucket is typically a constant-time operation. However, when resizing occurs, the insertion time may temporarily increase due to rehashing.
Delete: O(1) on average, similar to search. Deleting an element requires locating it using the hash value and then removing it.
Resize/rehash: O(n), where n is the number of elements in the table. Rehashing requires moving all existing elements to the new table.
dict in Python, Map in Java) for fast lookups.Hashing is a powerful technique for quickly storing and retrieving data. By using a hash function to map keys to indices in a hash table, it offers constant-time average performance for common operations like insertion, deletion, and search. The performance depends on factors like the hash function, collision resolution strategy, and the load factor, but when used appropriately, hashing provides a highly efficient data storage solution.
Open this section to load past papers