A hash function is an algorithm that takes an input (or key) and produces a fixed-size string of bytes, typically a hash code or hash value. The main goal of a hash function is to efficiently map data (keys) to indices in a hash table and ensure that the keys are uniformly distributed across the available buckets to minimize collisions.
Here are some simple and commonly used hash functions with explanations and examples.
One of the simplest hash functions is the division method. It takes the key, divides it by a prime number (usually the size of the hash table), and returns the remainder. The remainder is the hash code, and it determines the index of the key in the hash table.
hash(key) = key % table_size
keytable_size - 1.Let's say we have a hash table with 10 slots (i.e., table_size = 10) and we want to insert keys: 25, 34, 18, 5, 20.
key = 25:
hash(25) = 25 % 10 = 5
So, the key 25 will be placed in index 5.
key = 34:
hash(34) = 34 % 10 = 4
So, the key 34 will be placed in index 4.
key = 18:
hash(18) = 18 % 10 = 8
So, the key 18 will be placed in index 8.
key = 5:
hash(5) = 5 % 10 = 5
So, the key 5 will be placed in index 5, causing a collision with key 25.
key = 20:
hash(20) = 20 % 10 = 0
So, the key 20 will be placed in index 0.
The keys are hashed into the following indices:
Index 0: 20
Index 4: 34
Index 5: 25, 5 (collision)
Index 8: 18
This method is simple but prone to clustering and collisions, especially if the table size is not chosen carefully.
In this method, instead of using the modulo operator, we multiply the key by a constant and then take the fractional part of the result. The fractional part is multiplied by the table size to obtain the final hash code.
hash(key) = floor(table_size * (key * A % 1))
Where:
key is the input value.A is a constant between 0 and 1 (often chosen as the fractional part of the golden ratio A ≈ 0.6180339887).table_size is the size of the hash table.Let’s say the table size is 10 (table_size = 10) and we want to insert keys 25, 34, 18, 5, 20, and we use A = 0.6180339887.
For each key, we compute the hash value:
key = 25:
hash(25) = floor(10 * (25 * 0.6180339887 % 1)) = floor(10 * 0.450849829) = 4
So, key 25 will go to index 4.
key = 34:
hash(34) = floor(10 * (34 * 0.6180339887 % 1)) = floor(10 * 0.993379182) = 9
So, key 34 will go to index 9.
key = 18:
hash(18) = floor(10 * (18 * 0.6180339887 % 1)) = floor(10 * 0.126606027) = 1
So, key 18 will go to index 1.
key = 5:
hash(5) = floor(10 * (5 * 0.6180339887 % 1)) = floor(10 * 0.090166975) = 0
So, key 5 will go to index 0.
key = 20:
hash(20) = floor(10 * (20 * 0.6180339887 % 1)) = floor(10 * 0.360679774) = 3
So, key 20 will go to index 3.
The keys are hashed into the following indices:
Index 0: 5
Index 1: 18
Index 3: 20
Index 4: 25
Index 9: 34
This method tends to distribute values more uniformly across the hash table compared to the division method, especially when A is chosen properly.
For hashing strings, a common method is the Polynomial Rolling Hash, which converts each character in the string to an integer (typically using ASCII values) and calculates the hash as a polynomial of the characters.
hash(s) = (s[0] * p^0 + s[1] * p^1 + ... + s[n-1] * p^(n-1)) % m
Where:
s[i] is the ASCII value of the i-th character of the string.p is a constant base (typically a small prime, like 31).m is a large prime number to reduce collisions (modulus operation).Let’s say we want to hash the string "abc" using p = 31 and m = 101.
'a' = 97, 'b' = 98, 'c' = 99"abc", the hash would be:
hash("abc") = (97 * 31^0 + 98 * 31^1 + 99 * 31^2) % 101
= (97 + 98 * 31 + 99 * 961) % 101
= (97 + 3038 + 95139) % 101
= 98274 % 101
= 14
So, the hash value for the string "abc" would be 14.
The string "abc" would map to index 14 in a hash table of size m = 101.
The DJB2 hash function is a simple but effective hash function created by Daniel J. Bernstein. It is often used for strings and is very fast to compute.
hash(key) = 5381
for each character c in the string:
hash = ((hash << 5) + hash) + c
return hash
Here, hash << 5 is the same as multiplying the hash by 32, and the formula hash = ((hash << 5) + hash) + c is a quick way to combine the previous hash with the current character.
Let’s compute the hash for the string "abc".
Initial hash value: hash = 5381
For 'a' = 97:
hash = ((5381 << 5) + 5381) + 97 = (5381 * 32) + 5381 + 97 = 177,777 + 5381 + 97 = 183,255
For 'b' = 98:
hash = ((183,255 << 5) + 183,255) + 98 = (183,255 * 32) + 183,255 + 98 = 5,866,160 + 183,255 + 98 = 6,049,513
For 'c' = 99:
hash = ((6,049,513 << 5) + 6,049,513) + 99 = (6,049,513 * 32) + 6,049,513 + 99 = 193,584,416 + 6,049,513 + 99 = 199,633,028
Thus, the final hash for "abc" is 199,633,028.
The string "abc" maps to a very large hash value. To map this hash to a table of size m, we would use:
hash("abc") % m
Cryptographic hash functions are designed to be secure and to avoid collisions.
Open this section to load past papers