Huffman Coding is a lossless data compression algorithm used to encode data (usually text or binary data) by assigning variable-length codes to input characters, with the property that the most frequent characters have the shortest codes. This method is widely used in compression algorithms such as ZIP files and JPEG images.
The idea behind Huffman coding is to reduce the average code length based on the frequency of characters, meaning that more frequent characters are represented with shorter codes and less frequent characters with longer codes. This reduces the overall size of the encoded data.
Frequency Analysis:
Building a Huffman Tree:
Assigning Codes:
0 for the left branch and 1 for the right branch.Encoding:
Decoding:
Consider the following text:
ABRACADABRA
Frequency of Characters:
A: 5
B: 2
R: 2
C: 1
D: 1
Building the Huffman Tree:
Start with a min-heap of nodes representing each character and its frequency:
[(C, 1), (D, 1), (B, 2), (R, 2), (A, 5)]
Combine the two nodes with the smallest frequencies:
Combine C (1) and D (1) into a new node (CD, 2)
Updated heap: [(B, 2), (R, 2), (CD, 2), (A, 5)]
Combine the two nodes with the next smallest frequencies:
Combine B (2) and R (2) into a new node (BR, 4)
Updated heap: [(CD, 2), (BR, 4), (A, 5)]
Combine the next two smallest nodes:
Combine CD (2) and BR (4) into a new node (CDBR, 6)
Updated heap: [(A, 5), (CDBR, 6)]
Finally, combine the last two nodes:
Combine A (5) and CDBR (6) into the root node (ACDBR, 11)
The resulting Huffman tree looks like this:
(ACDBR, 11)
/ \
(A, 5) (CDBR, 6)
/ \
(CD, 2) (BR, 4)
/ \ / \
(C, 1) (D, 1) (B, 2) (R, 2)
Assigning Huffman Codes:
A: 0C: 10D: 11B: 010R: 011Encoding the Input: Replace each character in "ABRACADABRA" with its corresponding Huffman code:
A -> 0
B -> 010
R -> 011
A -> 0
C -> 10
A -> 0
D -> 11
A -> 0
B -> 010
R -> 011
A -> 0
Encoded: 0 010 011 0 10 0 11 0 010 011 0
Calculate Frequencies: Count the frequency of each character in the input data.
Build the Huffman Tree:
Assign Codes:
Encode the Data:
Decode the Data:
0 or right for 1 until you reach a leaf node, which gives the corresponding character.Overall, the time complexity of Huffman coding is O(n log n).
Here’s an implementation of Huffman Coding in C++:
#include <iostream>
#include <queue>
#include <vector>
#include <unordered_map>
#include <string>
using namespace std;
// Structure for a Huffman tree node
struct Node {
char ch;
int freq;
Node* left;
Node* right;
Node(char ch, int freq) : ch(ch), freq(freq), left(nullptr), right(nullptr) {}
};
// Compare function to be used with priority queue (min-heap)
struct Compare {
bool operator()(Node* left, Node* right) {
return left->freq > right->freq;
}
};
// Function to generate Huffman Codes
void generateHuffmanCodes(Node* root, const string& str, unordered_map<char, string>& huffmanCodes) {
if (root == nullptr)
return;
// If it's a leaf node, save the code
if (!root->left && !root->right) {
huffmanCodes[root->ch] = str;
}
generateHuffmanCodes(root->left, str + "0", huffmanCodes);
generateHuffmanCodes(root->right, str + "1", huffmanCodes);
}
// Function to build Huffman Tree and generate the codes
unordered_map<char, string> buildHuffmanTree(const string& text) {
// Count the frequency of each character
unordered_map<char, int> freq;
for (char ch : text) {
freq[ch]++;
}
// Create a priority queue to store nodes of the tree
priority_queue<Node*, vector<Node*>, Compare> minHeap;
// Create a node for each character and push it into the priority queue
for (auto pair : freq) {
minHeap.push(new Node(pair.first, pair.second));
}
// Build the Huffman tree
while (minHeap.size() > 1) {
Node* left = minHeap.top();
minHeap.pop();
Node* right = minHeap.top();
minHeap.pop();
Node* newNode = new Node('$', left->freq + right->freq);
newNode->left = left;
newNode->right = right;
minHeap.push(newNode);
}
// The remaining node is the root of the Huffman tree
Node* root = minHeap.top();
// Generate Huffman codes from the tree
unordered_map<char, string> huffmanCodes;
generateHuffmanCodes(root, "", huffmanCodes);
return huffmanCodes;
}
// Function to encode the text using the Huffman codes
string encode(const string& text, const unordered_map<char, string>& huffmanCodes) {
string encodedText = "";
for (char ch : text) {
encodedText += huffmanCodes.at(ch);
}
return encodedText;
}
int main() {
string text = "ABRACADABRA";
// Build the Huffman tree and get the codes
unordered_map<char, string> huffmanCodes = buildHuffmanTree(text);
// Print the Huffman codes
cout <<
Open this section to load past papers