ScholarQuill logoScholarQuillUniversity Notes
  • Notes
  • Past Papers
  • Blogs
  • Todo
Login
ScholarQuill logoScholarQuillUniversity Notes
Login
NotesPast PapersBlogsTodo
More
SubjectsDiscussionCGPA CalculatorGPA CalculatorStudent PortalCourse Outline
About
About usPrivacy PolicyReportContact
Notes
Past Papers
Blogs
Todo
Analytics
    Current Subject
    🧩
    Design and Analysis of Algorithms
    COMP3138
    Progress0 / 53 topics
    Topics
    1. Introduction to Algorithm Design2. Data Structures3. Efficiency in Algorithms4. Analysis of Algorithms5. Mathematical Review6. Mathematical Analysis of Algorithms7. Types of Functions8. Order of Growth9. Asymptotic Notations10. Sorting Algorithms11. Selection Sort Algorithm12. Example and Analysis of Selection Sort13. Insertion Sort Algorithm14. Divide and Conquer Algorithms15. Merge Sort Algorithm16. Quick Sort Algorithm17. Bucket Sort Algorithm18. Radix Sort Algorithm19. Counting Sort Algorithm20. Heap Sort Basics21. Heap Algorithms22. Heap Properties and Examples23. Heap Operations24. Heap Sort Algorithm Analysis25. Heap Insertion and Deletion26. Tree Based Algorithms and Hashing27. Red Black Tree Basics28. Binary Search Tree Basics29. Tree Searching Algorithms30. Analysis of Tree Searching Algorithms31. Analysis of Insertion and Deletion in BST32. Hashing Basics33. Examples of Hash Functions34. Analysis of Collision Resolution Techniques35. Dynamic Programming36. 0-1 Knapsack Problem37. Fractional Knapsack Problem38. Longest Common Subsequence39. Shortest Path Finding40. Matrix Chain Multiplication41. Assembly Line Chain Problem42. Greedy Algorithms43. Prim's Algorithm44. Kruskal's Algorithm45. Dijkstra's Algorithm46. Huffman Coding47. NP-Completeness48. Polynomial Time Verification49. Reducibility50. NP-Completeness Proofs51. Randomized Algorithms52. Particle Swarm Optimization53. Genetic Algorithms
    COMP3138›Huffman Coding
    Design and Analysis of AlgorithmsTopic 46 of 53

    Huffman Coding

    7 minread
    1,219words
    Intermediatelevel

    Huffman Coding

    Huffman Coding is a lossless data compression algorithm used to encode data (usually text or binary data) by assigning variable-length codes to input characters, with the property that the most frequent characters have the shortest codes. This method is widely used in compression algorithms such as ZIP files and JPEG images.

    The idea behind Huffman coding is to reduce the average code length based on the frequency of characters, meaning that more frequent characters are represented with shorter codes and less frequent characters with longer codes. This reduces the overall size of the encoded data.

    How Huffman Coding Works

    1. Frequency Analysis:

      • The algorithm begins by calculating the frequency of each character in the input data.
    2. Building a Huffman Tree:

      • Create a min-heap (priority queue) where each node represents a character and its frequency.
      • The heap is used to always merge the two least frequent nodes first.
      • The algorithm works by combining the two nodes with the lowest frequencies into a new node, with the sum of their frequencies as the frequency of the new node.
      • This new node is then added back to the heap.
      • Repeat the process until there is only one node left in the heap, which becomes the root of the Huffman tree.
    3. Assigning Codes:

      • Traverse the tree from the root to each leaf node.
      • Assign a binary code to each character: assign 0 for the left branch and 1 for the right branch.
      • The binary codes assigned to the characters will have variable lengths, with the most frequent characters having shorter codes.
    4. Encoding:

      • Once the Huffman tree is built, you can encode the data by replacing each character in the input with its corresponding Huffman code.
    5. Decoding:

      • To decode the compressed data, you traverse the Huffman tree based on the sequence of bits and obtain the original characters.

    Example of Huffman Coding

    Consider the following text:

    ABRACADABRA
    
    1. Frequency of Characters:

      A: 5
      B: 2
      R: 2
      C: 1
      D: 1
      
    2. Building the Huffman Tree:

      • Start with a min-heap of nodes representing each character and its frequency:

        [(C, 1), (D, 1), (B, 2), (R, 2), (A, 5)]
        
      • Combine the two nodes with the smallest frequencies:

        Combine C (1) and D (1) into a new node (CD, 2)
        Updated heap: [(B, 2), (R, 2), (CD, 2), (A, 5)]
        
      • Combine the two nodes with the next smallest frequencies:

        Combine B (2) and R (2) into a new node (BR, 4)
        Updated heap: [(CD, 2), (BR, 4), (A, 5)]
        
      • Combine the next two smallest nodes:

        Combine CD (2) and BR (4) into a new node (CDBR, 6)
        Updated heap: [(A, 5), (CDBR, 6)]
        
      • Finally, combine the last two nodes:

        Combine A (5) and CDBR (6) into the root node (ACDBR, 11)
        

      The resulting Huffman tree looks like this:

            (ACDBR, 11)
           /            \
        (A, 5)       (CDBR, 6)
                        /      \
                   (CD, 2)   (BR, 4)
                    /   \     /    \
                 (C, 1) (D, 1) (B, 2) (R, 2)
      
    3. Assigning Huffman Codes:

      • Traverse the tree and assign binary codes:
        • A: 0
        • C: 10
        • D: 11
        • B: 010
        • R: 011
    4. Encoding the Input: Replace each character in "ABRACADABRA" with its corresponding Huffman code:

      A -> 0
      B -> 010
      R -> 011
      A -> 0
      C -> 10
      A -> 0
      D -> 11
      A -> 0
      B -> 010
      R -> 011
      A -> 0
      
      Encoded: 0 010 011 0 10 0 11 0 010 011 0
      

    Huffman Coding Algorithm Steps

    1. Calculate Frequencies: Count the frequency of each character in the input data.

    2. Build the Huffman Tree:

      • Insert all characters into a min-heap, prioritized by frequency.
      • Repeatedly extract the two nodes with the lowest frequencies and combine them into a new node. Add the combined node back to the heap.
      • Repeat until there is only one node remaining, which becomes the root of the tree.
    3. Assign Codes:

      • Traverse the Huffman tree to assign binary codes to each character.
    4. Encode the Data:

      • Replace characters in the input with their Huffman codes.
    5. Decode the Data:

      • Start from the root of the Huffman tree, and for each bit in the encoded data, move left for 0 or right for 1 until you reach a leaf node, which gives the corresponding character.

    Time Complexity

    • Building the Heap: The time complexity of building the min-heap from the frequency table is O(n log n), where n is the number of unique characters.
    • Building the Tree: Each insertion and extraction from the heap takes O(log n) time, and since there are n nodes, the total time complexity for building the Huffman tree is O(n log n).
    • Encoding: For each character, the time complexity is proportional to the length of the Huffman code. This is O(n) for encoding the data.

    Overall, the time complexity of Huffman coding is O(n log n).

    C++ Implementation of Huffman Coding

    Here’s an implementation of Huffman Coding in C++:

    #include <iostream>
    #include <queue>
    #include <vector>
    #include <unordered_map>
    #include <string>
    
    using namespace std;
    
    // Structure for a Huffman tree node
    struct Node {
        char ch;
        int freq;
        Node* left;
        Node* right;
    
        Node(char ch, int freq) : ch(ch), freq(freq), left(nullptr), right(nullptr) {}
    };
    
    // Compare function to be used with priority queue (min-heap)
    struct Compare {
        bool operator()(Node* left, Node* right) {
            return left->freq > right->freq;
        }
    };
    
    // Function to generate Huffman Codes
    void generateHuffmanCodes(Node* root, const string& str, unordered_map<char, string>& huffmanCodes) {
        if (root == nullptr)
            return;
    
        // If it's a leaf node, save the code
        if (!root->left && !root->right) {
            huffmanCodes[root->ch] = str;
        }
    
        generateHuffmanCodes(root->left, str + "0", huffmanCodes);
        generateHuffmanCodes(root->right, str + "1", huffmanCodes);
    }
    
    // Function to build Huffman Tree and generate the codes
    unordered_map<char, string> buildHuffmanTree(const string& text) {
        // Count the frequency of each character
        unordered_map<char, int> freq;
        for (char ch : text) {
            freq[ch]++;
        }
    
        // Create a priority queue to store nodes of the tree
        priority_queue<Node*, vector<Node*>, Compare> minHeap;
    
        // Create a node for each character and push it into the priority queue
        for (auto pair : freq) {
            minHeap.push(new Node(pair.first, pair.second));
        }
    
        // Build the Huffman tree
        while (minHeap.size() > 1) {
            Node* left = minHeap.top();
            minHeap.pop();
            Node* right = minHeap.top();
            minHeap.pop();
    
            Node* newNode = new Node('$', left->freq + right->freq);
            newNode->left = left;
            newNode->right = right;
            minHeap.push(newNode);
        }
    
        // The remaining node is the root of the Huffman tree
        Node* root = minHeap.top();
    
        // Generate Huffman codes from the tree
        unordered_map<char, string> huffmanCodes;
        generateHuffmanCodes(root, "", huffmanCodes);
    
        return huffmanCodes;
    }
    
    // Function to encode the text using the Huffman codes
    string encode(const string& text, const unordered_map<char, string>& huffmanCodes) {
        string encodedText = "";
        for (char ch : text) {
            encodedText += huffmanCodes.at(ch);
        }
        return encodedText;
    }
    
    int main() {
        string text = "ABRACADABRA";
    
        // Build the Huffman tree and get the codes
        unordered_map<char, string> huffmanCodes = buildHuffmanTree(text);
    
        // Print the Huffman codes
        cout <<
    
    Previous topic 45
    Dijkstra's Algorithm
    Next topic 47
    NP-Completeness

    Past Papers

    Open this section to load past papers

    Click on Show Past Papers to see past papers.
    On This Page
      Reading Stats
      Est. reading time7 min
      Word count1,219
      Code examples0
      DifficultyIntermediate