Huffman Coding

Lossless compression by assigning shorter codes to frequent characters.

How it works

Huffman coding is a lossless data compression algorithm that assigns shorter binary codes to more frequent characters and longer codes to rarer ones. It works by building a binary prefix tree (trie) bottom-up: start with a min-heap of leaf nodes — one per unique character, keyed by frequency — then repeatedly extract the two lowest-frequency nodes and merge them into an internal node whose frequency is their sum. When only one node remains, it is the root. Each left edge is labeled 0 and each right edge 1; the code for a character is the path from the root to its leaf.

The key insight is greedy optimality: at each step, merging the two lowest-frequency nodes minimises the expected code length over the whole tree. This can be proven via an exchange argument — any tree that swaps the placement of the two minimum-frequency characters with higher-frequency ones produces a worse or equal total cost. The resulting code is prefix-free: no code is a prefix of another, so it can be decoded unambiguously without separator characters. Time complexity is O(n log n) dominated by the heap operations; in practice, the number of unique characters n is small (≤ 256 for ASCII), making the algorithm extremely fast.

Huffman coding is the final stage in many real-world compression pipelines. Formats like zlib/gzip first apply LZ77 (dictionary compression) to remove repeated substrings, then pass the resulting symbol stream through Huffman coding to squeeze out statistical redundancy. JPEG uses Huffman entropy coding on the quantised DCT coefficients, and MP3 uses it on the Huffman-coded Huffman indices. Understanding Huffman coding unlocks the entropy-coding layer present in nearly every modern compression and media format.

Complexity

Best case

O(n log n)

Average case

O(n log n)

Worst case

O(n log n)

Space

O(n)

Visualizer

Generating steps…

Where it's used in the real world

zlib / gzip / DEFLATE

Entropy coding stage of general-purpose file and HTTP compression

DEFLATE (used in .gz, .zip, .png, and HTTP Content-Encoding: gzip) chains LZ77 back-reference compression with Huffman coding. The Huffman stage can encode the most common literals in 1–3 bits, achieving 2–5× compression on typical text and 20–40% on already-compact binary formats.

JPEG / PNG image formats

Entropy coding of quantised DCT coefficients (JPEG) and filtered scanlines (PNG)

JPEG uses Huffman coding after quantisation to pack the non-zero DCT coefficients. PNG applies DEFLATE (Huffman + LZ77) to filtered scanline data. In both cases, Huffman coding exploits the skewed frequency distribution of post-transform symbols to approach the Shannon entropy limit.

MP3 / AAC audio compression

Huffman entropy coding of quantised audio spectral coefficients

After psychoacoustic modelling and MDCT transform, MP3 quantises spectral lines and feeds them through a Huffman coder with a fixed codebook of 32 tables chosen per granule. AAC extends this with arithmetic coding but retains Huffman tables for certain bands. The entropy stage typically contributes 10–25% of the total bitrate savings.

Implementation

interface HuffNode {
  char: string | null;
  freq: number;
  left: HuffNode | null;
  right: HuffNode | null;
}

function buildHuffmanTree(text: string): Map<string, string> {
  const freq = new Map<string, number>();
  for (const c of text) freq.set(c, (freq.get(c) ?? 0) + 1);

  // Min-heap via sorted array (fine for small alphabets)
  let heap: HuffNode[] = [...freq.entries()].map(([char, f]) => ({
    char, freq: f, left: null, right: null,
  }));
  heap.sort((a, b) => a.freq - b.freq);

  while (heap.length > 1) {
    const a = heap.shift()!;
    const b = heap.shift()!;
    const merged: HuffNode = { char: null, freq: a.freq + b.freq, left: a, right: b };
    // Insert in sorted position
    let i = heap.findIndex(n => n.freq > merged.freq);
    if (i === -1) i = heap.length;
    heap.splice(i, 0, merged);
  }

  const codes = new Map<string, string>();
  function walk(node: HuffNode | null, prefix: string) {
    if (!node) return;
    if (node.char !== null) { codes.set(node.char, prefix || "0"); return; }
    walk(node.left,  prefix + "0");
    walk(node.right, prefix + "1");
  }
  walk(heap[0], "");
  return codes;
}

const codes = buildHuffmanTree("ABRACADABRA");
for (const [ch, code] of [...codes.entries()].sort()) {
  console.log(`${ch}: ${code}`);
}
// A: 0, B: 10, R: 110, C: 1110, D: 1111