{Agorithm}
Algorithms/Rabin-Karp
Stringintermediatepattern-matchingrolling-hashhashing

Rabin-Karp

Use a rolling hash to find pattern in text in expected O(n+m).

How it works

Rabin-Karp is a string-search algorithm that uses *hashing* to find pattern matches. Instead of comparing the pattern character-by-character against every window of the text, it computes a hash of the pattern and a rolling hash of each text window; only when hashes agree does it fall back to a full character verification. The rolling hash update is computed in O(1) by removing the contribution of the outgoing character and adding the incoming one, making each window slide constant time.

The algorithm runs in O(n + m) average and best case, but degrades to O(nm) in the worst case if hash collisions are frequent. In practice, a good hash function (polynomial with a large prime modulus) keeps collisions extremely rare. Rabin-Karp's greatest strength is that it naturally extends to *multi-pattern search*: compute hashes for all patterns and check each window against the set in O(1) using a hash table, enabling simultaneous search for thousands of patterns in one linear pass.

Complexity

Best case
O(n+m)
Average case
O(n+m)
Worst case
O(nm)
Space
O(1)

Visualizer

Generating steps…

Where it's used in the real world

Plagiarism detection

Multi-pattern document fingerprinting

Systems like MOSS and Turnitin hash overlapping k-gram windows of a submission and match them against a corpus. Rabin-Karp's ability to check many pattern hashes per window makes it the natural fit for this class of problem.

Git

Content-defined chunking for delta compression

Git's pack-file format uses rolling Rabin-Karp-style hashes to split file content into variable-size chunks at content-defined boundaries, enabling efficient delta computation between similar objects.

Database engines

Query fingerprinting

PostgreSQL's pg_stat_statements and similar tools hash normalized query strings using rolling-hash techniques to group structurally identical queries, avoiding full string comparison for every incoming query.

Implementation

const BASE = 31;
const MOD = 1_000_003;

function charVal(c: string): number {
  return c.charCodeAt(0) - "A".charCodeAt(0) + 1;
}

function rabinKarp(text: string, pattern: string): number[] {
  const n = text.length;
  const m = pattern.length;
  const matches: number[] = [];

  // Precompute BASE^(m-1) mod MOD
  let basePow = 1;
  for (let k = 0; k < m - 1; k++) basePow = (basePow * BASE) % MOD;

  // Compute pattern hash and initial window hash
  let patternHash = 0;
  let windowHash = 0;
  for (let k = 0; k < m; k++) {
    patternHash = (patternHash * BASE + charVal(pattern[k])) % MOD;
    windowHash  = (windowHash  * BASE + charVal(text[k]))    % MOD;
  }

  for (let i = 0; i <= n - m; i++) {
    if (windowHash === patternHash) {
      // Verify to handle hash collisions
      if (text.slice(i, i + m) === pattern) matches.push(i);
    }
    if (i < n - m) {
      windowHash = ((windowHash - charVal(text[i]) * basePow % MOD + MOD) * BASE
                    + charVal(text[i + m])) % MOD;
    }
  }
  return matches;
}

// Usage
console.log(rabinKarp("ABCABCABC", "CAB")); // [2, 5]

Related algorithms