Skip to content

Understanding the BM25 Formula

BM25 (Best Matching 25) is one of the most popular ranking functions used in search engines. It determines how relevant a document is to a search query. Let's break down this intimidating formula into understandable pieces.

The Complete Formula

BM25(D, Q) = Σ IDF(qi) × (f(qi, D) × (k1 + 1)) / (f(qi, D) + k1 × (1 - b + b × |D| / avgdl))

Don't panic! We'll explain each part step by step.

What Do These Symbols Mean?

  • D = The document we're scoring
  • Q = The query (search terms)
  • qi = Each individual term/word in the query
  • Σ = Sum (we calculate this for each query term and add them up)
  • f(qi, D) = Frequency of term qi in document D (how many times the word appears)
  • |D| = Length of document D (number of words)
  • avgdl = Average document length across all documents
  • k1 and b = Tuning parameters (more on these later)

Breaking It Down: Three Key Components

Component 1: IDF (Inverse Document Frequency)

IDF(qi)

What it does: Measures how rare or common a word is across all documents.

Why it matters: Rare words are more informative. If you search for "php tutorial" the word "php" is more valuable than "the" or "a".

How it works: - Words appearing in few documents get high IDF scores (more valuable) - Words appearing in many documents get low IDF scores (less valuable)

Simple formula:

IDF(qi) = log((N - n(qi) + 0.5) / (n(qi) + 0.5))
Where N = total documents, n(qi) = documents containing the term

Component 2: Term Frequency with Saturation

(f(qi, D) × (k1 + 1)) / (f(qi, D) + k1)

What it does: Measures how often a term appears in the document, with diminishing returns.

Why it matters: A word appearing 5 times is more relevant than appearing once. But appearing 100 times isn't 100× more relevant than once—there's a saturation point.

The k1 parameter (typically 1.2 to 2.0): - Higher k1 = term frequency matters more - Lower k1 = faster saturation (repetition matters less)

Example with k1 = 1.2: - Term appears 1 time: score ≈ 0.55 - Term appears 2 times: score ≈ 0.73 - Term appears 5 times: score ≈ 0.89 - Term appears 100 times: score ≈ 0.99

Notice how the score increases but never exceeds 1.0, and improvements get smaller.

Component 3: Document Length Normalization

1 - b + b × (|D| / avgdl)

What it does: Adjusts scores based on document length.

Why it matters: Longer documents naturally contain more words, so they might match query terms more often just by being longer, not by being more relevant.

The b parameter (typically 0.75): - b = 0: Document length doesn't matter at all - b = 1: Full length normalization (strongly penalizes long documents) - b = 0.75: Balanced approach (standard)

How it works: - If |D| = avgdl (document is average length): factor = 1 (no adjustment) - If |D| > avgdl (document is longer): factor > 1 (slight penalty) - If |D| < avgdl (document is shorter): factor < 1 (slight boost)

Putting It All Together

Let's walk through a complete example.

Search Query: "php tutorial"

Document: "This php tutorial covers php basics. PHP is a popular programming language for beginners learning php."

Step 1: Calculate for "php"

Assume: - f("php", D) = 4 (appears 4 times) - IDF("php") = 2.5 (moderately rare) - |D| = 18 words - avgdl = 20 words - k1 = 1.2 - b = 0.75

Term frequency component:

(4 × (1.2 + 1)) / (4 + 1.2) = (4 × 2.2) / 5.2 = 8.8 / 5.2 ≈ 1.69

Length normalization:

1 - 0.75 + 0.75 × (18 / 20) = 0.25 + 0.675 = 0.925

Combined for "php":

2.5 × 1.69 / 0.925 ≈ 4.57

Step 2: Calculate for "tutorial"

Assume: - f("tutorial", D) = 1 - IDF("tutorial") = 3.0 (rarer)

Term frequency component:

(1 × 2.2) / (1 + 1.2) = 2.2 / 2.2 = 1.0

Combined for "tutorial":

3.0 × 1.0 / 0.925 ≈ 3.24

Step 3: Sum Everything

BM25 score = 4.57 + 3.24 = 7.81

This document would score 7.81 for the query "php tutorial".

What Makes BM25 Smart?

  1. Handles repetition intelligently: Saying "php" 100 times doesn't make your document 100× more relevant

  2. Values rare words: Technical terms or specific concepts matter more than common words

  3. Fair to all document lengths: Short, focused documents can compete with comprehensive longer ones

  4. Query-specific: Each query term contributes independently, then they're summed

Tuning Parameters

k1 (Term Frequency Saturation)

  • Default: 1.2
  • Lower (1.0): Good for precise queries where repetition matters less
  • Higher (2.0): Good when term frequency is highly indicative of relevance

b (Length Normalization)

  • Default: 0.75
  • Lower (0.5): Use when document length varies widely but all are relevant
  • Higher (0.9): Use when you want to penalize verbose documents more

Common Student Questions

Q: Why not just count word matches?

A: Simple counting favors long documents and treats all words equally. BM25 is smarter about both issues.

Q: Can BM25 be negative?

A: Technically yes, if IDF is negative (when a term appears in most documents), but modern implementations often use a floor of 0.

Q: Does word order matter?

A: No, BM25 is a "bag of words" model. "php tutorial" and "tutorial php" score the same.

Q: How does this compare to TF-IDF?

A: BM25 is an improvement over TF-IDF. It adds saturation (diminishing returns) and better length normalization.

Key Takeaways

  • BM25 combines term importance (IDF), term frequency (with saturation), and document length (normalization)
  • It's more sophisticated than simple word counting
  • The formula has stood the test of time because it balances multiple factors well
  • Understanding each component helps you tune it for your specific use case

BM25 remains the foundation of many modern search systems because it captures fundamental principles of relevance in a mathematically sound way.