Understanding the BM25 Formula¶

BM25 (Best Matching 25) is one of the most popular ranking functions used in search engines. It determines how relevant a document is to a search query. Let's break down this intimidating formula into understandable pieces.

The Complete Formula¶

BM25(D, Q) = Σ IDF(qi) × (f(qi, D) × (k1 + 1)) / (f(qi, D) + k1 × (1 - b + b × |D| / avgdl))

Don't panic! We'll explain each part step by step.

What Do These Symbols Mean?¶

D = The document we're scoring
Q = The query (search terms)
qi = Each individual term/word in the query
Σ = Sum (we calculate this for each query term and add them up)
f(qi, D) = Frequency of term qi in document D (how many times the word appears)
|D| = Length of document D (number of words)
avgdl = Average document length across all documents
k1 and b = Tuning parameters (more on these later)

Breaking It Down: Three Key Components¶

Component 1: IDF (Inverse Document Frequency)¶

IDF(qi)

What it does: Measures how rare or common a word is across all documents.

Why it matters: Rare words are more informative. If you search for "php tutorial" the word "php" is more valuable than "the" or "a".

How it works: - Words appearing in few documents get high IDF scores (more valuable) - Words appearing in many documents get low IDF scores (less valuable)

Simple formula:

IDF(qi) = log((N - n(qi) + 0.5) / (n(qi) + 0.5))

Where N = total documents, n(qi) = documents containing the term

Component 2: Term Frequency with Saturation¶

(f(qi, D) × (k1 + 1)) / (f(qi, D) + k1)

What it does: Measures how often a term appears in the document, with diminishing returns.

Why it matters: A word appearing 5 times is more relevant than appearing once. But appearing 100 times isn't 100× more relevant than once—there's a saturation point.

The k1 parameter (typically 1.2 to 2.0): - Higher k1 = term frequency matters more - Lower k1 = faster saturation (repetition matters less)

Example with k1 = 1.2: - Term appears 1 time: score ≈ 0.55 - Term appears 2 times: score ≈ 0.73 - Term appears 5 times: score ≈ 0.89 - Term appears 100 times: score ≈ 0.99

Notice how the score increases but never exceeds 1.0, and improvements get smaller.

Component 3: Document Length Normalization¶

1 - b + b × (|D| / avgdl)

What it does: Adjusts scores based on document length.

Why it matters: Longer documents naturally contain more words, so they might match query terms more often just by being longer, not by being more relevant.

The b parameter (typically 0.75): - b = 0: Document length doesn't matter at all - b = 1: Full length normalization (strongly penalizes long documents) - b = 0.75: Balanced approach (standard)

How it works: - If |D| = avgdl (document is average length): factor = 1 (no adjustment) - If |D| > avgdl (document is longer): factor > 1 (slight penalty) - If |D| < avgdl (document is shorter): factor < 1 (slight boost)

Putting It All Together¶

Let's walk through a complete example.

Search Query: "php tutorial"

Document: "This php tutorial covers php basics. PHP is a popular programming language for beginners learning php."

Step 1: Calculate for "php"¶

Assume: - f("php", D) = 4 (appears 4 times) - IDF("php") = 2.5 (moderately rare) - |D| = 18 words - avgdl = 20 words - k1 = 1.2 - b = 0.75

Term frequency component:

(4 × (1.2 + 1)) / (4 + 1.2) = (4 × 2.2) / 5.2 = 8.8 / 5.2 ≈ 1.69

Length normalization:

1 - 0.75 + 0.75 × (18 / 20) = 0.25 + 0.675 = 0.925

Combined for "php":

2.5 × 1.69 / 0.925 ≈ 4.57

Step 2: Calculate for "tutorial"¶

Assume: - f("tutorial", D) = 1 - IDF("tutorial") = 3.0 (rarer)

Term frequency component:

(1 × 2.2) / (1 + 1.2) = 2.2 / 2.2 = 1.0

Combined for "tutorial":

3.0 × 1.0 / 0.925 ≈ 3.24

Step 3: Sum Everything¶

BM25 score = 4.57 + 3.24 = 7.81

This document would score 7.81 for the query "php tutorial".

What Makes BM25 Smart?¶

Handles repetition intelligently: Saying "php" 100 times doesn't make your document 100× more relevant
Values rare words: Technical terms or specific concepts matter more than common words
Fair to all document lengths: Short, focused documents can compete with comprehensive longer ones
Query-specific: Each query term contributes independently, then they're summed

Tuning Parameters¶

k1 (Term Frequency Saturation)¶

Default: 1.2
Lower (1.0): Good for precise queries where repetition matters less
Higher (2.0): Good when term frequency is highly indicative of relevance

b (Length Normalization)¶

Default: 0.75
Lower (0.5): Use when document length varies widely but all are relevant
Higher (0.9): Use when you want to penalize verbose documents more

Common Student Questions¶

Q: Why not just count word matches?

A: Simple counting favors long documents and treats all words equally. BM25 is smarter about both issues.

Q: Can BM25 be negative?

A: Technically yes, if IDF is negative (when a term appears in most documents), but modern implementations often use a floor of 0.

Q: Does word order matter?

A: No, BM25 is a "bag of words" model. "php tutorial" and "tutorial php" score the same.

Q: How does this compare to TF-IDF?

A: BM25 is an improvement over TF-IDF. It adds saturation (diminishing returns) and better length normalization.

Key Takeaways¶

BM25 combines term importance (IDF), term frequency (with saturation), and document length (normalization)
It's more sophisticated than simple word counting
The formula has stood the test of time because it balances multiple factors well
Understanding each component helps you tune it for your specific use case

BM25 remains the foundation of many modern search systems because it captures fundamental principles of relevance in a mathematically sound way.