Vector Search & Distance Metrics:
A Complete Guide for WordPress and MySQL Developers

Modern search is no longer about matching exact words — it’s about understanding meaning. Vector search brings semantic intelligence to WordPress, MySQL, and any custom search pipeline by representing text as vectors (lists of numbers) and comparing them using distance metrics.

This document gives you a complete, developer‑friendly walkthrough of:

  • What vector search is
  • How embeddings work
  • How similarity is calculated
  • How different distance metrics behave
  • How all of this fits into WordPress and MySQL search architecture

1. Keyword Search vs. Vector Search

Traditional search engines (like WordPress’s default search or MySQL’s MATCH AGAINST) match exact words.

Example

Query: “car”
Keyword search finds documents containing “car” — but misses:

  • automobile
  • vehicle
  • sedan
  • SUV

Vector search understands concepts, not just words.

Query: “car”
Matches documents about:

  • automobile
  • vehicle
  • sedan
  • SUV
  • truck

Because all these concepts live close together in semantic space.

2. Words as Points in Space

Embeddings convert text into vectors — numerical coordinates in a multi‑dimensional space.

Simple 2D Example

        Y-axis (Vehicle Type)
    5   │         🚗 car
    4   │     🚙 SUV    🚕 taxi
    3   │   🚚 truck
    2   │
    1   │   🐕 dog      🐱 cat
    0   └─────────────────────── X-axis (Animal vs Vehicle)
        0   1   2   3   4   5

Key idea:
Similar meanings → close together
Different meanings → far apart

Real embeddings typically have 384–1536 dimensions (depending on the model), but the principle is identical.

3. What Is a Vector?

A vector is simply a list of numbers.

2D Example

"car" = [4.2, 4.8]
"dog" = [1.3, 1.1]

Real Embedding Example (768 dimensions)

"car" = [0.23, -0.45, 0.12, 0.89, ...]

Each dimension encodes some semantic feature (vehicle‑ness, speed, size, etc.).

4. How Vector Search Works

Step 1 — Convert Documents to Vectors

Each document is embedded into a vector.

Step 2 — Convert Query to Vector

The user’s query is embedded the same way.

Step 3 — Compare Vectors

Using a distance metric (cosine, Euclidean, dot product, etc.).

Step 4 — Rank by Similarity

Closest vectors = most relevant results.

This is where distance metrics become critical.
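The four steps above can be sketched end to end with the toy 2-D vectors used in this guide. The `embed` lookup and its values are illustrative stand-ins for a real embedding model, which would return hundreds of dimensions:

```python
import math

# Toy "embedding model": a fixed lookup of hand-made 2-D vectors.
# A real model (e.g. a sentence transformer) returns 384+ dimensions.
EMBEDDINGS = {
    "car":   [4.2, 4.8],
    "truck": [3.9, 4.1],
    "dog":   [1.3, 1.1],
}

def embed(text):
    return EMBEDDINGS[text]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def search(query, documents):
    """Steps 1-4: embed documents, embed the query, compare, rank."""
    qv = embed(query)
    scored = [(doc, cosine_similarity(qv, embed(doc))) for doc in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

results = search("car", ["dog", "truck"])
print(results[0][0])  # "truck" ranks above "dog"
```

Swapping `cosine_similarity` for another metric changes the ranking function without touching the rest of the pipeline.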

5. Cosine Similarity (The Semantic Search Standard)

Cosine similarity measures the angle between vectors — ignoring length.

Cosine Similarity = (A · B) / (||A|| × ||B||)

  • 1.0 → identical direction
  • 0.0 → unrelated
  • -1.0 → opposite meaning

Cosine is the default for semantic search because it focuses on meaning, not magnitude.
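The formula above translates directly into a few lines of Python; the unit vectors below are chosen to hit each of the three reference values:

```python
import math

def cosine_similarity(a, b):
    """(A · B) / (||A|| * ||B||)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [1, 0]))   # 1.0  (identical direction)
print(cosine_similarity([1, 0], [0, 1]))   # 0.0  (orthogonal / unrelated)
print(cosine_similarity([1, 0], [-1, 0]))  # -1.0 (opposite direction)
```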

6. Distance Metrics (WordPress & MySQL Context)

When implementing vector search in WordPress — via a plugin, custom table, or external vector DB — you must choose a distance metric.

Different metrics produce different rankings.

Below is a unified explanation of each metric, with diagrams and WordPress‑specific guidance.

6.1 Euclidean Distance (L2)

Straight‑line distance.

   y
   |
 4 |                B
 3 |
 2 |
 1 |      A
   |
   +---------------------------- x

Meaning:
Measures literal geometric distance.

Best for:
- Embeddings where magnitude encodes meaning
- General vector search

WordPress use:
- Custom MySQL tables
- External vector DBs that default to L2
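A minimal implementation, using points A(2, 1) and B(6, 4) matching the diagram above:

```python
import math

def euclidean_distance(a, b):
    """L2: straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# A(2, 1) to B(6, 4): a 3-4-5 right triangle.
print(euclidean_distance([2, 1], [6, 4]))  # 5.0
```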


6.2 Manhattan Distance (L1)

Grid‑based movement.

   y
   |
 4 |                B
 3 |                |
 2 |                |
 1 |      A -------+
   |
   +---------------------------- x

Meaning:
Sum of absolute differences.

Best for:
- Sparse vectors (rare in NLP)

WordPress use:
- Experimental scoring only
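The same two points under L1, where you can only move along the grid lines:

```python
def manhattan_distance(a, b):
    """L1: sum of absolute per-dimension differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

# A(2, 1) to B(6, 4): walk 4 along x, then 3 along y.
print(manhattan_distance([2, 1], [6, 4]))  # 7
```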


6.3 Cosine Distance

Angle between vectors.

Origin
  |
  |   A
  |  /
  | /
  |/___________
       \
        \
         B

Meaning:
Measures similarity of direction.

Best for:
- Semantic search
- Sentence embeddings

WordPress use:
- Hybrid search (BM25 + embeddings)
- Most semantic search plugins
- Recommended default
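Cosine distance is just 1 minus cosine similarity, so direction is all that matters; a short check with scaled vectors makes the magnitude-blindness visible:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity: 0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot / (math.hypot(*a) * math.hypot(*b))

# Direction matters, magnitude does not:
print(round(cosine_distance([1, 1], [10, 10]), 6))  # 0.0 (same direction)
print(round(cosine_distance([1, 0], [0, 1]), 6))    # 1.0 (orthogonal)
```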


6.4 Dot Product

Alignment × magnitude.

Origin
  |
  |   A →→
  |  /
  | /
  |/___________ →→ B (longer)

Meaning:
Rewards vectors that point the same way and have large magnitude.

Best for:
- Recommender systems
- Models trained for dot‑product scoring

WordPress use:
- Related posts
- Product recommendations
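Unlike cosine, the dot product grows with vector length, which is why it suits popularity-weighted recommenders:

```python
def dot_product(a, b):
    """Alignment x magnitude: longer aligned vectors score higher."""
    return sum(x * y for x, y in zip(a, b))

# Same direction, but the second B is twice as long -> twice the score:
print(dot_product([1, 2], [2, 4]))  # 10
print(dot_product([1, 2], [4, 8]))  # 20
```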

6.5 Chebyshev Distance

Largest single‑dimension difference.

   y
   |
 4 |                B
 3 |
 2 |
 1 |      A
   |
   +---------------------------- x

Meaning:
“How bad is the worst mismatch?”

WordPress use:
- Outlier detection
- Quality control

Not used for semantic search.
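For completeness, the "worst mismatch" idea in code, again with A(2, 1) and B(6, 4):

```python
def chebyshev_distance(a, b):
    """L-infinity: the largest single per-dimension difference."""
    return max(abs(x - y) for x, y in zip(a, b))

# A(2, 1) vs B(6, 4): differences are 4 and 3, so the max is 4.
print(chebyshev_distance([2, 1], [6, 4]))  # 4
```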

6.6 Hamming Distance

Counts mismatched positions.

A = (2,1)
B = (6,4)

Differences:
- 2 vs 6 → different
- 1 vs 4 → different

Hamming distance = 2

Meaning:
Counts positions where values differ; meaningful mainly for binary vectors (bit strings), not real-valued embeddings.

WordPress use:
- Duplicate detection
- Hash fingerprints

Not used for embeddings.
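On the binary fingerprints where Hamming distance actually applies:

```python
def hamming_distance(a, b):
    """Number of positions where the two sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

# Binary fingerprints (e.g. content hashes for duplicate detection):
print(hamming_distance([1, 0, 1, 1], [1, 1, 1, 0]))  # 2
```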

7. Summary Table

| Metric | Measures | Best For | WordPress Use |
|---|---|---|---|
| 🧭 Cosine | Angle | Semantic similarity | Best default for semantic search |
| 📏 Euclidean | Straight-line distance | General vector search | Custom tables, vector DBs |
| 🎯 Dot Product | Alignment × magnitude | Recommenders | Related posts/products |
| 📐 Manhattan | Grid distance | Sparse vectors | Rare |
| 📊 Chebyshev | Max deviation | Outliers | Data validation |
| 🔢 Hamming | Mismatched bits | Binary vectors | Duplicate detection |

8. How This Fits Into WordPress Search Architecture

1. Default WordPress Search (LIKE)

No vectors. No semantics.

2. MySQL Full‑Text Search (MATCH AGAINST)

Lexical only — still no semantics.

3. Hybrid Search (BM25 + Embeddings)

Combine:
- BM25 for keyword precision
- Cosine similarity for semantic recall

This is the sweet spot for most WordPress sites.
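A minimal sketch of the blending step, assuming both scores have already been normalized to the 0..1 range; the 0.4/0.6 weights are illustrative, not a recommendation:

```python
def hybrid_score(bm25_score, cosine_score, keyword_weight=0.4):
    """Blend a normalized keyword score with a normalized semantic score.

    The 0.4 keyword weight is an illustrative assumption; tune it for
    your content mix.
    """
    return keyword_weight * bm25_score + (1 - keyword_weight) * cosine_score

# A doc with weak keyword overlap but a strong semantic match still ranks well:
print(round(hybrid_score(bm25_score=0.1, cosine_score=0.9), 2))  # 0.58
```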

4. Full Vector Search in MySQL

Store embeddings in a custom table (a JSON column, a packed BLOB, or the native VECTOR type in MySQL 9+) and compute:

  • Cosine
  • Euclidean
  • Dot product
  • Manhattan

This gives you full semantic search without external services.
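One way this looks in practice, sketched without a live database: rows are simulated as they might come back from a hypothetical `wp_post_embeddings` table with the vector stored in a JSON column, then decoded and ranked in application code:

```python
import json
import math

# Simulated result set, e.g. from a query like
#   SELECT post_id, embedding FROM wp_post_embeddings
# (table and column names are hypothetical).
rows = [
    (101, json.dumps([4.2, 4.8])),  # a post about cars
    (102, json.dumps([1.3, 1.1])),  # a post about dogs
]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

query_vec = [4.0, 4.5]  # would come from embedding the search query

ranked = sorted(
    ((post_id, cosine_similarity(query_vec, json.loads(blob)))
     for post_id, blob in rows),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked[0][0])  # post 101 is the closest match
```

For large catalogs you would precompute norms or push the math into SQL, but the decode-compare-rank shape stays the same.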

9. Example: Keyword vs. Vector Search

Query: “comfortable shoes for running”

Keyword Search

Finds documents containing the exact words.
Misses synonyms like “sneakers”, “athletic”, “cushioned”.

Vector Search

Understands:

  • sneakers ≈ shoes
  • athletic ≈ running
  • cushioned ≈ comfortable

Produces dramatically better results.

10. Multi‑Language Magic

Embeddings place meaning in the same space across languages.

English “cat”
French “chat”
German “Katze”

→ All cluster together.

This makes vector search inherently multilingual.

11. Advantages & Limitations

Advantages

  • Understands meaning
  • Handles synonyms
  • Context‑aware
  • Typo‑tolerant
  • Multilingual
  • Harder to game

Limitations

  • Slower than keyword search
  • Requires more storage
  • Harder to explain
  • Embedding generation costs CPU/GPU
  • Sometimes you want exact matches

12. Quick Summary

Vector search in 5 points:

  1. Convert text → vectors
  2. Compare vectors using distance metrics
  3. Rank by similarity
  4. Understand meaning, not just words
  5. Works across languages