2026-06-08

Best search ranking stack for AI products

A good search stack works like a funnel. Cheap methods collect a broad candidate set first; expensive methods refine a much smaller set later. Problems often begin when teams replace exact-text search with embeddings instead of combining the two.

Start with filters and BM25, a strong exact-text ranking method. Add dense retrieval to find semantic matches, then merge both result lists with Reciprocal Rank Fusion (RRF) or a similar method. Rerank the shortlist with a cross-encoder. Use an LLM only for a tiny final set where its quality gain justifies the extra latency and cost.
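
The order of stages described above can be sketched as a single function. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `search`, the callable signatures, and the cutoffs (200, 50, 5) are hypothetical placeholders, and each stage is passed in so that any retriever or reranker can be plugged in.

```python
from typing import Callable, Optional, Sequence

# Hypothetical stage signatures, chosen only for this sketch.
Retriever = Callable[[str, Sequence[str], int], list[str]]
Fuser = Callable[[Sequence[Sequence[str]]], list[str]]
Reranker = Callable[[str, Sequence[str]], list[str]]


def search(
    query: str,
    corpus_ids: Sequence[str],
    allowed: Callable[[str], bool],
    retrievers: Sequence[Retriever],
    fuse: Fuser,
    rerank: Reranker,
    llm_rerank: Optional[Reranker] = None,
    retrieve_k: int = 200,   # assumed cutoff per retriever
    rerank_k: int = 50,      # assumed shortlist size for the cross-encoder
    final_k: int = 5,        # assumed size of the final set
) -> list[str]:
    # Stage 1: structured filters run first, so later stages never see
    # documents the user is not allowed to retrieve.
    pool = [doc_id for doc_id in corpus_ids if allowed(doc_id)]

    # Stage 2: cheap retrievers (e.g. BM25 and dense) each return a broad list.
    ranked_lists = [retrieve(query, pool, retrieve_k) for retrieve in retrievers]

    # Stage 3: merge the lists and keep a shortlist.
    shortlist = fuse(ranked_lists)[:rerank_k]

    # Stage 4: the more expensive reranker only sees the shortlist.
    reranked = rerank(query, shortlist)

    # Stage 5 (optional): the most expensive model only sees the final few.
    if llm_rerank is not None:
        return llm_rerank(query, reranked[:final_k])
    return reranked[:final_k]
```

The point of the shape is that each stage receives fewer documents than the one before it, so the per-document cost can rise as the list shrinks.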

Recommended stack

Stage	Default	Job
Filtering	Structured filters	Enforce tenant, permissions, product, language, time, and availability.
Lexical retrieval	BM25	Exact names, IDs, error codes, legal terms, and high-precision tokens.
Dense retrieval	Embeddings	Synonyms, paraphrases, fuzzy intent, and semantic recall.
Fusion	Reciprocal Rank Fusion or weighted retriever composition	Merge sparse and dense candidates without pretending scores are comparable.
Reranking	Cross-encoder	Reorder 20 to 100 candidates with query-document interaction.
Final precision	LLM reranker or answer model	Resolve nuanced relevance only after the list is small.
Evaluation	Recall@k, nDCG, MRR, click labels, human labels	Prove each stage improves the previous one.

Use-case defaults

Product surface	Good default	Why
Documentation search	BM25 plus embeddings plus cross-encoder	Exact API names and semantic questions both matter.
RAG retrieval	Hybrid retrieval plus reranker plus citation checks	Missing evidence is usually worse than slow generation.
Product search	Lexical filters plus hybrid retrieval plus business features	Availability, price, popularity, and exact facets matter.
Support search	Hybrid retrieval plus freshness and ticket metadata	Similar wording and current policy both matter.
Internal knowledge base	BM25 baseline, then dense retrieval from query logs	Establish a measurable baseline before adding model cost.
Legal or compliance search	Lexical baseline plus strict filters, then careful semantic expansion	False positives and false negatives both have high cost.

Why BM25 still belongs in the stack

Embeddings find text with similar meaning, but they do not reliably replace exact matching. Error codes, function names, product SKUs, legal phrases, and people’s names often carry intent through their exact spelling. BM25 remains a strong baseline because it rewards terms the user actually typed.
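
To make "rewards terms the user actually typed" concrete, here is a small sketch of BM25 scoring. It makes several assumptions: a naive lowercase whitespace tokenizer, the parameter values `k1=1.2` and `b=0.75`, and a non-negative IDF variant. A production system would normally rely on a search engine's own analyzer and scoring instead of code like this.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace splitting stands in for a real analyzer.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str],
                k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Return one BM25 score per document for the given query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # only terms present in the document contribute
            # Rare terms get a higher weight than common ones.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

With a query such as `E1234`, only documents containing that exact token receive a non-zero score, which is the behavior that makes lexical retrieval reliable for error codes and SKUs.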

Dense retrieval adds recall when users do not know the exact vocabulary. The right pattern is not a choice between BM25 and embeddings: use BM25 for lexical recall, embeddings for semantic recall, and fusion to combine them.
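
Reciprocal Rank Fusion is short enough to sketch in full. Each document earns 1 / (k + rank) from every list it appears in, and the contributions are summed. The function name and the tie-break on document ID are choices made for this sketch, and `k=60` is a commonly used constant treated here as an assumption, not a tuned value.

```python
from collections import defaultdict
from typing import Sequence


def reciprocal_rank_fusion(ranked_lists: Sequence[Sequence[str]],
                           k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs using only their rank positions."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Raw retriever scores are ignored; only the position matters.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID to stay deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["a", "b", "c"]
dense_hits = ["c", "a", "d"]
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))  # ['a', 'c', 'b', 'd']
```

Because the formula uses ranks only, a BM25 score of 14.2 and a cosine similarity of 0.83 never have to be placed on the same scale. Documents found by both retrievers ("a" and "c" in the example) rise above documents found by only one.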

When to add a reranker

Add a cross-encoder when the right documents appear somewhere in the top 50 but not near the top. That is the cleanest signal that candidate generation works and ranking needs help.
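
That signal can be checked against a labeled query set. The sketch below classifies each query by where its first relevant document lands. The pool size of 50 comes from the text above; the `top_k=5` threshold for "near the top" and the label names are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Sequence


def diagnose(ranked_ids: Sequence[str], relevant_ids: set[str],
             pool_k: int = 50, top_k: int = 5) -> str:
    """Classify one query by the rank of its first relevant document."""
    for rank, doc_id in enumerate(ranked_ids[:pool_k], start=1):
        if doc_id in relevant_ids:
            # Found in the pool: either already near the top, or buried.
            return "ok" if rank <= top_k else "needs_reranker"
    # Nothing relevant in the pool: a reranker cannot fix missing candidates.
    return "needs_recall"


def summarize(runs: Iterable[tuple[Sequence[str], set[str]]]) -> Counter:
    """Count diagnoses over (ranked_ids, relevant_ids) pairs."""
    return Counter(diagnose(ranked, relevant) for ranked, relevant in runs)
```

A large share of `needs_reranker` suggests candidate generation is working and a cross-encoder is worth testing. A large share of `needs_recall` points back at retrieval, where reranking will not help.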

Do not add an LLM reranker before a cross-encoder unless the candidate set is tiny and the relevance judgment is subtle enough to justify the cost. LLM reranking can help, but it is slower and more expensive, so measure it against a cheaper reranker first.
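
One way to make that comparison explicit is a simple acceptance gate. Everything in this sketch is hypothetical: the `StageResult` fields, the function name, and especially the thresholds, which each team would need to set from its own latency and cost budgets.

```python
from dataclasses import dataclass


@dataclass
class StageResult:
    ndcg: float                 # offline quality on the labeled set
    p95_latency_ms: float       # end-to-end p95 latency with this stage
    cost_per_1k_queries: float  # in whatever currency the team tracks


def llm_reranker_justified(
    baseline: StageResult,             # e.g. the cross-encoder pipeline
    candidate: StageResult,            # the same pipeline plus LLM reranking
    min_ndcg_gain: float = 0.02,       # assumed minimum useful gain
    max_p95_latency_ms: float = 1500.0,    # assumed latency budget
    max_cost_per_1k_queries: float = 5.0,  # assumed cost budget
) -> bool:
    """Accept the LLM stage only if it wins on quality within both budgets."""
    gain = candidate.ndcg - baseline.ndcg
    return (
        gain >= min_ndcg_gain
        and candidate.p95_latency_ms <= max_p95_latency_ms
        and candidate.cost_per_1k_queries <= max_cost_per_1k_queries
    )
```

The baseline should be the cheaper reranker, not raw retrieval, so the gate measures what the LLM adds on top of the cross-encoder.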

Evaluation sequence

Label a small set of real queries.
Measure BM25 alone.
Add dense retrieval and measure recall delta.
Add fusion and measure nDCG and Recall@k.
Add cross-encoder reranking and measure Precision@1 and nDCG.
Add LLM reranking only if it improves quality after cost and latency are included.
Watch production metrics: zero-result rate, reformulation rate, click-through, answer correction, p95 latency, and cost.
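
The offline metrics named in this sequence are small enough to compute directly. The sketch below assumes document IDs are strings, relevance labels are either a set of relevant IDs or a dictionary of graded gains, and nDCG uses linear gain with a log2 discount. The function names are illustrative.

```python
import math
from typing import Mapping, Sequence


def recall_at_k(ranked: Sequence[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is found."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], gains: Mapping[str, float], k: int) -> float:
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    dcg = sum(
        gains.get(doc_id, 0.0) / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
    )
    ideal_gains = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal_gains, start=1))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["a", "b", "c", "d"]
print(recall_at_k(ranked, {"b", "d"}, k=2))    # 0.5
print(reciprocal_rank(ranked, {"b", "d"}))     # 0.5
```

MRR is the mean of `reciprocal_rank` over all labeled queries, and Recall@k and nDCG are averaged the same way. Running the same labeled set through each stage keeps the comparisons between stages like-for-like.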

Deeper reading

Search Ranking Stack in 2026 gives the full implementation walkthrough.
RAG Evaluation Metrics explains retrieval and citation evaluation.