Building a Modern Search Ranking Stack: From Embeddings to LLM-Powered Relevance
Search stopped being a string-matching problem a while ago. When someone types "wireless headphones" into a product search engine, they expect more than items containing those two words. They want results ranked by semantic relevance, product quality, their own preferences, and availability. The gap between what BM25 returns and what users actually want has reshaped how search systems are built.
This post walks through a modern search ranking stack: a multi-stage pipeline combining sparse lexical retrieval, dense semantic embeddings, reciprocal rank fusion, cross-encoder reranking, and LLM listwise ranking. I built a working demo that benchmarks each stage on the Amazon ESCI product search dataset, so every layer's contribution shows up in real numbers.
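To give a flavor of one of those layers before diving in: reciprocal rank fusion (RRF) merges the ranked lists from the sparse and dense retrievers by summing, for each document, 1/(k + rank) across the lists. Here is a minimal sketch; the function name and inputs are illustrative, and k=60 is the commonly used default from the original RRF paper, not necessarily what the demo uses.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one.

    rankings: list of ranked lists (best first).
    Each document scores 1/(k + rank) per list it appears in;
    documents present in more lists, at better ranks, rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort doc IDs by fused score, highest first.
    return sorted(scores, key=scores.get, reverse=True)


# Example: a doc that is mid-ranked in both lists can beat
# one that tops a single list.
bm25_hits = ["a", "b", "c"]
dense_hits = ["b", "c", "a"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

The appeal of RRF is that it needs no score calibration between retrievers: only ranks matter, so a BM25 score and a cosine similarity never have to be compared directly.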