Semantic search promises a revolution: contextual relevance and natural language understanding with just a few lines of code. On a notebook or a POC, it’s magical. But what happens when your index exceeds a billion vectors?

The magic quickly gives way to the brutality of engineering: exploding latency, runaway infrastructure costs, and RAM that can no longer hold the index.
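To make the RAM problem concrete, here is a rough back-of-the-envelope estimate (a sketch: the HNSW graph overhead formula is a simplifying assumption, not an exact figure for any specific engine):

```python
def index_ram_bytes(n_vectors: int, dims: int, bytes_per_dim: int, hnsw_m: int = 16) -> int:
    """Rough RAM estimate for an HNSW vector index (illustrative only)."""
    # Raw vector storage: one value per dimension.
    vectors = n_vectors * dims * bytes_per_dim
    # Assumed HNSW graph overhead: ~2*m neighbor ids (4 bytes each) per vector.
    graph = n_vectors * hnsw_m * 2 * 4
    return vectors + graph

TIB = 1024**4
# One billion 768-dim embeddings:
print(f"float32: {index_ram_bytes(10**9, 768, 4) / TIB:.2f} TiB")  # ~3 TiB
print(f"int8:    {index_ram_bytes(10**9, 768, 1) / TIB:.2f} TiB")  # under 1 TiB
```

Even before replicas, a float32 index at this scale is measured in terabytes of RAM, which is why quantization moves from optimization to necessity.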

In this talk, we leave marketing buzz at the door and dive into the guts of Elasticsearch and OpenSearch at a very large scale. We will cover how to:

  • Architect your clusters to handle a billion embeddings without failing.
  • Optimize the critical trade-off between search quality (recall) and performance (latency).
  • Reduce costs using quantization strategies and intelligent chunking.
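As a taste of the quantization point above, here is what enabling scalar quantization can look like in an Elasticsearch `dense_vector` mapping (a sketch assuming Elasticsearch 8.12+; `dims`, `m`, and `ef_construction` values are illustrative and must be tuned to your corpus):

```json
{
  "mappings": {
    "properties": {
      "embedding": {
        "type": "dense_vector",
        "dims": 768,
        "index": true,
        "similarity": "cosine",
        "index_options": {
          "type": "int8_hnsw",
          "m": 16,
          "ef_construction": 100
        }
      }
    }
  }
}
```

Switching `index_options.type` from the default HNSW to `int8_hnsw` cuts vector memory roughly 4x, at a small recall cost that the talk will quantify.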

If you need to move from a “Hello World” semantic search to massive production, this session is your survival guide.