How do you choose between vector-only and hybrid retrieval?

Question

Checks whether the candidate can explain the core concept clearly and connect it to real production decisions. Describe when embeddings alone are enough and when lexical signals should stay in the stack.

Accepted Answer

Example Answer

Vector-only retrieval is fine when queries are mostly semantic and the corpus does not depend heavily on exact identifiers. But real corpora often contain product names, error codes, policy IDs, model numbers, or quoted phrases where lexical matching does the essential work. In those settings I default to hybrid retrieval unless evaluation shows I do not need it.

The reason is that dense search is good at intent and paraphrase, while lexical search is good at exactness. Hybrid gives me a stronger first-stage candidate set across both query types, and then reranking can decide what actually belongs in the prompt.
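As a rough illustration of how the two candidate lists might be merged before reranking, here is a minimal sketch using reciprocal rank fusion (RRF). RRF is one common fusion method chosen here for illustration; the answer above does not prescribe a specific one. The document IDs, the `k=60` constant, and the function name are assumptions, and the two input lists stand in for whatever a real dense index and lexical index would return.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=5):
    """Merge several ranked lists of doc IDs into one list.

    Each document scores 1 / (k + rank) for every list it appears in,
    so items ranked well by either retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:top_n]]


# Hypothetical results for one query (illustrative IDs only).
dense_hits = ["doc_refund_policy", "doc_returns_faq", "doc_shipping"]
lexical_hits = ["doc_error_E4012", "doc_refund_policy", "doc_returns_faq"]

candidates = reciprocal_rank_fusion([dense_hits, lexical_hits], top_n=4)
print(candidates)
```

The document that only the lexical list found still reaches the candidate set, which is the point of fusing: a reranker can only promote what the first stage retrieved.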

I make the decision with slice-based evaluation, not instinct: I measure retrieval quality separately for each query type rather than as one blended number. If vector-only already handles both paraphrase-heavy and exact-match queries well, I keep it simple. If I see misses on identifiers, rare terms, or quoted text, hybrid usually pays for itself quickly.
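A slice-based check of this kind could look roughly like the sketch below, which computes recall@k per query slice. Everything in it is assumed for illustration: the slice labels, the two-query evaluation set, and the stub retrievers, which are hard-coded lookups standing in for real vector-only and hybrid pipelines.

```python
from collections import defaultdict


def recall_at_k_by_slice(examples, retrieve, k=5):
    """Return {slice_name: fraction of queries whose relevant doc is in the top k}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ex in examples:
        top_k = retrieve(ex["query"])[:k]
        totals[ex["slice"]] += 1
        if ex["relevant_id"] in top_k:
            hits[ex["slice"]] += 1
    return {name: hits[name] / totals[name] for name in totals}


# Hypothetical labeled queries, one per slice, to keep the sketch small.
eval_set = [
    {"query": "how do I get my money back", "slice": "paraphrase",
     "relevant_id": "doc_refund_policy"},
    {"query": "error E4012", "slice": "identifier",
     "relevant_id": "doc_error_E4012"},
]

# Stub retrievers: canned results standing in for real search backends.
VECTOR_RESULTS = {
    "how do I get my money back": ["doc_refund_policy", "doc_returns_faq"],
    "error E4012": ["doc_errors_overview", "doc_troubleshooting"],
}
HYBRID_RESULTS = {
    "how do I get my money back": ["doc_refund_policy", "doc_returns_faq"],
    "error E4012": ["doc_error_E4012", "doc_errors_overview"],
}

vector_report = recall_at_k_by_slice(eval_set, VECTOR_RESULTS.__getitem__)
hybrid_report = recall_at_k_by_slice(eval_set, HYBRID_RESULTS.__getitem__)
print("vector-only:", vector_report)
print("hybrid:     ", hybrid_report)
```

Reporting per slice matters because an aggregate score can look healthy while the identifier slice is failing; in practice each slice would need many more queries than this toy set.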

Common Poor Answer

A weak answer treats lexical search as an outdated fallback and assumes embeddings will handle everything. That approach usually misses exact identifiers, quoted text, and rare terms, which are the cases where lexical matching matters most.

Related Questions