Instruction: Describe when embeddings alone are enough and when lexical signals should stay in the stack.
Context: Checks whether the candidate can explain the core concept clearly and connect it to real production decisions.
The way I'd approach it in an interview is this: Vector-only retrieval is fine when queries are mostly semantic and the corpus does not depend heavily on exact identifiers. But real corpora often contain product names, error codes, policy IDs, model numbers, or quoted phrases where lexical matching is doing important work. In those settings I default to hybrid retrieval unless evaluation shows I do not need it.
The reason is that dense search is good at intent and paraphrase, while lexical search is good at exactness. Hybrid gives me a stronger first-stage candidate set across both query types, and then reranking can decide what actually belongs in the prompt.
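A minimal sketch of the fusion step, assuming the dense and lexical retrievers each return a ranked list of document IDs. The answer above does not name a fusion method; reciprocal rank fusion (RRF) is used here as one common choice. The document IDs and query are hypothetical, and `k=60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical first-stage results for a query like "reset fails with E4012".
dense_hits = ["doc_reset_guide", "doc_e4012", "doc_troubleshooting"]
lexical_hits = ["doc_e4012", "doc_error_codes"]

fused = reciprocal_rank_fusion([dense_hits, lexical_hits])
print(fused)
```

The document found by both retrievers rises to the top, while documents found by only one retriever still stay in the candidate set for the reranker to judge.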
I make the decision with slice-based evaluation, not instinct. If vector-only already handles both paraphrase-heavy and exact-match queries well, I keep it simple. If I see misses on identifiers, rare terms, or quoted text, hybrid usually pays for itself quickly.
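A rough sketch of what slice-based evaluation can look like, assuming a small labeled set where each query carries a slice tag and one known relevant document. The slice names, queries, document IDs, and canned retriever results are all hypothetical stand-ins; a real setup would call the actual indexes and use a larger query set.

```python
from collections import defaultdict


def recall_at_k_by_slice(examples, retrieve, k=5):
    """Return {slice: fraction of queries whose relevant doc is in the top k}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ex in examples:
        top_k = retrieve(ex["query"])[:k]
        totals[ex["slice"]] += 1
        if ex["relevant_id"] in top_k:
            hits[ex["slice"]] += 1
    return {name: hits[name] / totals[name] for name in totals}


# Hypothetical labeled queries, tagged by slice.
examples = [
    {"query": "how do I get my money back", "slice": "paraphrase", "relevant_id": "refund_policy"},
    {"query": "error E4012", "slice": "identifier", "relevant_id": "e4012_page"},
    {"query": "policy POL-77 coverage", "slice": "identifier", "relevant_id": "pol77_page"},
]

# Canned results standing in for real retrievers.
vector_only_results = {
    "how do I get my money back": ["refund_policy", "billing_faq"],
    "error E4012": ["error_overview", "e4011_page"],
    "policy POL-77 coverage": ["pol77_page", "coverage_faq"],
}
hybrid_results = {
    "how do I get my money back": ["refund_policy", "billing_faq"],
    "error E4012": ["e4012_page", "error_overview"],
    "policy POL-77 coverage": ["pol77_page", "coverage_faq"],
}

vector_recall = recall_at_k_by_slice(examples, lambda q: vector_only_results.get(q, []))
hybrid_recall = recall_at_k_by_slice(examples, lambda q: hybrid_results.get(q, []))

for name in vector_recall:
    print(f"{name}: vector-only={vector_recall[name]:.2f} hybrid={hybrid_recall[name]:.2f}")
```

Reporting per slice is the point: an aggregate score over these three queries would hide that the vector-only miss sits entirely in the identifier slice.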
A weak answer treats lexical search as an outdated fallback and assumes embeddings should handle everything. That usually fails on exact identifiers, quoted text, and rare terms, which are the cases where lexical matching carries the retrieval.
easy