Design a benchmark harness for multi-hop and ambiguous RAG queries.

Instruction: Explain how you would build a benchmark that reflects the hard parts of retrieval-based assistants.

Context: Assesses whether the candidate can design a practical architecture and explain the main tradeoffs.

I would design the harness around the questions that force the system to reason and stay honest. Easy one-hop lookups...
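As an illustration of what "reason and stay honest" could mean in practice, here is a minimal sketch of a harness that reports accuracy per question category rather than one blended score. Everything in it is an assumption made for illustration and not part of the original answer: the `Case` record, the category labels, the `ABSTAIN` sentinel, and the substring-match scoring rule are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical sentinel a system returns when it declines to answer
# or asks for clarification instead of guessing.
ABSTAIN = "ABSTAIN"


@dataclass
class Case:
    question: str
    category: str  # hypothetical labels, e.g. "one_hop", "multi_hop", "ambiguous"
    gold_answer: Optional[str]  # None means the correct behavior is to abstain


def score(case: Case, prediction: str) -> bool:
    """Return True if the prediction counts as correct for this case."""
    if case.gold_answer is None:
        # "Staying honest": only an explicit abstention is correct here.
        return prediction == ABSTAIN
    # Simplistic assumption: case-insensitive substring match on the gold answer.
    return case.gold_answer.lower() in prediction.lower()


def run_benchmark(cases: List[Case], system: Callable[[str], str]) -> Dict[str, float]:
    """Run every case through the system and return accuracy per category."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for case in cases:
        totals[case.category] += 1
        hits[case.category] += int(score(case, system(case.question)))
    return {category: hits[category] / totals[category] for category in totals}
```

Reporting per category keeps a strong score on easy lookups from hiding weak multi-hop or ambiguity handling. A real harness would likely replace the substring match with a stricter grader and also score the retrieved evidence, which this sketch leaves out.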
