A long-context model answers correctly without retrieval on benchmarks but fails in production. How would you decide what to keep?

Instruction: Explain how you would decide between long-context prompting and explicit retrieval.

Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery.


I would not let benchmark performance talk me out of retrieval without checking what the benchmark omitted. Long-context models can look great when the evaluation gives them clean, bounded context and stable documents. Production usually adds messy corpora, permission boundaries, conflicting versions, and freshness requirements...
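The production constraints listed above can be sketched as a minimal routing heuristic. This is an illustrative sketch, not a prescribed method: the `CorpusProfile` fields and the `choose_strategy` function are hypothetical names standing in for signals a team would measure in its own system.

```python
from dataclasses import dataclass

@dataclass
class CorpusProfile:
    """Hypothetical production signals a clean benchmark may not capture."""
    fits_in_context: bool       # the whole corpus fits in the model's window
    per_user_permissions: bool  # answers must respect access boundaries
    updates_frequently: bool    # freshness matters at query time
    conflicting_versions: bool  # multiple versions of the same document exist

def choose_strategy(p: CorpusProfile) -> str:
    """Route to explicit retrieval when any production constraint breaks
    the clean, bounded setup the benchmark assumed."""
    if p.per_user_permissions or p.updates_frequently or p.conflicting_versions:
        # Retrieval lets you filter by permission, fetch fresh documents,
        # and disambiguate versions per query; a static prompt cannot.
        return "retrieval"
    if not p.fits_in_context:
        # The corpus cannot be stuffed into the prompt at all.
        return "retrieval"
    # Stable, bounded, fully visible corpus: long-context prompting may suffice.
    return "long-context"

# A stable handbook that fits in the window vs. a permissioned wiki:
print(choose_strategy(CorpusProfile(True, False, False, False)))  # long-context
print(choose_strategy(CorpusProfile(True, True, False, False)))   # retrieval
```

The point of the sketch is that the decision is driven by properties of the corpus and its access rules, not by benchmark accuracy alone.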
