A long-context model answers correctly without retrieval on benchmarks but fails in production. How would you decide what to keep?

Instruction: Explain how you would decide between long-context prompting and explicit retrieval.

Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery.


I would not let benchmark performance talk me out of retrieval without checking what the benchmark omitted. Long-context models can look great when the evaluation gives them clean, bounded context and stable documents. Production usually adds messy corpora, permission boundaries, conflicting versions, and freshness requirements...
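The production constraints listed above can be sketched as a minimal routing heuristic. This is an illustrative sketch, not a prescribed method: the `CorpusProfile` fields and the `choose_strategy` function are hypothetical names standing in for signals a team would measure in its own system.

```python
from dataclasses import dataclass

@dataclass
class CorpusProfile:
    """Hypothetical production signals a clean benchmark may not capture."""
    fits_in_context: bool       # the whole corpus fits in the model's window
    per_user_permissions: bool  # answers must respect access boundaries
    updates_frequently: bool    # freshness matters at query time
    conflicting_versions: bool  # multiple versions of the same document exist

def choose_strategy(p: CorpusProfile) -> str:
    """Route to explicit retrieval when any production constraint breaks
    the clean, bounded setup the benchmark assumed."""
    if p.per_user_permissions or p.updates_frequently or p.conflicting_versions:
        # Retrieval lets you filter by permission, fetch fresh documents,
        # and disambiguate versions per query; a static prompt cannot.
        return "retrieval"
    if not p.fits_in_context:
        # The corpus cannot be stuffed into the prompt at all.
        return "retrieval"
    # Stable, bounded, fully visible corpus: long-context prompting may suffice.
    return "long-context"

# A stable handbook that fits in the window vs. a permissioned wiki:
print(choose_strategy(CorpusProfile(True, False, False, False)))  # long-context
print(choose_strategy(CorpusProfile(True, True, False, False)))   # retrieval
```

The point of the sketch is that the decision is driven by properties of the corpus and its access rules, not by benchmark accuracy alone.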
