Instruction: Explain how you would decide between long-context prompting and explicit retrieval.
Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery.
I would keep the benchmark result in perspective. Long context can look great on static test sets, but production usually...
easy