Instruction: Describe how you would handle suspected contamination between the eval suite and the system being optimized.
Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery.
I would take the suspicion seriously, because leakage inflates scores and makes the benchmark look more predictive of real performance than it is. The first step is to separate the holdout data from anything the team can see or optimize against directly.
Then I would audit how prompts, examples, and...
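One way to make such an audit concrete is a word-level n-gram overlap check between eval items and whatever material the team can see (prompts, few-shot examples, training text). The sketch below is illustrative only: the function names `ngrams` and `flag_contaminated`, the window size `n`, and the `threshold` are all assumptions, not something the answer specifies, and exact-match overlap will miss paraphrased leakage.

```python
import re


def ngrams(text, n=8):
    """Return the set of lowercased word-level n-grams in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def flag_contaminated(eval_items, visible_corpus, n=8, threshold=0.5):
    """Return (index, overlap) pairs for eval items that overlap the corpus.

    overlap is the fraction of an item's n-grams that also appear somewhere
    in visible_corpus. n and threshold are hypothetical defaults to tune.
    """
    corpus_grams = set()
    for doc in visible_corpus:
        corpus_grams |= ngrams(doc, n)

    flagged = []
    for idx, item in enumerate(eval_items):
        grams = ngrams(item, n)
        if not grams:
            continue  # item shorter than n words; needs manual review
        overlap = len(grams & corpus_grams) / len(grams)
        if overlap >= threshold:
            flagged.append((idx, overlap))
    return flagged


if __name__ == "__main__":
    evals = [
        "the quick brown fox jumps over the lazy dog",
        "completely unrelated sentence about tensors and gradients",
    ]
    seen = ["Prompt example: The quick brown fox jumps over the lazy dog."]
    print(flag_contaminated(evals, seen, n=3))
```

Items flagged this way would be candidates for quarantine or replacement with fresh holdout examples; a low overlap score does not by itself prove an item is clean.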
easy