Instruction: Describe how you would handle suspected contamination between the eval suite and the system being optimized.
Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery.
If I suspect leakage, I assume the reported score is inflated until proven otherwise. I would move to held-out data...
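One way to test the suspicion before trusting any score is to check whether eval items appear near-verbatim in the data the system was optimized on. The sketch below is a minimal illustration, not a definitive method: the function names `ngrams` and `flag_contaminated`, the n-gram length, and the overlap threshold are all assumptions chosen for the example, and whitespace tokenization is a deliberate simplification.

```python
from typing import Iterable, List, Set, Tuple


def ngrams(text: str, n: int = 8) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams in `text` (hypothetical helper)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def flag_contaminated(
    eval_items: List[str],
    training_docs: Iterable[str],
    n: int = 8,            # assumed n-gram length; tune for your data
    threshold: float = 0.5,  # assumed overlap cutoff; tune for your data
) -> List[int]:
    """Return indices of eval items whose n-gram overlap with the
    training documents is at or above `threshold`."""
    # Collect every n-gram seen anywhere in the training documents.
    seen: Set[Tuple[str, ...]] = set()
    for doc in training_docs:
        seen |= ngrams(doc, n)

    flagged: List[int] = []
    for idx, item in enumerate(eval_items):
        grams = ngrams(item, n)
        if not grams:
            continue  # item has fewer than n tokens; this check cannot judge it
        overlap = len(grams & seen) / len(grams)
        if overlap >= threshold:
            flagged.append(idx)
    return flagged


if __name__ == "__main__":
    train = ["the quick brown fox jumps over the lazy dog"]
    evals = [
        "quick brown fox jumps over",
        "an entirely different sentence about cats",
    ]
    print(flag_contaminated(evals, train, n=3))
```

Flagged items can then be excluded, and the score recomputed on the remaining items, which gives a rough lower-risk estimate while fresh held-out data is prepared. Exact n-gram matching misses paraphrased leakage, so a clean result here does not prove the suite is uncontaminated.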
easy