Instruction: Explain how you would keep an evaluation suite useful as it grows.
Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery.
It is good to learn from incidents, but a benchmark can become a junk drawer. I would reorganize the cases...
easy