A team keeps adding benchmark cases after each incident, and the suite is getting noisy. How would you clean it up?

Instruction: Explain how you would keep an evaluation suite useful as it grows.

Context: This question tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery.

Answer (excerpt): It is good to learn from incidents, but a benchmark can become a junk drawer. I would reorganize the cases...