Instruction: Explain how you would expand evaluation when answer-only scoring misses bad execution.
Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery. Explain how you would expand evaluation when answer-only scoring misses bad execution.
Official answer available
Preview the opening of the answer, then unlock the full walkthrough.
If answer-level scoring says we are fine but users still suffer, the eval is too coarse. I would add checks...
easy
easy
easy
easy
easy
easy