The offline eval suite is green, but live tool latency changes are creating user-visible failures. What would you add?

Instruction: Explain how you would expand evaluation when operational conditions are causing issues the offline suite misses.

Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery.

A green offline suite is not enough if the live environment is the real source of risk. I would add...
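One way to make the offline suite sensitive to operational conditions is to replay the same cases under simulated tool latency and compare pass rates against a timeout budget. The sketch below is illustrative only and is not taken from the official answer. `LatencyProfile`, `sample_latency`, `run_case`, and `evaluate` are hypothetical names, and the latency figures in a real setup would need to come from production traces rather than the made-up values used here.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class LatencyProfile:
    """Hypothetical latency distribution for a tool call, in seconds."""
    name: str
    median_s: float       # latency of a typical call
    tail_s: float         # latency of a slow call
    tail_fraction: float  # share of calls that hit the slow path


def sample_latency(profile: LatencyProfile, rng: random.Random) -> float:
    """Draw one simulated latency. Nothing sleeps, so the suite stays fast."""
    if rng.random() < profile.tail_fraction:
        return profile.tail_s
    return profile.median_s


def run_case(tool: Callable[[str], str], query: str, expected: str,
             latency_s: float, timeout_s: float) -> bool:
    """A case passes only if the tool responds in time and the answer is correct."""
    if latency_s > timeout_s:
        return False  # treated as a timeout, i.e. a user-visible failure
    return tool(query) == expected


def evaluate(tool: Callable[[str], str],
             cases: List[Tuple[str, str]],
             profiles: List[LatencyProfile],
             timeout_s: float,
             seed: int = 0) -> Dict[str, float]:
    """Replay the same offline cases under each profile and return pass rates."""
    results: Dict[str, float] = {}
    for profile in profiles:
        rng = random.Random(seed)  # same seed per profile for repeatable runs
        passed = sum(
            run_case(tool, query, expected, sample_latency(profile, rng), timeout_s)
            for query, expected in cases
        )
        results[profile.name] = passed / len(cases)
    return results
```

A call such as `evaluate(my_tool, cases, [baseline, degraded], timeout_s=1.0)` would return one pass rate per profile. A gap between the baseline and degraded rates suggests failures that come from latency rather than from answer quality, which is the kind of issue a latency-free offline suite cannot surface.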
