The offline eval suite is green, but live tool latency changes are creating user-visible failures. What would you add?

Instruction: Explain how you would expand evaluation when operational conditions are causing issues the offline suite misses.

Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery.


I would add environment-aware evaluation and monitoring instead of treating latency as someone else’s system problem. If tool latency affects user-visible quality, then timing is part of the workflow contract and belongs in the reliability picture.

That means tracing tool durations, measuring timeout and retry behavior, and replaying offline evals...
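A minimal sketch of the tracing and timeout/retry measurement described above, assuming tools are plain Python callables. The names `ToolTrace` and `traced_call`, the thread-based timeout, and the default of three attempts are illustrative assumptions, not part of the original answer.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class ToolTrace:
    """Hypothetical per-call record: one duration per attempt, plus outcome flags."""
    tool: str
    durations_s: List[float] = field(default_factory=list)
    timeouts: int = 0
    succeeded: bool = False

    @property
    def retries(self) -> int:
        # Every attempt after the first counts as a retry.
        return max(0, len(self.durations_s) - 1)


def traced_call(
    name: str,
    fn: Callable[[], Any],
    *,
    timeout_s: float,
    max_attempts: int = 3,
) -> Tuple[Any, ToolTrace]:
    """Call fn with a per-attempt timeout, retrying on timeout, and record timings.

    Assumption: a timed-out attempt is abandoned, not cancelled; its thread
    keeps running in the background until fn returns.
    """
    trace = ToolTrace(tool=name)
    result = None
    pool = ThreadPoolExecutor(max_workers=max_attempts)
    try:
        for _ in range(max_attempts):
            start = time.perf_counter()
            future = pool.submit(fn)
            try:
                result = future.result(timeout=timeout_s)
                trace.succeeded = True
            except FutureTimeout:
                trace.timeouts += 1
            finally:
                # Record how long the caller waited on this attempt.
                trace.durations_s.append(time.perf_counter() - start)
            if trace.succeeded:
                break
    finally:
        pool.shutdown(wait=False)
    return result, trace
```

Wrapping each tool call this way yields a `ToolTrace` per call, so timeout counts, retry counts, and per-attempt durations can be logged next to the eval result and compared between offline runs and live traffic.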
