Instruction: Explain how you would expand evaluation when operational conditions are causing issues the offline suite misses.
Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery.
I would add environment-aware evaluation and monitoring instead of treating latency as someone else’s system problem. If tool latency affects user-visible quality, then timing is part of the workflow contract and belongs in the reliability picture.
That means tracing tool durations, measuring timeout and retry behavior, and replaying offline evals...
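As a rough illustration of the tracing and timeout/retry measurement mentioned in the preview, here is a minimal Python sketch. It is not taken from the official answer: `ToolTrace`, `call_with_trace`, and `summarize` are hypothetical names, and latency is injected as a list of per-attempt values rather than measured with a real clock, so that an offline eval could be replayed under slow-tool conditions deterministically.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ToolTrace:
    """Hypothetical per-call record: time spent per attempt and the outcome."""
    tool: str
    attempt_durations_s: List[float] = field(default_factory=list)
    timeouts: int = 0
    succeeded: bool = False

    @property
    def retries(self) -> int:
        # Every attempt after the first counts as a retry.
        return max(0, len(self.attempt_durations_s) - 1)


def call_with_trace(
    tool: str,
    fn: Callable[[], Any],
    latencies_s: List[float],
    timeout_s: float,
    max_attempts: int = 3,
) -> Tuple[Any, ToolTrace]:
    """Run fn under injected latency (assumed, not measured) and trace it.

    latencies_s[i] is the simulated latency of attempt i. An attempt whose
    latency exceeds timeout_s is recorded as a timeout and then retried.
    """
    trace = ToolTrace(tool=tool)
    result = None
    for latency in latencies_s[:max_attempts]:
        if latency > timeout_s:
            # The caller only waits until the timeout fires.
            trace.attempt_durations_s.append(timeout_s)
            trace.timeouts += 1
            continue
        trace.attempt_durations_s.append(latency)
        result = fn()
        trace.succeeded = True
        break
    return result, trace


def summarize(traces: List[ToolTrace]) -> Dict[str, float]:
    """Aggregate traces into numbers an eval report could track."""
    attempts = sum(len(t.attempt_durations_s) for t in traces)
    return {
        "calls": len(traces),
        "success_rate": sum(t.succeeded for t in traces) / len(traces),
        "timeout_rate": sum(t.timeouts for t in traces) / attempts,
        "total_wait_s": sum(sum(t.attempt_durations_s) for t in traces),
    }


# Replay the same tool call under a fast profile and a degraded profile.
_, fast = call_with_trace("search", lambda: "ok", [0.2], timeout_s=1.0)
_, slow = call_with_trace("search", lambda: "ok", [5.0, 0.2], timeout_s=1.0)
report = summarize([fast, slow])
```

Running the same eval cases under several latency profiles and comparing the resulting summaries is one way to surface quality regressions that only appear when tools are slow; the thresholds and profiles themselves would have to come from production traces.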
easy