An agent passes step-level checks and still fails end-to-end tasks. How would you investigate?

Instruction: Describe how you would debug an agent that looks healthy at each step but still fails to complete the workflow.

Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery. Describe how you would debug an agent that looks healthy at each step but still fails to complete the workflow.

Official answer available

Preview the opening of the answer, then unlock the full walkthrough.

The way I'd think about it is this: That pattern usually means the local steps look reasonable in isolation, but the workflow is breaking at composition boundaries. The agent may choose acceptable actions one by one and still fail because state is not carried correctly, timing assumptions break, or...

Upgrade to view official answer

An agent passes step-level checks and still fails end-to-end tasks. How would you investigate?

Related Questions