Design a failure-recovery strategy for an agent that coordinates three external tools.

Instruction: Explain how you would design recovery when one step fails after earlier steps succeeded.

Context: Assesses whether the candidate can design a practical architecture and explain the main tradeoffs.

I would treat partial failure as the default assumption. Each tool interaction should have clear preconditions, an idempotency key where possible, an explicit timeout and retry policy, and a known fallback for when the tool is unavailable or returns an ambiguous result.
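
As a rough illustration of that per-tool contract, here is a minimal Python sketch. Every name in it (`ToolStep`, `run_step`, `ToolUnavailable`, `AmbiguousResult`, the `flaky` and `unclear` stand-in tools) is hypothetical and invented for this example. It also assumes the tool accepts an idempotency key and a timeout as arguments, which a real tool may not.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, Optional


class ToolUnavailable(Exception):
    """Transient failure: assumed safe to retry with the same idempotency key."""


class AmbiguousResult(Exception):
    """The tool answered, but the outcome cannot be trusted."""


@dataclass
class ToolStep:
    # Hypothetical description of one tool interaction.
    name: str
    call: Callable[[str, float], str]  # (idempotency_key, timeout_s) -> result
    precondition: Callable[[], bool] = lambda: True
    fallback: Optional[Callable[[], str]] = None
    timeout_s: float = 5.0
    max_attempts: int = 3
    backoff_s: float = 0.5
    # One key per step, reused on every retry so the tool can deduplicate.
    idempotency_key: str = field(default_factory=lambda: uuid.uuid4().hex)


def run_step(step: ToolStep) -> str:
    """Run one step: check precondition, retry transient errors, then fall back."""
    if not step.precondition():
        raise RuntimeError(f"precondition failed for {step.name}")
    for attempt in range(1, step.max_attempts + 1):
        try:
            return step.call(step.idempotency_key, step.timeout_s)
        except ToolUnavailable:
            if attempt < step.max_attempts:
                # Exponential backoff between attempts.
                time.sleep(step.backoff_s * 2 ** (attempt - 1))
        except AmbiguousResult:
            break  # Retrying blindly would not make the outcome clearer.
    if step.fallback is not None:
        return step.fallback()
    raise RuntimeError(f"{step.name} failed with no fallback")


# Usage: a stand-in tool that fails twice, then succeeds.
seen_keys = []

def flaky(key: str, timeout_s: float) -> str:
    seen_keys.append(key)
    if len(seen_keys) < 3:
        raise ToolUnavailable("simulated outage")
    return "booked"

result = run_step(ToolStep(name="booking", call=flaky, backoff_s=0.0))

# Usage: an ambiguous response goes straight to the fallback.
def unclear(key: str, timeout_s: float) -> str:
    raise AmbiguousResult("simulated unclear response")

fallback_result = run_step(
    ToolStep(name="payment", call=unclear, fallback=lambda: "queued-for-review")
)
```

The choice to retry only on `ToolUnavailable` and to skip retries on `AmbiguousResult` is one possible policy, not the only reasonable one. An ambiguous result could instead trigger a status query against the tool using the same idempotency key before falling back.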

I would also keep the workflow state...
