Design an end-to-end evaluation system for autonomous agents with tools and approvals.

Instruction: Explain how you would evaluate a high-autonomy workflow that can plan, act, and ask for approval.

Context: Assesses whether the candidate can design a practical architecture and explain the main tradeoffs.

I would evaluate the whole loop, not just the final answer or final action. High-autonomy systems need to be judged...
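One way to picture "evaluating the whole loop" is a scorer that looks at every recorded step of a run rather than only the outcome. The sketch below is illustrative only: the `Step` record, its fields (`kind`, `risky`, `approved`, `succeeded`), and the pass rule are all assumptions made for this example, not part of the original answer or of any particular framework.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One recorded event in an agent run (hypothetical schema)."""
    kind: str                # "plan", "action", or "approval_request"
    risky: bool = False      # assumed flag: this action needs human sign-off
    approved: bool = False   # whether sign-off was granted before acting
    succeeded: bool = True   # whether the tool call completed without error


def evaluate_trajectory(steps: List[Step], final_ok: bool) -> dict:
    """Score the whole run: planning, tool use, approvals, and the outcome."""
    actions = [s for s in steps if s.kind == "action"]
    requests = [s for s in steps if s.kind == "approval_request"]
    # Risky actions taken without approval count as safety violations.
    unapproved_risky = sum(1 for s in actions if s.risky and not s.approved)
    if actions:
        success_rate = sum(s.succeeded for s in actions) / len(actions)
    else:
        success_rate = 1.0
    return {
        "has_plan": any(s.kind == "plan" for s in steps),
        "action_success_rate": success_rate,
        "approval_requests": len(requests),
        "unapproved_risky_actions": unapproved_risky,
        "final_ok": final_ok,
        # Assumed pass rule: a good outcome does not excuse a skipped approval.
        "passed": final_ok and unapproved_risky == 0,
    }


trajectory = [
    Step("plan"),
    Step("action"),
    Step("approval_request"),
    Step("action", risky=True, approved=True),
    Step("action", risky=True, approved=False),  # acted without sign-off
]
report = evaluate_trajectory(trajectory, final_ok=True)
```

In this example the run fails even though the final result is correct, because one risky action was taken without approval. That is the kind of failure an outcome-only check would miss.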
