Instruction: Explain how you would benchmark a coding agent on realistic engineering work.
Context: Assesses whether the candidate can design a practical architecture and explain the main tradeoffs.
I would build the suite from real engineering work patterns: small bug fixes, bounded refactors, config changes, test updates, docs sync, and multi-file feature slices with realistic repo context. Each task should include validation criteria that reflect actual engineering expectations, not just string matching.
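As a rough illustration of the task categories and validation criteria described above, one way to represent a benchmark task is sketched below. Every name here (`Task`, `Check`, `run_checks`, `task_passed`, the category strings) is hypothetical and chosen only for this example. The sketch assumes each validation criterion can be expressed as a command run inside the repository checkout, where exit code 0 means the criterion is met.

```python
from __future__ import annotations

import subprocess
import sys
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical category names mirroring the task types listed in the answer.
CATEGORIES = {
    "bug_fix", "refactor", "config_change",
    "test_update", "docs_sync", "feature_slice",
}


@dataclass
class Check:
    """One validation criterion: a command whose exit code 0 means 'pass'."""
    name: str
    command: list[str]


@dataclass
class Task:
    """A single benchmark task with its validation checks (illustrative schema)."""
    task_id: str
    category: str
    prompt: str
    checks: list[Check] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


def run_checks(task: Task, workdir: Path, timeout_s: float = 60.0) -> dict[str, bool]:
    """Run every check in the agent's working copy and record pass/fail per check."""
    results: dict[str, bool] = {}
    for check in task.checks:
        try:
            proc = subprocess.run(
                check.command, cwd=workdir,
                capture_output=True, timeout=timeout_s,
            )
            results[check.name] = proc.returncode == 0
        except (subprocess.TimeoutExpired, OSError):
            # A hung or unlaunchable check counts as a failure, not a crash.
            results[check.name] = False
    return results


def task_passed(results: dict[str, bool]) -> bool:
    """A task passes only if it has at least one check and all checks pass."""
    return bool(results) and all(results.values())
```

In practice the commands would be things like the repository's own test suite, linter, or build step, so that a pass reflects behavior rather than matching an expected string. Reporting per-check results alongside the overall pass/fail makes it easier to see where an agent fell short.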
easy