Design a benchmark suite for real software-engineering tasks instead of toy prompts.

Instruction: Explain how you would benchmark a coding agent on realistic engineering work.

Context: Assesses whether the candidate can design a practical architecture and explain the main tradeoffs.

I would build the suite from real engineering work patterns: small bug fixes, bounded refactors, config changes, test updates, docs sync, and multi-file feature slices with realistic repo context. Each task should include validation criteria that reflect actual engineering expectations, not just string matching.
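To make the idea of validation criteria beyond string matching concrete, here is a minimal Python sketch of one possible task record and scorer. All names in it (`BenchmarkTask`, `CATEGORIES`, `score`, `passed`, the `config.json` example) are hypothetical and chosen for illustration; the workspace is modeled as a plain dict of file paths to contents, which is an assumption rather than how any particular harness works.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical category names, mirroring the work patterns listed above.
CATEGORIES = {
    "bug_fix", "refactor", "config_change",
    "test_update", "docs_sync", "feature_slice",
}

# A check inspects the post-change workspace (path -> file contents)
# and returns True if the expectation holds.
Check = Callable[[Dict[str, str]], bool]


@dataclass
class BenchmarkTask:
    task_id: str
    category: str
    prompt: str
    checks: Dict[str, Check] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


def score(task: BenchmarkTask, workspace: Dict[str, str]) -> Dict[str, bool]:
    """Run every named check and report each result separately."""
    return {name: check(workspace) for name, check in task.checks.items()}


def passed(results: Dict[str, bool]) -> bool:
    """A task passes only if it has checks and all of them hold."""
    return bool(results) and all(results.values())


# Example: a config-change task validated by parsing, not by string matching.
ORIGINAL_README = "# Service\n"


def timeout_is_30(ws: Dict[str, str]) -> bool:
    try:
        return json.loads(ws.get("config.json", ""))["timeout_s"] == 30
    except (ValueError, KeyError, TypeError):
        return False


def readme_untouched(ws: Dict[str, str]) -> bool:
    return ws.get("README.md") == ORIGINAL_README


task = BenchmarkTask(
    task_id="cfg-001",
    category="config_change",
    prompt="Raise the request timeout to 30 seconds.",
    checks={"timeout_is_30": timeout_is_30, "readme_untouched": readme_untouched},
)

# The agent reformatted the JSON; a semantic check still accepts it.
agent_output = {
    "config.json": '{\n  "timeout_s": 30,\n  "retries": 2\n}',
    "README.md": ORIGINAL_README,
}
results = score(task, agent_output)
```

Reporting each check by name, rather than a single pass/fail bit, makes it easier to see whether an agent solved the task but also touched files it should have left alone. In a real suite the checks would more likely run the repository's test suite or linters against a checked-out snapshot; the in-memory dict here only keeps the sketch self-contained.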

I would...
