Design a benchmark suite for real software-engineering tasks instead of toy prompts.

Instruction: Explain how you would benchmark a coding agent on realistic engineering work.

Context: Assesses whether the candidate can design a practical architecture and explain the main tradeoffs.

I would benchmark the work engineers actually do, including the parts that are awkward to score. If the suite only...