Your red-team prompts are catching nothing, but incidents keep appearing in production. What would you change?

Instruction: Describe how you would improve adversarial testing when it stops matching reality.

Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery.

I would assume the red-team set is too synthetic, too narrow, or too disconnected from real workflows. If production incidents keep happening while red-team prompts show nothing, then the attack surface in the benchmark does not look enough like the attack surface in the product.
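One rough way to check that diagnosis is to compare the categories seen in production incidents with the categories the red-team set actually targets. The sketch below is only an illustration: the function name `coverage_gaps`, the category labels, and the idea that incidents and prompts are already tagged with a shared taxonomy are all assumptions, not part of the original answer.

```python
from collections import Counter


def coverage_gaps(incident_tags, redteam_tags):
    """Return incident categories (with counts) that no red-team prompt targets.

    Both arguments are iterables of category labels. The labels and the
    shared taxonomy are assumed here; real data would need tagging first.
    """
    incidents = Counter(incident_tags)
    covered = set(redteam_tags)
    return {tag: count for tag, count in incidents.items() if tag not in covered}


# Hypothetical labels, for illustration only.
incident_tags = ["tool_misuse", "indirect_injection", "tool_misuse", "jailbreak"]
redteam_tags = ["jailbreak", "jailbreak", "prompt_leak"]

gaps = coverage_gaps(incident_tags, redteam_tags)
print(gaps)
```

A non-empty result would suggest the benchmark is missing whole classes of real attacks. An empty result would point elsewhere, for example at prompts that share a category with real incidents but are too synthetic to trigger the same failures.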

I would mine...
