Your red-team prompts are catching nothing, but incidents keep appearing in production. What would you change?

Instruction: Describe how you would improve adversarial testing when it stops matching reality.

Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery.

If red teaming never finds problems but production keeps finding them for us, the red-team set is stale. I would...
