Instruction: Describe how you would respond when prompt-only safety is clearly losing the race.
Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery. Describe how you would respond when prompt-only safety is clearly losing the race.
Official answer available
Preview the opening of the answer, then unlock the full walkthrough.
I would stop treating the issue as a prompt patching race and move the defense lower in the stack. If users can iterate around prompt wording quickly, the system needs stronger policy enforcement, action constraints, and runtime validation that do not depend on the...
easy
easy
easy
easy
easy
easy