Instruction: Describe your first response to a likely prompt injection in retrieved content.
Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery. Describe your first response to a likely prompt injection in retrieved content.
Official answer available
Preview the opening of the answer, then unlock the full walkthrough.
The way I'd think about it is this: First, I treat the page as untrusted content and stop it from influencing control decisions. That means the agent should not be allowed to reinterpret the page’s instructions as higher-priority operating rules.
Then I inspect whether the...
easy
easy
easy
easy
easy
easy