A web-browsing agent reads a page that tells it to ignore previous instructions. What do you do first?

Instruction: Describe your first response to a likely prompt injection in retrieved content.

Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery. Describe your first response to a likely prompt injection in retrieved content.

Official answer available

Preview the opening of the answer, then unlock the full walkthrough.

The way I'd think about it is this: First, I treat the page as untrusted content and stop it from influencing control decisions. That means the agent should not be allowed to reinterpret the page’s instructions as higher-priority operating rules.

Then I inspect whether the...

Related Questions