A browsing agent combines information from several safe pages into one unsafe action. How would you defend against that?

Instruction: Explain how you would handle compositional safety risk across several individually safe inputs.

Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery.

I would defend at the action boundary, not just at page classification. Safe pages can still be composed into an unsafe conclusion or unsafe action, so the final action should be judged against policy and context independently of whether each source looked harmless alone.
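
To make the idea concrete, here is a minimal sketch of an action-boundary gate, written in Python. All names in it (`Source`, `ProposedAction`, `gate_action`, `HIGH_IMPACT_KINDS`, the action kinds, and the example URLs) are hypothetical and chosen only for illustration; they do not refer to any real agent framework. The point it tries to show is that the decision is driven by the proposed action and the task context, while per-page labels can only tighten the outcome and never relax it.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Source:
    url: str
    labeled_safe: bool  # outcome of per-page classification (assumed to exist upstream)


@dataclass(frozen=True)
class ProposedAction:
    kind: str                       # e.g. "read", "send_email" (illustrative kinds)
    target: str
    sources: Tuple[Source, ...] = ()  # pages that contributed to this action


# Hypothetical policy: action kinds that always need explicit user confirmation.
HIGH_IMPACT_KINDS: FrozenSet[str] = frozenset({"send_email", "submit_form", "purchase"})


def gate_action(action: ProposedAction,
                allowed_kinds: FrozenSet[str],
                user_confirmed: bool = False) -> str:
    """Return "allow", "confirm", or "block" for a proposed action."""
    # An unsafe source blocks the action outright.
    if any(not s.labeled_safe for s in action.sources):
        return "block"
    # Safe sources grant nothing: the action must still fit the user's task.
    if action.kind not in allowed_kinds:
        return "block"
    # High-impact actions need confirmation even when every source looked harmless.
    if action.kind in HIGH_IMPACT_KINDS and not user_confirmed:
        return "confirm"
    return "allow"


pages = (Source("https://a.example", True), Source("https://b.example", True))
email = ProposedAction("send_email", "someone@example.com", pages)

# Both pages are labeled safe, yet the composed action falls outside a read-only task.
print(gate_action(email, allowed_kinds=frozenset({"read"})))
```

In this sketch the two safe pages do not make the email action acceptable: with a read-only task it is blocked, and even when `send_email` is within the task it is held for confirmation until the user approves it.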

I would...
