Design a sandbox for testing jailbreak and prompt injection resilience.

Instruction: Explain how you would test safety controls in a realistic but safe environment.

Context: Assesses whether the candidate can design a practical architecture and explain the main tradeoffs. Explain how you would test safety controls in a realistic but safe environment.

Official answer available

Preview the opening of the answer, then unlock the full walkthrough.

I would design the sandbox to preserve realistic workflow context: retrieval, browsing, tools, memory, and permission boundaries. Pure prompt-only testing misses how many real attacks succeed only when the full workflow is available.

I also want attack diversity: direct jailbreaks, indirect injections in files or...

Related Questions