Instruction: Explain how you would test safety controls in a realistic but safe environment.
Context: Assesses whether the candidate can design a practical architecture and explain the main tradeoffs. Explain how you would test safety controls in a realistic but safe environment.
Official answer available
Preview the opening of the answer, then unlock the full walkthrough.
I would design the sandbox to preserve realistic workflow context: retrieval, browsing, tools, memory, and permission boundaries. Pure prompt-only testing misses how many real attacks succeed only when the full workflow is available.
I also want attack diversity: direct jailbreaks, indirect injections in files or...
easy
easy
easy
easy
easy
easy