Instruction: Explain how you would debug a guardrail issue caused by retrieved evidence rather than user intent.
Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery.
I would inspect the retrieval path and content labeling, not just the user prompt. If the unsafe material came from retrieved context, then the system is likely treating retrieval as inherently trustworthy and allowing harmful content to enter the answer path unchecked.
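This is a minimal sketch of the two ideas in the answer above. First, attribute a guardrail hit to the context chunk that caused it, so you can tell whether it came from the user prompt or from retrieval. Second, stop treating retrieval as trusted by screening retrieved chunks before they are assembled into the prompt. Everything here is hypothetical and for illustration only: the `ContextChunk` labeling scheme, the `source` values `"user"`/`"retrieved"`, and the keyword-based `is_unsafe` check. A real system would use its own provenance metadata and safety classifier.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ContextChunk:
    # Hypothetical provenance label: "user" or "retrieved".
    source: str
    # Hypothetical identifier, e.g. the retriever's document id.
    doc_id: str
    text: str


# Hypothetical stand-in for a real safety classifier (placeholder marker only).
BLOCKLIST = ("unsafe-example-phrase",)


def is_unsafe(text: str) -> bool:
    """Return True if the text contains any blocklisted marker (case-insensitive)."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)


def attribute_violations(
    chunks: Iterable[ContextChunk], check: Callable[[str], bool]
) -> Dict[str, List[str]]:
    """Group the doc_ids of chunks that trip the check by their source label."""
    hits: Dict[str, List[str]] = {}
    for chunk in chunks:
        if check(chunk.text):
            hits.setdefault(chunk.source, []).append(chunk.doc_id)
    return hits


def screen_retrieved(
    chunks: Iterable[ContextChunk], check: Callable[[str], bool]
) -> Tuple[List[ContextChunk], List[str]]:
    """Drop flagged retrieved chunks before prompt assembly.

    User chunks are passed through unchanged here, on the assumption that
    user input is covered by a separate input guardrail.
    """
    kept: List[ContextChunk] = []
    dropped: List[str] = []
    for chunk in chunks:
        if chunk.source == "retrieved" and check(chunk.text):
            dropped.append(chunk.doc_id)
        else:
            kept.append(chunk)
    return kept, dropped


chunks = [
    ContextChunk("user", "prompt", "Summarize our maintenance policy."),
    ContextChunk("retrieved", "kb-17", "Policy text ... unsafe-example-phrase ..."),
    ContextChunk("retrieved", "kb-42", "Routine maintenance schedule."),
]

print(attribute_violations(chunks, is_unsafe))  # {'retrieved': ['kb-17']}
kept, dropped = screen_retrieved(chunks, is_unsafe)
print([c.doc_id for c in kept], dropped)  # ['prompt', 'kb-42'] ['kb-17']
```

Here the only hit is attributed to `retrieved`, with no hit on the user prompt. That points the investigation at the retrieval path rather than at user intent. Screening at the retrieval boundary, instead of only on the final answer, keeps the flagged document out of the answer path. It also leaves a `doc_id` you can trace back to the index.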
I would compare...
easy