An assistant follows policy in single-turn tests and drifts in long-running sessions. How would you investigate?

Instruction: Explain how you would debug a safety issue that appears over long sessions rather than in one turn.

Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery. Explain how you would debug a safety issue that appears over long sessions rather than in one turn.

Official answer available

Preview the opening of the answer, then unlock the full walkthrough.

Long sessions create new failure paths because context and memory accumulate. I would inspect whether the assistant is drifting because...

Related Questions