Instruction: Explain what a useful offline evaluation should de-risk before launch.
Context: Checks whether the candidate can explain the core concept clearly and connect it to real production decisions.
Offline evals should tell me whether a change is safe enough to expose to users. I want them to catch...
easy