Instruction: Explain how a green eval result can still mislead a team.
Context: Checks whether the candidate can explain the core concept clearly and connect it to real production decisions.
The way I'd approach it in an interview is this: False confidence usually comes from a benchmark that looks cleaner, larger, or more scientific than it really is. The common sources are narrow coverage, leakage from tuning, grader bias, small sample slices, and reporting only aggregate wins while hiding...
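One of the sources named above, reporting only aggregate wins, is easy to show with numbers. The following is a minimal sketch, not a definitive implementation: the slice names, the counts, and the 0.9 pass threshold are all hypothetical and chosen only for illustration. It computes a per-slice pass rate with a 95% Wilson score interval, so that a small, weak slice cannot hide behind a large, easy one.

```python
import math
from collections import defaultdict


def wilson_interval(passes, total, z=1.96):
    """Wilson score interval for a pass rate (z=1.96 gives roughly 95%)."""
    if total == 0:
        return (0.0, 1.0)
    p = passes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def slice_report(results, threshold=0.9):
    """Summarize (slice_name, passed) pairs per slice.

    A slice counts as a confident pass only if the lower bound of its
    interval clears the threshold, not merely its point estimate.
    """
    counts = defaultdict(lambda: [0, 0])  # slice_name -> [passes, total]
    for name, passed in results:
        counts[name][0] += int(passed)
        counts[name][1] += 1

    report = {}
    for name, (passes, total) in counts.items():
        low, high = wilson_interval(passes, total)
        report[name] = {
            "rate": passes / total,
            "n": total,
            "low": low,
            "high": high,
            "confident_pass": low >= threshold,
        }
    return report


# Hypothetical eval results: one large easy slice, one small hard slice.
results = (
    [("short_queries", True)] * 940
    + [("short_queries", False)] * 20
    + [("long_documents", True)] * 22
    + [("long_documents", False)] * 18
)

# 962 of 1000 cases pass, so the headline number clears a 0.9 bar.
aggregate = sum(passed for _, passed in results) / len(results)
report = slice_report(results)

print(f"aggregate pass rate: {aggregate:.3f}")
for name, row in report.items():
    print(
        f"{name}: rate={row['rate']:.3f} n={row['n']} "
        f"interval=({row['low']:.3f}, {row['high']:.3f}) "
        f"confident_pass={row['confident_pass']}"
    )
```

In this made-up data the aggregate is green at 0.962, while `long_documents` passes only 22 of 40 cases and its whole interval sits below the threshold. The interval also addresses the small-sample point: with 40 cases the uncertainty is wide, so even a better-looking rate on that slice would deserve caution before anyone ships on it.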
easy