Your evaluation says retrieval improved, but user trust got worse. How would you explain the mismatch?

Instruction: Explain how you would reason about a gap between internal retrieval metrics and user perception.

Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery.

I would explain that the eval improved one layer while the user experienced the whole product. Retrieval can get better on a metric like recall@k and still hurt trust if the new candidate set is noisier, more redundant, harder to read, or more likely to support overconfident synthesis.
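A minimal sketch of how that can happen, under stated assumptions: the document IDs, the relevance labels, the near-duplicate clusters, and the two pipeline configurations (`old_candidates` passing 3 documents to the generator, `new_candidates` passing 8) are all hypothetical toy data, not results from any real system. The helper names `recall`, `precision`, and `duplicate_rate` are likewise illustrative. The point is only that recall over the candidate set can rise while the share of useful, non-redundant context falls.

```python
def recall(candidates, relevant):
    """Fraction of the relevant documents that appear in the candidate set."""
    return len(set(candidates) & relevant) / len(relevant)


def precision(candidates, relevant):
    """Fraction of the candidate set that is actually relevant."""
    return len(set(candidates) & relevant) / len(candidates)


def duplicate_rate(candidates, cluster_of):
    """Share of candidates that repeat a near-duplicate cluster already present."""
    clusters = [cluster_of.get(doc, doc) for doc in candidates]
    return 1 - len(set(clusters)) / len(clusters)


# Hypothetical ground truth for one query: four relevant documents.
relevant = {"d1", "d2", "d3", "d4"}

# Hypothetical near-duplicate mapping: d1_copy is a reworded copy of d1.
cluster_of = {"d1_copy": "d1"}

# Old pipeline (assumed): a small candidate set of 3 documents.
old_candidates = ["d1", "d2", "n1"]

# New pipeline (assumed): a wider candidate set of 8 documents.
new_candidates = ["d1", "n1", "d2", "n2", "d3", "n3", "n4", "d1_copy"]

report = {
    name: {
        "recall": recall(cands, relevant),
        "precision": precision(cands, relevant),
        "duplicate_rate": duplicate_rate(cands, cluster_of),
    }
    for name, cands in [("old", old_candidates), ("new", new_candidates)]
}

for name, metrics in report.items():
    print(name, metrics)
```

On this toy data, recall rises from 0.5 to 0.75, while precision falls from about 0.67 to 0.375 and one of the eight candidates is a near-duplicate. An evaluation that tracks only recall would report an improvement, even though the generator now reads mostly irrelevant or repeated context.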

For example,...
