Your evaluation says retrieval improved, but user trust got worse. How would you explain the mismatch?

Instruction: Explain how you would reason about a gap between internal retrieval metrics and user perception.

Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery.

I would explain that the eval improved one layer while the user experienced the whole product. Retrieval can get better on a metric like recall@k and still hurt trust if the new candidate set is noisier, more redundant, harder to read, or more likely to support overconfident synthesis.
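As a minimal sketch of how this can happen (toy data, not taken from the official answer), the snippet below compares two hypothetical retrievers on one query. The document IDs, the `relevant` set, and the helper names `recall_at_k` and `precision_of_returned` are all assumptions made for illustration. The point is only that recall@k can rise while the share of useful documents in what the user actually sees falls.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents that appear in the top k results."""
    top = retrieved[:k]
    return len(set(top) & relevant) / len(relevant)


def precision_of_returned(retrieved, relevant, k):
    """Fraction of the returned top-k results that are relevant.

    Divides by the number of results actually returned (at most k),
    so a short, clean list is not penalized for being short.
    """
    top = retrieved[:k]
    if not top:
        return 0.0
    return sum(1 for doc in top if doc in relevant) / len(top)


# Hypothetical ground truth for one query: four relevant documents.
relevant = {"a", "b", "c", "d"}

# Hypothetical old retriever: short, mostly clean list.
old = ["a", "b", "n1"]

# Hypothetical new retriever: finds one more relevant document,
# but buries the hits among seven noise documents.
new = ["a", "n1", "n2", "b", "n3", "n4", "c", "n5", "n6", "n7"]

K = 10
old_recall = recall_at_k(old, relevant, K)
new_recall = recall_at_k(new, relevant, K)
old_precision = precision_of_returned(old, relevant, K)
new_precision = precision_of_returned(new, relevant, K)

print(f"recall@{K}:    old={old_recall:.2f}  new={new_recall:.2f}")
print(f"precision:    old={old_precision:.2f}  new={new_precision:.2f}")
```

In this toy case the eval metric improves (recall@10 goes from 0.50 to 0.75) while the returned set gets noisier (precision drops from about 0.67 to 0.30). If the downstream synthesis or the user reads all ten candidates, the noisier context is what they experience, which is one way an eval gain and a trust loss can coexist.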

For example,...
