Your chatbot feels slow even though median model latency looks fine. How would you debug it?

Instruction: Describe how you would investigate a user-facing latency problem when the median metric looks healthy.

Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery.

I would look beyond the median immediately. A chatbot can feel slow because of tail latency, queueing, retrieval overhead, or client rendering delays, or because the first useful token arrives too late even when total inference time is acceptable.
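To make the point about the median concrete, here is a minimal sketch of a per-stage percentile breakdown. The stage names (queue_ms, retrieval_ms, model_ms, user_ttft_ms), the sample numbers, and the helper functions are all hypothetical and chosen only for illustration; a real system would read these timings from its own traces or logs.

```python
def percentile(values, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    ordered = sorted(values)
    # Integer ceiling of q * n / 100, clamped to at least rank 1.
    rank = max(1, (q * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def summarize(records, stages, quantiles=(50, 95, 99)):
    """Return {stage: {"p50": ..., "p95": ..., "p99": ...}} for each stage."""
    summary = {}
    for stage in stages:
        samples = [record[stage] for record in records]
        summary[stage] = {f"p{q}": percentile(samples, q) for q in quantiles}
    return summary


def make_record(queue_ms, retrieval_ms, model_ms):
    """Build one hypothetical request record, timings in milliseconds."""
    return {
        "queue_ms": queue_ms,
        "retrieval_ms": retrieval_ms,
        "model_ms": model_ms,  # assumed: model time until the first token
        # What the user actually waits for before seeing any output.
        "user_ttft_ms": queue_ms + retrieval_ms + model_ms,
    }


# Hypothetical data: 90 requests are fast, 10 are stuck in a queue.
sample_records = (
    [make_record(10, 50, 400) for _ in range(90)]
    + [make_record(2000, 50, 400) for _ in range(10)]
)

STAGES = ["queue_ms", "retrieval_ms", "model_ms", "user_ttft_ms"]
summary = summarize(sample_records, STAGES)

for stage in STAGES:
    print(stage, summary[stage])
```

With this made-up data the model stage looks identical at p50 and p95 (400 ms), while the user-facing time to first token jumps from 460 ms at p50 to 2450 ms at p95, and the queue stage is what accounts for the gap. That is the pattern the answer describes: a healthy median model latency alongside a slow experience for the tail of users.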

I would break down time to first token,...
