Latency doubled after adding reranking and citations. What would you change first?

Instruction: Describe your first optimization move when RAG quality features add too much latency.

Context: Tests how the candidate diagnoses the problem, chooses the safest next step, and reasons through recovery. Describe your first optimization move when RAG quality features add too much latency.

Official answer available

Preview the opening of the answer, then unlock the full walkthrough.

I would break the latency increase into stages before I changed architecture. Saying reranking and citations made it slower is too broad to act on. I want to know how much time is going to candidate retrieval, reranker compute, citation assembly, prompt growth, and model generation.

Then I would...

Related Questions