Instruction: Explain why tail latency matters in compound AI workflows.
Context: Checks whether the candidate can explain the core concept clearly and connect it to real production decisions.
The way I'd approach it in an interview is this: tail latency usually comes from composition. Each step adds its own variance, and on bad requests the user feels the slowest combination of retrieval, queueing, tool calls, and model inference.
I break the workflow into stages...
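To make the composition point concrete, here is a minimal sketch of the arithmetic. It assumes the stages are slow independently of one another and that each is slow on 1% of requests (that is, each exceeds its own p99). The function name, the four-stage list, and the 1% figure are illustrative assumptions, not something taken from the answer.

```python
def prob_any_stage_slow(num_stages: int, per_stage_slow_prob: float) -> float:
    """Chance that at least one of num_stages stages is slow on a request.

    Assumes stages are slow independently of each other, which is a
    simplification: real stages often share queues and hardware.
    """
    return 1.0 - (1.0 - per_stage_slow_prob) ** num_stages


# Hypothetical four-stage workflow, each stage slow on 1% of requests.
stages = ["retrieval", "queueing", "tool_call", "model_inference"]
p = prob_any_stage_slow(len(stages), 0.01)
print(f"{p:.4f}")  # prints 0.0394
```

Under these assumptions, a rare per-stage event (1%) becomes something close to 4% of end-to-end requests, and the share grows with every stage added. Correlated slowdowns, such as a shared overloaded queue, would change the number, so treat this as a rough estimate rather than a prediction.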
easy