Instruction: Explain the major sources of latency outside raw model inference time.
Context: Checks whether the candidate can explain the core concept clearly and connect it to real production decisions.
I start by assuming the model is only part of the latency budget. Retrieval, routing, queueing, and downstream tools often...
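One way to make that latency budget concrete is to time each stage of a request separately and see what share of the total falls outside the model call. The sketch below is illustrative only: the LatencyBudget class, its method names, and the stage labels are hypothetical, and the sleeps in the demo merely stand in for real retrieval, queueing, and inference work.

```python
import time
from contextlib import contextmanager


class LatencyBudget:
    """Accumulates wall-clock time per named stage of one request (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock   # injectable so the timer can be faked in tests
        self.stages = {}     # stage name -> accumulated seconds

    @contextmanager
    def stage(self, name):
        """Time the enclosed block and add the elapsed seconds to the named stage."""
        start = self.clock()
        try:
            yield
        finally:
            elapsed = self.clock() - start
            self.stages[name] = self.stages.get(name, 0.0) + elapsed

    def total(self):
        """Sum of all recorded stage times, in seconds."""
        return sum(self.stages.values())

    def share_outside(self, model_stage):
        """Fraction of total recorded time spent in stages other than model_stage."""
        total = self.total()
        if total == 0:
            return 0.0
        return (total - self.stages.get(model_stage, 0.0)) / total


if __name__ == "__main__":
    budget = LatencyBudget()
    # The sleeps are placeholders for real work; the durations are made up.
    with budget.stage("queueing"):
        time.sleep(0.01)
    with budget.stage("retrieval"):
        time.sleep(0.02)
    with budget.stage("model_inference"):
        time.sleep(0.03)
    for name, seconds in budget.stages.items():
        print(f"{name}: {seconds * 1000:.1f} ms")
    print(f"share outside the model: {budget.share_outside('model_inference'):.0%}")
```

A per-stage breakdown like this is what lets the production decision follow from data: if most of the time sits in queueing or retrieval, a faster model will not move end-to-end latency much.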
easy