Instruction: Describe how you would decide whether to use one large model or route traffic across models of different sizes.
Context: Checks whether the candidate can explain the core concept clearly and connect it to real production decisions.
The way I'd approach it in an interview is this: I choose by comparing the value each option delivers to the end-to-end workflow, not just offline accuracy. A single larger model is often simpler operationally because it removes routing complexity, but it may be too expensive or too slow for the majority of traffic. Routing most requests to a smaller model can be much more efficient, provided the router is good enough and its failure cases are well understood.
The key question is where the quality gap matters. If only a subset of requests truly needs the larger model, routing usually pays off. If routing errors create hard-to-diagnose regressions or fairness issues, one stronger model may be the better product choice.
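To make the trade-off concrete, here is a minimal sketch of the comparison, not a definitive method. Every name and number in it is a hypothetical assumption for illustration: `ModelProfile`, `single_model`, `routed`, the per-request costs, and the success rates on "easy" and "hard" requests would all come from your own evaluation data. It models the router with two rates: how often it correctly escalates hard requests, and how often it needlessly escalates easy ones.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    """Hypothetical per-model numbers taken from your own evaluation."""
    cost: float          # cost per request, arbitrary units
    quality_easy: float  # success rate on requests the small model can handle
    quality_hard: float  # success rate on requests that need the large model


def single_model(model: ModelProfile, hard_share: float) -> tuple[float, float]:
    """Expected (cost, quality) per request when one model serves all traffic."""
    quality = (1 - hard_share) * model.quality_easy + hard_share * model.quality_hard
    return model.cost, quality


def routed(small: ModelProfile, large: ModelProfile, hard_share: float,
           escalation_recall: float, false_escalation: float,
           router_cost: float = 0.0) -> tuple[float, float]:
    """Expected (cost, quality) per request with an imperfect router.

    escalation_recall: share of hard requests correctly sent to the large model.
    false_escalation:  share of easy requests needlessly sent to the large model.
    """
    hard_to_large = hard_share * escalation_recall
    hard_to_small = hard_share * (1 - escalation_recall)
    easy_to_large = (1 - hard_share) * false_escalation
    easy_to_small = (1 - hard_share) * (1 - false_escalation)

    cost = (router_cost
            + (hard_to_large + easy_to_large) * large.cost
            + (hard_to_small + easy_to_small) * small.cost)
    quality = (hard_to_large * large.quality_hard
               + hard_to_small * small.quality_hard
               + easy_to_large * large.quality_easy
               + easy_to_small * small.quality_easy)
    return cost, quality


# Illustrative numbers only (assumptions, not measurements).
small = ModelProfile(cost=1.0, quality_easy=0.96, quality_hard=0.50)
large = ModelProfile(cost=10.0, quality_easy=0.98, quality_hard=0.90)

large_cost, large_quality = single_model(large, hard_share=0.2)
routed_cost, routed_quality = routed(small, large, hard_share=0.2,
                                     escalation_recall=0.9, false_escalation=0.1)
print(f"single large: cost={large_cost:.2f} quality={large_quality:.4f}")
print(f"routed:       cost={routed_cost:.2f} quality={routed_quality:.4f}")
```

With numbers like these, routing cuts cost substantially for a small drop in expected quality. Lowering `escalation_recall` shows how quickly that drop grows when the router misses hard requests, which is the gap that has to be weighed against the savings.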
I want the architecture that preserves reliability at the right cost, not the one that wins an abstract model-comparison debate.
A weak answer says you should always use the smaller model because it is cheaper. Routing complexity and the shape of the failures matter just as much as raw model price.
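One way to look at failure shape rather than a single average is to compare the two setups per traffic slice. The sketch below is an assumption-laden illustration: the slice names, the `(slice, routed_ok, large_ok)` record format, and the `max_drop` threshold are hypothetical, and the pass/fail labels would come from whatever evaluation you already run.

```python
from collections import defaultdict


def quality_by_slice(records):
    """Return {slice: (routed_success_rate, large_success_rate)}.

    records: iterable of (slice_name, routed_ok, large_ok), where the two
    booleans say whether each setup handled that request acceptably.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # count, routed successes, large successes
    for slice_name, routed_ok, large_ok in records:
        entry = totals[slice_name]
        entry[0] += 1
        entry[1] += int(routed_ok)
        entry[2] += int(large_ok)
    return {name: (r / n, l / n) for name, (n, r, l) in totals.items()}


def regressed_slices(records, max_drop=0.05):
    """Slices where routing trails the single large model by more than max_drop."""
    rates = quality_by_slice(records)
    return sorted(name for name, (routed_q, large_q) in rates.items()
                  if large_q - routed_q > max_drop)


# Hypothetical evaluation records: the overall average looks acceptable,
# but one slice regresses sharply under routing.
records = (
    [("short_queries", True, True)] * 9 + [("short_queries", False, False)]
    + [("legal_documents", True, True)] * 2 + [("legal_documents", False, True)] * 2
)
print(regressed_slices(records))
```

A slice that shows up here is a concentrated regression that an aggregate quality number can hide, which is the kind of case where one stronger model may be the safer product choice.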
easy