How do you choose between a larger model and a smaller routed model?

Question

This question checks whether the candidate can explain the trade-off clearly and connect it to real production decisions. Describe how you would decide whether to use one large model or route traffic across models of different sizes.

Accepted Answer

Example Answer

I choose by comparing workflow value, meaning quality on the requests that matter weighed against latency and cost, not just offline accuracy. A single larger model is often simpler operationally because there is no router to build, tune, and monitor, but it may be too expensive or too slow for the majority of traffic. Routing to a smaller model can be much more efficient if the router is accurate enough and its failure cases are well understood.

The key question is where the quality gap matters. If only a subset of requests truly needs the larger model, routing usually pays off. If routing errors create hard-to-diagnose regressions or fairness issues, one stronger model may be the better product choice.
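To make that comparison concrete, here is a minimal back-of-the-envelope sketch. Everything in it is an assumption for illustration: the `ModelProfile` fields, the split of traffic into "easy" and "hard" requests, and the example numbers are hypothetical and do not come from any real system. It estimates expected cost and quality per request for a large-only setup and for a routed setup with an imperfect router.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-request figures for one model."""
    cost: float          # cost per request, in arbitrary units
    quality_easy: float  # success rate on requests the small model handles well
    quality_hard: float  # success rate on requests that need the large model


def large_only(large: ModelProfile, hard_fraction: float) -> tuple[float, float]:
    """Expected (cost, quality) per request when all traffic goes to the large model."""
    quality = hard_fraction * large.quality_hard + (1 - hard_fraction) * large.quality_easy
    return large.cost, quality


def routed(
    small: ModelProfile,
    large: ModelProfile,
    hard_fraction: float,
    recall: float,            # share of hard requests the router escalates
    false_escalation: float,  # share of easy requests the router escalates anyway
    router_cost: float = 0.0,
) -> tuple[float, float]:
    """Expected (cost, quality) per request with an imperfect router."""
    easy_fraction = 1 - hard_fraction
    share_large = hard_fraction * recall + easy_fraction * false_escalation
    cost = router_cost + share_large * large.cost + (1 - share_large) * small.cost
    quality = (
        hard_fraction * (recall * large.quality_hard + (1 - recall) * small.quality_hard)
        + easy_fraction
        * (false_escalation * large.quality_easy + (1 - false_escalation) * small.quality_easy)
    )
    return cost, quality


# Illustrative numbers only; replace them with measurements from your own traffic.
small = ModelProfile(cost=1.0, quality_easy=0.95, quality_hard=0.60)
large = ModelProfile(cost=10.0, quality_easy=0.97, quality_hard=0.90)

print("large only:", large_only(large, hard_fraction=0.2))
print("routed:    ", routed(small, large, hard_fraction=0.2, recall=0.9,
                            false_escalation=0.1, router_cost=0.1))
```

With these made-up inputs, routing costs roughly a third as much per request and gives up about two points of expected quality. Whether that trade is acceptable depends on which requests absorb the loss, which averages like these cannot show.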

I want the architecture that preserves reliability at the right cost, not the one that wins an abstract model-comparison debate.

Common Poor Answer

A weak answer is to say that you should always use the smaller model because it is cheaper. Routing complexity and failure shape, meaning how and where the router's mistakes land, matter just as much as raw model price.
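One way to look at failure shape instead of a single average is to break router misses down by traffic segment. The sketch below assumes a hypothetical evaluation log of `(segment, needed_large, routed_large)` tuples, where `needed_large` comes from offline labeling; the segment names and the record format are illustrative, not taken from any particular tool.

```python
from collections import defaultdict


def under_escalation_by_segment(records):
    """Per segment, the share of requests that needed the large model
    but were routed to the small one."""
    needed = defaultdict(int)
    missed = defaultdict(int)
    for segment, needed_large, routed_large in records:
        if needed_large:
            needed[segment] += 1
            if not routed_large:
                missed[segment] += 1
    # Segments with no hard requests are omitted rather than reported as 0.
    return {segment: missed[segment] / needed[segment] for segment in needed}


# Hypothetical labeled sample: (segment, needed_large, routed_large)
sample = [
    ("en", True, True),
    ("en", True, True),
    ("en", False, False),
    ("de", True, False),
    ("de", True, True),
]

print(under_escalation_by_segment(sample))
```

A router that looks fine on aggregate but misses far more often in one segment is the kind of uneven failure that a price comparison alone hides.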
