How would you explain the difference between task success and model quality?

Instruction: Describe the difference between judging the model itself and judging whether the user task was completed.

Context: Checks whether the candidate can explain the core concept clearly and connect it to real production decisions.

Example Answer

The way I'd approach it is this: Model quality is about the output judged in isolation, for example whether a response is accurate, fluent, and well reasoned on its own. Task success is about whether the user actually completed the job they came to do. Those two things overlap, but they are not interchangeable.

A model can produce fluent, highly rated text and still fail the task because it used the wrong tool, skipped a required step, gave an answer too late, or created work the user then had to clean up. The reverse can also happen: the wording is imperfect, but the user still reaches the right outcome quickly.

That is why I separate model metrics from workflow metrics. Model quality helps me understand how the system reasons and responds. Task success tells me whether the product is actually useful. If I collapse them into one number, I usually end up overvaluing polished output and undervaluing operational correctness.
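A minimal sketch of that separation, under stated assumptions: the `Interaction` schema, its field names, the sample sessions, and the rule that a session needing cleanup counts as a failed task are all hypothetical, chosen only to illustrate the idea. It reports model quality and task success side by side, then shows what a single blended number hides.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Interaction:
    """One logged user session (hypothetical schema, for illustration only)."""
    quality_score: float   # score for the output judged alone, 0.0 to 1.0
    task_completed: bool   # did the user finish the job they came to do?
    needed_cleanup: bool   # did the user have to fix the output afterwards?


def model_quality(logs):
    """Model metric: average output score, judged in isolation."""
    return mean(i.quality_score for i in logs)


def task_success_rate(logs):
    """Workflow metric: share of sessions completed with no cleanup.

    Treating cleanup as a failure is an assumption made for this sketch.
    """
    return mean(
        1.0 if i.task_completed and not i.needed_cleanup else 0.0
        for i in logs
    )


def blended_score(logs, weight=0.5):
    """The single collapsed number that the answer warns against."""
    return weight * model_quality(logs) + (1 - weight) * task_success_rate(logs)


# Made-up sample sessions, not real data.
logs = [
    Interaction(0.95, False, False),  # polished text, but the task failed
    Interaction(0.90, True, True),    # fluent, but the user had to clean up
    Interaction(0.60, True, False),   # rough wording, right outcome
    Interaction(0.75, True, False),   # decent wording, right outcome
]

print(f"model quality: {model_quality(logs):.2f}")
print(f"task success:  {task_success_rate(logs):.2f}")
print(f"blended:       {blended_score(logs):.2f}")
```

With these made-up sessions, model quality comes out around 0.80 while task success is 0.50. The blended figure lands in between and no longer shows that the two highest-rated outputs are the ones that failed the user.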

Common Poor Answer

A weak answer treats task success as just model quality plus a few product metrics. That misses the fact that workflow completion and output quality can move in different directions: a change can raise rated output quality while fewer users finish the task, or the reverse.
