How would you evaluate a model's performance?

Question

This question assesses the candidate's knowledge of model evaluation techniques and their ability to ensure that models are both accurate and applicable.

Accepted Answer

Example Answer

I would start by defining what success means for the actual business or product decision, because the right evaluation depends on the cost of different errors. For a fraud model, I care about a very different balance of false positives and false negatives than I would for a recommendation model or a medical-screening model. So before I choose metrics, I want to know which type of mistake matters most.

From there, I would evaluate on a representative holdout set and look at more than one metric. For classification, that might include precision, recall, calibration, and how those metrics shift as the decision threshold moves. I would also do slice analysis to see whether the model performs unevenly across segments, and I would check whether offline gains are likely to carry over to the metrics that matter in production. A model is only strong if it performs well on the right data, for the right objective, under the conditions where it will actually be used.
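As a rough illustration of looking at more than one metric, the sketch below computes precision and recall at a chosen threshold, a Brier score as a simple calibration check, and recall per segment for slice analysis. It uses only the Python standard library. The toy labels, scores, segment names, and function names are all hypothetical, chosen for illustration rather than taken from any real model.

```python
from collections import defaultdict


def confusion(y_true, y_prob, threshold):
    """Count (tp, fp, fn, tn) when scores >= threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(y_true, y_prob, threshold):
    """Return (precision, recall); an undefined ratio is reported as 0.0."""
    tp, fp, fn, _ = confusion(y_true, y_prob, threshold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def brier_score(y_true, y_prob):
    """Mean squared gap between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def recall_by_slice(y_true, y_prob, slices, threshold):
    """Compute recall separately for each segment label."""
    grouped = defaultdict(lambda: ([], []))
    for y, p, s in zip(y_true, y_prob, slices):
        grouped[s][0].append(y)
        grouped[s][1].append(p)
    return {
        s: precision_recall(ys, ps, threshold)[1]
        for s, (ys, ps) in grouped.items()
    }


# Hypothetical holdout data: true labels, model scores, and a segment per row.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.3, 0.7, 0.1, 0.8, 0.4]
segment = ["web", "web", "web", "app", "app", "app", "web", "app"]

if __name__ == "__main__":
    for t in (0.25, 0.5, 0.75):
        print(t, precision_recall(y_true, y_prob, t))
    print("brier", brier_score(y_true, y_prob))
    print("recall by segment", recall_by_slice(y_true, y_prob, segment, 0.5))
```

On this toy data the aggregate recall at a 0.5 threshold looks acceptable while one segment's recall is zero, which is the kind of gap slice analysis is meant to surface. In practice a library such as scikit-learn would likely replace these hand-written helpers.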

Common Poor Answer

A weak answer says, "I would check accuracy," and stops there. That ignores class imbalance, error costs, calibration, and whether the evaluation setup matches the production problem.
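To make the class-imbalance point concrete, here is a minimal sketch with invented numbers: a dataset with 1% positives and a hypothetical "model" that always predicts the negative class. The helper names are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    positives = sum(y_true)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return tp / positives if positives else 0.0


# Hypothetical imbalanced data: 10 positives out of 1000 examples.
y_true = [1] * 10 + [0] * 990
# A degenerate classifier that never flags anything.
always_negative = [0] * 1000

if __name__ == "__main__":
    print("accuracy:", accuracy(y_true, always_negative))
    print("recall:", recall(y_true, always_negative))
```

The always-negative classifier scores 99% accuracy while catching none of the positives, so accuracy alone would rate a useless model as excellent.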
