Instruction: Describe what overfitting is and the strategies you use to avoid it.
Context: This question tests the candidate's understanding of a fundamental issue in machine learning and their ability to apply best practices in model training.
The way I'd explain it in an interview is this: Overfitting happens when a model learns the training data too specifically, including noise or accidental patterns that do not generalize to new examples. The result is usually strong training performance but disappointing validation or production performance. In simple terms, the model memorizes too much and generalizes too little.
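The memorize-versus-generalize contrast above can be made concrete with a toy sketch. This is an illustrative example I'm adding (all names and the synthetic data are my own, not from the answer): a 1-nearest-neighbor classifier, which memorizes every training point, gets perfect training accuracy on noisy labels but loses to a deliberately simple threshold rule on fresh data.

```python
import random

random.seed(0)

def make_data(n):
    """Synthetic binary task: the true rule is x > 0.5, with 20% label noise."""
    xs = [random.uniform(0, 1) for _ in range(n)]
    ys = [(x > 0.5) != (random.random() < 0.2) for x in xs]
    return list(zip(xs, ys))

def one_nn_predict(train, x):
    """1-nearest-neighbor: memorizes every training point, noise included."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def threshold_predict(x):
    """A deliberately simple model: a single fixed threshold at 0.5."""
    return x > 0.5

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

train = make_data(200)
test = make_data(1000)

# 1-NN scores 1.0 on its own training set (each point is its own neighbor)...
print("1-NN train:", accuracy(lambda x: one_nn_predict(train, x), train))
# ...but the memorized label noise drags it down on held-out data,
# while the simpler rule generalizes better on this task.
print("1-NN test: ", accuracy(lambda x: one_nn_predict(train, x), test))
print("threshold: ", accuracy(threshold_predict, test))
```

The train score of 1.0 next to a noticeably lower test score is exactly the train-test gap that signals overfitting; the exact test numbers depend on the random seed.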
To prevent it, I would first make sure the evaluation setup is clean and free of leakage, since a leaky split can hide overfitting entirely. Then I would apply the standard controls that fit the problem: simpler models where appropriate, regularization, cross-validation, early stopping, dropout for neural networks, feature selection, and more representative training data when it can be obtained. Throughout, I watch whether the model is too flexible relative to the amount and quality of the data. Preventing overfitting is really about matching model complexity to the amount of reliable signal available.
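Of the controls listed, early stopping is easy to sketch in a few lines. This is a minimal, hypothetical skeleton (the `step` and `val_loss` callables and the synthetic loss curve are my own stand-ins, not a real training run): halt when validation loss has not improved for `patience` consecutive epochs.

```python
def train_with_early_stopping(step, val_loss, max_epochs=100, patience=5):
    """Generic early-stopping loop: stop once validation loss has gone
    `patience` epochs without improving, and report the best epoch."""
    best, best_epoch, since_improve = float("inf"), 0, 0
    for epoch in range(max_epochs):
        step(epoch)             # one epoch of training (stand-in callable)
        loss = val_loss(epoch)  # evaluate on held-out data (stand-in callable)
        if loss < best:
            best, best_epoch, since_improve = loss, epoch, 0
        else:
            since_improve += 1
            if since_improve >= patience:
                break
    return best_epoch, best

# Synthetic U-shaped validation curve: improves until epoch 10, then worsens,
# mimicking a model that starts overfitting partway through training.
curve = [(e - 10) ** 2 / 100 + 1 for e in range(100)]
best_epoch, best = train_with_early_stopping(lambda e: None,
                                             lambda e: curve[e])
print(best_epoch, best)  # stops near epoch 10 instead of running all 100
```

In a real setup `step` would update model weights and `val_loss` would score a held-out split; the loop itself is framework-agnostic.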
A weak answer says overfitting means "the model is too complex" and leaves it there, without explaining the train-test gap or the practical ways to diagnose and prevent it; a strong answer ties the diagnosis and the fixes together.