Instruction: Define data drift and discuss its potential impact on machine learning models when deployed in production environments.
Context: This question aims to assess the candidate's knowledge of data drift, a common challenge in MLOps, and their awareness of its implications on model performance over time.
The way I'd explain it in an interview is this: Data drift means the distribution of production inputs changes relative to the data the model saw during training. That can happen in feature values, category frequencies, text patterns, user behavior, or upstream preprocessing.
Its effect is that the model may start seeing examples that no longer match its learned assumptions, which can reduce performance even when the relationship between inputs and labels (concept drift) has not itself changed. In MLOps, detecting data drift early is useful because input distributions can be monitored immediately, while ground-truth labels often arrive with a delay, so drift frequently shows up before a drop in accuracy becomes obvious.
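One common way to make this concrete in an interview is a two-sample statistical test comparing a training-time feature distribution against a window of production values. The sketch below uses the Kolmogorov-Smirnov test from SciPy on simulated data; the feature names, sample sizes, and the 0.01 threshold are illustrative assumptions, not a prescribed setup.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Reference values of one numeric feature, as seen at training time.
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)

# Recent production values whose mean has shifted (simulated drift).
prod_feature = rng.normal(loc=0.8, scale=1.0, size=5_000)

# Two-sample Kolmogorov-Smirnov test: a small p-value suggests the
# two samples were not drawn from the same distribution.
stat, p_value = ks_2samp(train_feature, prod_feature)

# Illustrative alert threshold; in practice this is tuned per feature.
drift_detected = p_value < 0.01
print(f"KS statistic={stat:.3f}, p={p_value:.3g}, drift={drift_detected}")
```

In practice a monitoring job would run a check like this per feature on a schedule, and a flagged feature would trigger investigation or retraining rather than an automatic rollback.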
What matters in an interview is not only knowing the definition, but being able to connect it back to how it changes modeling, evaluation, or deployment decisions in practice.
A weak answer says data drift means the data looks different, without explaining why that matters for a deployed model's assumptions and performance.