Explain how you would conduct a post-mortem analysis for an ML model that failed in production.

Instruction: Detail the process for performing a thorough analysis after an ML model fails or underperforms in a production setting.

Context: This question seeks to understand how the candidate learns from failures and applies those lessons to future ML deployments.

I would treat the post-mortem as a systems investigation, not a blame exercise. First I would establish the timeline: what changed, how the failure was detected, which users or segments were affected, and whether the issue came from data, code, configuration, infrastructure,...
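
The first steps named above can be made concrete. One step is building a timeline of what changed before detection. The other is finding which segments were affected. The sketch below is minimal and rests on some assumptions. It assumes change events (data, code, configuration, infrastructure) have already been collected with timestamps. It also assumes a per-segment quality metric exists for a baseline period and for the incident period. The names `ChangeEvent`, `candidate_changes`, and `segment_impact`, the 7-day lookback, the 0.05 drop threshold, and all example values are hypothetical illustrations. None of them come from the original answer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ChangeEvent:
    """A timestamped change that could explain the failure (hypothetical schema)."""
    timestamp: datetime
    category: str  # e.g. "data", "code", "config", "infrastructure"
    description: str


def candidate_changes(events, detected_at, lookback=timedelta(days=7)):
    """Return changes inside the lookback window before detection, newest first."""
    window_start = detected_at - lookback
    in_window = [e for e in events if window_start <= e.timestamp <= detected_at]
    return sorted(in_window, key=lambda e: e.timestamp, reverse=True)


def segment_impact(baseline, incident, min_drop=0.05):
    """Return (segment, drop) pairs whose metric fell by at least min_drop, worst first.

    baseline and incident map segment name -> metric value (higher is better).
    Segments missing from the incident period are skipped.
    """
    drops = []
    for segment, before in baseline.items():
        after = incident.get(segment)
        if after is None:
            continue
        drop = round(before - after, 4)  # rounding avoids float noise like 0.07999...
        if drop >= min_drop:
            drops.append((segment, drop))
    return sorted(drops, key=lambda pair: pair[1], reverse=True)


# Illustrative inputs only; not data from a real incident.
detected = datetime(2024, 5, 10, 9, 0)
events = [
    ChangeEvent(datetime(2024, 5, 1, 12, 0), "code", "model v2 deployed"),
    ChangeEvent(datetime(2024, 5, 8, 16, 0), "data", "upstream feature table schema changed"),
    ChangeEvent(datetime(2024, 5, 9, 11, 0), "config", "decision threshold raised"),
    ChangeEvent(datetime(2024, 5, 11, 8, 0), "infrastructure", "node pool resized"),
]
baseline = {"mobile": 0.91, "desktop": 0.90, "new_users": 0.88}
incident = {"mobile": 0.78, "desktop": 0.89, "new_users": 0.80}

# Only the config and data changes fall inside the 7 days before detection.
for change in candidate_changes(events, detected):
    print(change.timestamp.isoformat(), change.category, change.description)
print(segment_impact(baseline, incident))  # [('mobile', 0.13), ('new_users', 0.08)]
```

Both helpers only narrow the search. A change inside the window is a lead to verify, not a confirmed root cause. Likewise, the segment drops show where the model degraded, not why.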