Instruction: Define 'Simpson's Paradox' and provide an example of how it can lead to incorrect causal interpretations.
Context: This question evaluates the candidate's understanding of Simpson's Paradox and their ability to recognize situations where causal relationships can be confounded by it.
Official answer available
Preview the opening of the answer, then unlock the full walkthrough.
Simpson's Paradox is essentially about the confounding variables that are not apparent at a superficial data analysis level. It serves as a reminder that correlation does not imply causation and that aggregating data without considering the context can mask true relationships.
Let me provide an example to illustrate how Simpson's Paradox can mislead causal interpretations, especially relevant to the role of a Data Scientist. Imagine a pharmaceutical company conducts clinical trials for two different drugs intended to treat the same disease. When examining the recovery rates separately for males and females, Drug A shows better recovery rates than Drug B for both genders. However, when the data for both genders are combined, Drug B appears to have a higher overall recovery rate. This reversal happens because the distribution of participants across the genders was...
easy
easy
easy
easy
easy
easy