Resolution of Simpson's paradox via the common cause principle

Abstract

Simpson's paradox is an obstacle to establishing a probabilistic association between two events a1 and a2, given a third (lurking) random variable B. We focus on scenarios where the random variables A (which combines a1, a2, and their complements) and B have a common cause C that need not be observed. Alternatively, we can assume that C screens out A from B. In such cases, the correct association between a1 and a2 is to be defined via conditioning over C. This setup generalizes the original Simpson's paradox: its two contradictory options now correspond to two particular, different causes C. We show that if B and C are binary and A is quaternary (the minimal and most widespread setting for Simpson's paradox), conditioning over any binary common cause C establishes the same direction of association between a1 and a2 as conditioning over B in the original formulation of the paradox. Thus, for the minimal common cause, one should choose the option of Simpson's paradox that conditions over B rather than marginalizing it. The same conclusion is reached when Simpson's paradox is formulated via three continuous Gaussian variables: within the minimal formulation of the paradox (three scalar continuous variables A1, A2, and B), one should choose the option that conditions over B.
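
The two contradicting options of the paradox can be made concrete with a small numerical sketch. The counts below are illustrative values chosen for this example and are not taken from the paper; the names `counts`, `p_a1_given`, and `association_sign` are likewise hypothetical. The sketch compares the sign of p(a1 | a2) - p(a1 | not a2) when B is conditioned over with the sign obtained when B is marginalized.

```python
from fractions import Fraction

# Illustrative counts (an assumption of this sketch, not data from the paper).
# counts[b][a2] = (number of trials in which a1 occurred, total number of trials)
counts = {
    0: {True: (81, 87), False: (234, 270)},
    1: {True: (192, 263), False: (55, 80)},
}

def p_a1_given(a2, b=None):
    """Return p(a1 | a2, B=b), or p(a1 | a2) with B marginalized if b is None."""
    strata = list(counts) if b is None else [b]
    hits = sum(counts[s][a2][0] for s in strata)
    total = sum(counts[s][a2][1] for s in strata)
    return Fraction(hits, total)

def association_sign(b=None):
    """Sign of p(a1 | a2, ...) - p(a1 | not a2, ...): +1, 0, or -1."""
    diff = p_a1_given(True, b) - p_a1_given(False, b)
    return (diff > 0) - (diff < 0)

# Option 1: condition over B (one sign per value of B).
conditional_signs = [association_sign(b) for b in counts]
# Option 2: marginalize B.
marginal_sign = association_sign()

print("conditioning over B:", conditional_signs)
print("marginalizing B:   ", marginal_sign)
```

With these counts the association is positive for each value of B but negative once B is marginalized, which is the contradiction the abstract refers to. The sketch only exhibits the two options; it does not implement the common-cause analysis that selects between them.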
