Tae Hyun Kim (Lowell)

Confounder

3 min read · #causal-inference #scm #dag

Definition

A confounder is a common cause of the treatment (X) and the outcome (Y): it affects both, creating a spurious (non-causal) association between X and Y.

DAG representation:

    Confounder (Z)
       ↙        ↘
  Treatment (X)   Outcome (Y)

Mathematical definition: A variable Z is a confounder when:

  1. Z affects X (or is associated with X)
  2. Z affects Y (or is associated with Y)
  3. Z is not an effect of X (not on the causal path)

Intuitive Understanding

Core idea:

A confounder is a “third variable” that makes X and Y appear associated even when neither causes the other.

Example: correlation between ice cream sales (X) and drowning accidents (Y)

       Summer (Confounder)
          ↙        ↘
   Ice cream sales    Drowning accidents
  • Ice cream does not cause drowning
  • The common cause, summer, affects both
  • Spurious association: correlation, not causation
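
The ice cream example can be sketched as a small simulation. This is only an illustration under assumed numbers: the coefficients, noise levels, and sample size below are made up, and `pearson` is a helper written for this sketch. Both variables depend on the season and neither depends on the other.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is reproducible
days = []
for _ in range(5000):
    summer = rng.random() < 0.5                       # confounder Z
    ice_cream = 50 + 30 * summer + rng.gauss(0, 10)   # X depends only on Z
    drownings = 2 + 3 * summer + rng.gauss(0, 1)      # Y depends only on Z
    days.append((summer, ice_cream, drownings))

def corr_for(rows):
    """Correlation between ice cream sales and drownings in the given rows."""
    return pearson([r[1] for r in rows], [r[2] for r in rows])

overall = corr_for(days)
in_summer = corr_for([d for d in days if d[0]])
off_season = corr_for([d for d in days if not d[0]])

print(f"overall: {overall:.2f}")
print(f"summer only: {in_summer:.2f}, off-season only: {off_season:.2f}")
```

Pooled over the whole year the two series should be clearly correlated, while within a single season the correlation should be close to zero, which is what a purely spurious association looks like.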

Back-door Path

A confounder creates a back-door path:

X ← Z → Y
  • A “back-door” path from X to Y
  • This path transmits non-causal association
  • Must be blocked in order to identify the causal effect of X on Y
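
What "blocking" buys can be written as the back-door adjustment formula (Pearl, 2009). The version below assumes a discrete Z that blocks every back-door path from X to Y and is not a descendant of X; the lowercase x, y, z are values of the corresponding variables.

```latex
P(Y = y \mid do(X = x)) \;=\; \sum_{z} P(Y = y \mid X = x,\, Z = z)\, P(Z = z)
```

The left-hand side is the interventional (causal) quantity; the right-hand side uses only observational quantities, averaged over the marginal distribution of Z rather than its distribution given X.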

Methods for Adjusting for Confounding

1. Statistical Control

Stratification:

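# Analyze the X-Y relationship at each level of Z
# (assumes `data` is a pandas DataFrame with columns 'X', 'Y', 'Z')
for z_level, subset in data.groupby('Z'):
    # Within-stratum association between X and Y
    print(z_level, subset['X'].corr(subset['Y']))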
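
A minimal sketch of stratified adjustment on simulated data, using only the standard library. The data-generating numbers (a true effect of 1.0, a confounder effect of 2.0, treatment probabilities of 0.8 and 0.2) are assumptions chosen for illustration, and `mean_diff` is a helper defined here.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is reproducible
rows = []
for _ in range(20000):
    z = int(rng.random() < 0.5)                   # binary confounder
    x = int(rng.random() < (0.8 if z else 0.2))   # Z raises the chance of treatment
    y = 1.0 * x + 2.0 * z + rng.gauss(0, 1)       # assumed true effect of X is 1.0
    rows.append((x, y, z))

def mean_diff(rs):
    """Difference in mean outcome between treated and untreated rows."""
    treated = [y for x, y, _ in rs if x == 1]
    control = [y for x, y, _ in rs if x == 0]
    return mean(treated) - mean(control)

# Naive comparison ignores Z and mixes the causal and back-door paths
naive = mean_diff(rows)

# Stratified estimate: within-stratum differences weighted by P(Z = z)
adjusted = 0.0
for z_level in (0, 1):
    stratum = [r for r in rows if r[2] == z_level]
    adjusted += mean_diff(stratum) * len(stratum) / len(rows)

print(f"naive: {naive:.2f}, stratified: {adjusted:.2f}")
```

The naive difference should overstate the effect, while the stratified estimate should land near the assumed value of 1.0.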

Regression: $Y = \beta_0 + \beta_1 X + \beta_2 Z + \epsilon$

$\beta_1$ is the effect of X after controlling for Z.
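
To illustrate what controlling for Z does to the coefficient on X, here is a sketch that compares the naive slope with the adjusted one on simulated data. It uses the closed-form OLS solution for two predictors instead of a regression library; the coefficients (0.5 for X, 1.5 for Z) and the helper `cov` are assumptions of this example.

```python
import random
from statistics import mean

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

rng = random.Random(2)  # fixed seed so the run is reproducible
Z = [rng.gauss(0, 1) for _ in range(10000)]
X = [0.8 * z + rng.gauss(0, 1) for z in Z]                        # Z affects X
Y = [0.5 * x + 1.5 * z + rng.gauss(0, 1) for x, z in zip(X, Z)]   # assumed beta_1 = 0.5

# Slope from regressing Y on X alone (Z omitted)
naive_slope = cov(X, Y) / cov(X, X)

# Coefficient on X from regressing Y on X and Z (closed-form OLS)
denom = cov(X, X) * cov(Z, Z) - cov(X, Z) ** 2
beta1 = (cov(X, Y) * cov(Z, Z) - cov(Z, Y) * cov(X, Z)) / denom

print(f"naive slope: {naive_slope:.2f}, adjusted beta_1: {beta1:.2f}")
```

The adjusted coefficient should recover roughly 0.5, but only because the simulated relationship really is linear; a misspecified model would leave some confounding in place.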

Propensity Score: $e(Z) = P(X = 1 \mid Z)$

  • Matching or weighting via the propensity score
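
Weighting by the propensity score can be sketched as follows. With a single binary Z the propensity score e(Z) = P(X = 1 | Z) is just the share of treated units in each stratum; real applications usually fit a model such as logistic regression instead. The simulated numbers and the names `e_hat` and `weighted_mean` are assumptions of this example, and the estimator shown is the normalized (Hajek) form of inverse probability weighting.

```python
import random

rng = random.Random(3)  # fixed seed so the run is reproducible
rows = []
for _ in range(20000):
    z = int(rng.random() < 0.5)                   # binary confounder
    x = int(rng.random() < (0.8 if z else 0.2))   # treatment depends on Z
    y = 1.0 * x + 2.0 * z + rng.gauss(0, 1)       # assumed true effect of X is 1.0
    rows.append((x, y, z))

# Estimate the propensity score e(z) = P(X = 1 | Z = z) per stratum
e_hat = {}
for z_level in (0, 1):
    xs = [x for x, _, z in rows if z == z_level]
    e_hat[z_level] = sum(xs) / len(xs)

def weighted_mean(pairs):
    """Weighted mean of outcomes given (weight, outcome) pairs."""
    return sum(w * y for w, y in pairs) / sum(w for w, _ in pairs)

# Treated units get weight 1/e(Z), untreated units 1/(1 - e(Z))
treated = [(1 / e_hat[z], y) for x, y, z in rows if x == 1]
control = [(1 / (1 - e_hat[z]), y) for x, y, z in rows if x == 0]
ate_ipw = weighted_mean(treated) - weighted_mean(control)

print(f"e_hat: {e_hat}, IPW estimate: {ate_ipw:.2f}")
```

The weights rebalance each arm so that Z has the same distribution in both, and the estimate should be close to the assumed effect of 1.0.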

2. Design-based Control

Randomization (RCT):

  • Assign treatment at random
  • Confounders become independent of treatment (in expectation)
  • No back-door paths remain: nothing but the random assignment causes X
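
A short simulation of why randomization works, under assumed numbers (true effect 1.0, a strong influence of Z on Y). Treatment is assigned by a coin flip, so Z cannot influence it.

```python
import random
from statistics import mean

rng = random.Random(4)  # fixed seed so the run is reproducible
rows = []
for _ in range(20000):
    z = rng.gauss(0, 1)                       # would-be confounder
    x = int(rng.random() < 0.5)               # coin flip: Z has no influence on X
    y = 1.0 * x + 2.0 * z + rng.gauss(0, 1)   # assumed true effect of X is 1.0
    rows.append((x, y, z))

treated = [r for r in rows if r[0] == 1]
control = [r for r in rows if r[0] == 0]

# Balance check: Z should have about the same mean in both arms
z_gap = mean(r[2] for r in treated) - mean(r[2] for r in control)

# With Z balanced, the unadjusted difference in means estimates the effect
effect = mean(r[1] for r in treated) - mean(r[1] for r in control)

print(f"Z imbalance: {z_gap:.3f}, estimated effect: {effect:.2f}")
```

No adjustment for Z is needed here: the simple difference in means should already be close to 1.0, even though Z strongly affects Y.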

Natural Experiments:

  • Instrumental Variables
  • Regression Discontinuity
  • Difference-in-Differences

3. Control via Genetically Informed Designs

Twin Studies:

  • Monozygotic twins: share genes + family environment
  • Within-pair analysis removes confounding by shared genes and shared family environment
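
The within-pair logic can be sketched with simulated twin pairs. As a simplifying assumption, everything the twins share (genes and family environment) is collapsed into one pair-level factor `g`; the effect sizes are made up for illustration.

```python
import random
from statistics import mean

rng = random.Random(5)  # fixed seed so the run is reproducible
pairs = []
for _ in range(5000):
    g = rng.gauss(0, 1)  # factor shared by both twins in a pair
    twins = []
    for _twin in range(2):
        x = g + rng.gauss(0, 1)                   # shared factor affects exposure
        y = 0.5 * x + 2.0 * g + rng.gauss(0, 1)   # assumed true effect of X is 0.5
        twins.append((x, y))
    pairs.append(twins)

# Naive slope across all individuals, ignoring the pair structure
xs = [x for pair in pairs for x, _ in pair]
ys = [y for pair in pairs for _, y in pair]
mx, my = mean(xs), mean(ys)
naive_slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
               / sum((x - mx) ** 2 for x in xs))

# Within-pair slope: differencing the two twins cancels the shared factor g
dx = [p[0][0] - p[1][0] for p in pairs]
dy = [p[0][1] - p[1][1] for p in pairs]
within_slope = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

print(f"naive: {naive_slope:.2f}, within-pair: {within_slope:.2f}")
```

Differencing removes only what the twins share; confounders that differ between the twins of a pair would still bias the within-pair estimate.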

Adoption Studies:

  • Break the genetic link to remove genetic confounding

Measured vs Unmeasured Confounders

Measured Confounder

  • Observable in the data
  • Can be adjusted for via statistical control
  • e.g., age, sex, education level

Unmeasured Confounder

  • Unobservable (or not measured) in the data
  • The causal effect cannot be identified by adjustment alone
  • Assess the impact with sensitivity analysis
       Unmeasured U
          ↙        ↘
         X          Y
  • If U is not measured, the estimated X→Y effect is biased
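
The size of this bias can be written down in the linear case (the classic omitted-variable bias result). The sketch assumes the true model is linear in X and U with an error term uncorrelated with both, and that U is left out of the regression.

```latex
\text{True model:}\quad Y = \beta_0 + \beta_1 X + \beta_2 U + \epsilon

\text{Regression of } Y \text{ on } X \text{ alone:}\quad
\hat{\beta}_1^{\,\text{naive}} \;\xrightarrow{\;p\;}\; \beta_1 + \beta_2 \, \frac{\mathrm{Cov}(X, U)}{\mathrm{Var}(X)}
```

The bias term vanishes only if U does not affect Y ($\beta_2 = 0$) or U is uncorrelated with X. Sensitivity analysis asks how large $\beta_2$ and $\mathrm{Cov}(X, U)$ would have to be to explain away the observed estimate.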

Confounding vs Collider vs Mediator

| Variable type | DAG structure | Whether to control |
| --- | --- | --- |
| Confounder | X ← Z → Y | Must control |
| Collider | X → Z ← Y | Must not control |
| Mediator | X → Z → Y | Depends on the goal |

Rule of thumb: do not control for post-treatment variables
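
Why a collider must not be controlled can be seen in a small simulation. X and Y are generated independently, and Z is caused by both; "controlling" is mimicked here by selecting only cases with a high Z. The threshold and noise level are arbitrary assumptions, and `pearson` is a helper written for this sketch.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(6)  # fixed seed so the run is reproducible
rows = []
for _ in range(20000):
    x = rng.gauss(0, 1)              # X and Y are independent by construction
    y = rng.gauss(0, 1)
    z = x + y + rng.gauss(0, 0.5)    # collider: caused by both X and Y
    rows.append((x, y, z))

overall = pearson([r[0] for r in rows], [r[1] for r in rows])

# Conditioning on the collider by keeping only cases with high Z
selected = [r for r in rows if r[2] > 1.0]
conditioned = pearson([r[0] for r in selected], [r[1] for r in selected])

print(f"all cases: {overall:.2f}, high-Z cases only: {conditioned:.2f}")
```

In the full sample the correlation should be near zero; among the selected cases a clear negative correlation should appear, created purely by conditioning on Z.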

Examples

Example 1: Education and Income

      Intelligence
       ↙        ↘
  Education  →  Income
  • Confounder: Intelligence
  • Spurious path: Education ← Intelligence → Income
  • Solution: control for Intelligence

Example 2: Smoking and Lung Cancer (Historical)

      Genetics?
       ↙        ↘
    Smoking   Lung Cancer
  • Fisher’s argument: genetics could be a confounder
  • Subsequent research established the causal effect of smoking

Example 3: Maternal Affection and Child Depression

      Shared Genes
       ↙        ↘
  Maternal       Child
  Affection    Depression
  • Genetic confounding: shared genes between parent and child
  • Solution: remove the genetic link with adoption studies

Measurement Error in Confounders

Measurement error in a confounder is a serious problem:

$Z^* = Z + \epsilon_Z$

  • If Z cannot be measured exactly, it is replaced by $Z^*$
  • Residual confounding: the influence of Z is not fully removed
  • False positive rate: can approach 100% in large samples (Westfall & Yarkoni, 2016)
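
Residual confounding from a noisy confounder can be sketched in a simulation where X has no effect on Y at all. The setup is an assumption for illustration: all variables are standard normal mixtures, the measurement error has the same variance as Z, and `partial_slope` is a helper implementing the closed-form OLS coefficient on X when one control variable is included.

```python
import random
from statistics import mean

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def partial_slope(x, y, control):
    """OLS coefficient on x when regressing y on x and one control variable."""
    denom = cov(x, x) * cov(control, control) - cov(x, control) ** 2
    return (cov(x, y) * cov(control, control)
            - cov(control, y) * cov(x, control)) / denom

rng = random.Random(7)  # fixed seed so the run is reproducible
Z = [rng.gauss(0, 1) for _ in range(10000)]     # true confounder
Z_star = [z + rng.gauss(0, 1) for z in Z]       # noisy measurement Z* = Z + eps_Z
X = [z + rng.gauss(0, 1) for z in Z]            # Z affects X
Y = [z + rng.gauss(0, 1) for z in Z]            # Z affects Y; X has NO effect on Y

beta_true_z = partial_slope(X, Y, Z)         # adjusting for the true Z
beta_noisy_z = partial_slope(X, Y, Z_star)   # adjusting for the noisy Z*

print(f"controlling for Z: {beta_true_z:.2f}, controlling for Z*: {beta_noisy_z:.2f}")
```

Adjusting for the true Z should give a coefficient near zero, while adjusting for Z* should leave a clearly positive coefficient. Because that leftover bias does not shrink as the sample grows, a larger sample only makes the spurious coefficient more likely to pass a significance test.
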

Related Concepts

  • DAG - Visualizing causal structure
  • Back-door Criterion - Conditions for adjusting for confounding
  • Collider - A variable that must not be controlled for
  • Mediator - A variable on the causal pathway
  • Propensity Score - A method for adjusting for confounding
  • Unconfoundedness - The no-hidden-confounders assumption

References

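  • Rohrer, J. M. (2018). Thinking Clearly About Correlations and Causation: Graphical Causal Models for Observational Data. Advances in Methods and Practices in Psychological Science, 1(1), 27-42. - Confounding and DAGs
  • Pearl, J. (2009). Causality: Models, Reasoning, and Inference (2nd ed.). Cambridge University Press.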
