Confounder
Definition
A confounder is a common cause: a variable that affects both the treatment (X) and the outcome (Y), creating a spurious (non-causal) association between X and Y.
DAG representation:
      Confounder (Z)
        ↙       ↘
Treatment (X)   Outcome (Y)
Mathematical definition: A variable Z is a confounder of the effect of X on Y when:
- Z affects (or is associated with) X
- Z affects (or is associated with) Y
- Z is not an effect of X (not on the causal path)
Intuitive Understanding
Core idea:
A confounder is a “third variable” that makes X and Y appear associated even without a direct causal relationship
Example: correlation between ice cream sales (X) and drowning accidents (Y)
       Summer (Confounder)
          ↙         ↘
Ice cream sales   Drowning accidents
- Ice cream does not cause drowning
- The common cause, summer, affects both
- Spurious association: correlation, not causation
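The ice cream example can be checked with a small simulation. The sketch below is only illustrative: the data-generating process, the effect sizes, and the helper `corr` are assumptions made up for this note, not real sales or accident data.

```python
import random
from statistics import mean


def corr(xs, ys):
    """Pearson correlation, written out to keep the sketch dependency-free."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)
n = 20_000

# Hypothetical data-generating process: summer drives both variables,
# and neither variable affects the other.
summer = [int(rng.random() < 0.5) for _ in range(n)]
ice_cream = [10 + 5 * s + rng.gauss(0, 1) for s in summer]
drownings = [2 + 3 * s + rng.gauss(0, 1) for s in summer]

# Association in the pooled data
overall = corr(ice_cream, drownings)

# Association within each season, i.e. with the confounder held fixed
within = {}
for season in (0, 1):
    xs = [x for x, s in zip(ice_cream, summer) if s == season]
    ys = [y for y, s in zip(drownings, summer) if s == season]
    within[season] = corr(xs, ys)

print(f"overall: {overall:.2f}")
print(f"within season: {within[0]:.2f}, {within[1]:.2f}")
```

Under these assumed parameters the pooled correlation should be strongly positive (roughly 0.77 in expectation), while the within-season correlations should sit near zero.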
Back-door Path
A confounder creates a back-door path:
X ← Z → Y
- It is a “back-door” path from X to Y: it starts with an arrow pointing into X
- It transmits non-causal association
- It must be blocked (for example, by conditioning on Z) in order to identify the causal effect
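When a discrete set of variables Z blocks every back-door path and contains no descendant of X, blocking the path corresponds to the standard back-door adjustment formula (Pearl, 2009). It is written here as a reminder, assuming every level of Z occurs with each value of X (positivity):

```latex
P\big(Y = y \mid do(X = x)\big) \;=\; \sum_{z} P\big(Y = y \mid X = x,\, Z = z\big)\, P(Z = z)
```

The right-hand side uses only observational quantities: the X-Y relationship is evaluated within each level of Z and then averaged over the marginal distribution of Z.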
Methods for Adjusting for Confounding
1. Statistical Control
Stratification:
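# Analyze the X-Y relationship at each level of Z
# (assumes `data` is a pandas DataFrame with columns 'X', 'Y', 'Z')
for z_level in data['Z'].unique():
    subset = data[data['Z'] == z_level]
    # `analyze` is a placeholder for the per-stratum analysis, e.g. a correlation
    analyze(subset['X'], subset['Y'])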
Regression:
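$$Y = \beta_0 + \beta_1 X + \beta_2 Z + \varepsilon$$
$\beta_1$ is the effect of X after controlling for Z (assuming the linear model is correctly specified).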
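Regression adjustment can be illustrated with simulated data. This is a minimal sketch under assumed values: the true effect of 2.0, the strength of confounding, and the helpers `cov` and `coef_on_x` are all made up for illustration, and the two-regressor OLS coefficient is computed in closed form rather than with a statistics library.

```python
import random
from statistics import mean


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)


def coef_on_x(x, control, y):
    """Coefficient on x from an OLS regression of y on x and one control."""
    sxx, scc, sxc = cov(x, x), cov(control, control), cov(x, control)
    sxy, scy = cov(x, y), cov(control, y)
    return (sxy * scc - sxc * scy) / (sxx * scc - sxc ** 2)


rng = random.Random(1)
n = 20_000
true_effect = 2.0  # assumed effect of X on Y in this simulation

z = [rng.gauss(0, 1) for _ in range(n)]             # confounder
x = [0.8 * zi + rng.gauss(0, 1) for zi in z]        # Z affects X
y = [true_effect * xi + 1.5 * zi + rng.gauss(0, 1)  # Z also affects Y
     for xi, zi in zip(x, z)]

naive = cov(x, y) / cov(x, x)   # slope from regressing Y on X alone
adjusted = coef_on_x(x, z, y)   # slope on X after controlling for Z

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```

With these assumed parameters the naive slope should come out near 2.7, while the adjusted coefficient should be close to the true value of 2.0.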
Propensity Score:
- Matching or weighting via the propensity score
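A minimal sketch of propensity score weighting, assuming a single binary confounder so that the propensity score e(z) = P(X = 1 | Z = z) can be estimated by the treated share in each stratum. The simulated data, the effect size of 1.0, and all variable names are hypothetical; real applications usually fit a model such as logistic regression for the score.

```python
import random
from statistics import mean

rng = random.Random(2)
n = 40_000
true_effect = 1.0  # assumed treatment effect in this simulation

z = [int(rng.random() < 0.5) for _ in range(n)]
# Treatment is more likely when z == 1, so Z confounds X and Y
x = [int(rng.random() < (0.7 if zi else 0.3)) for zi in z]
y = [true_effect * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

# Naive comparison of treated and untreated means
naive = (mean(yi for yi, xi in zip(y, x) if xi == 1)
         - mean(yi for yi, xi in zip(y, x) if xi == 0))

# Propensity score per stratum: share of treated units with that z
e = {zv: mean(xi for xi, zi in zip(x, z) if zi == zv) for zv in (0, 1)}

# Normalized inverse-probability weights for treated and control units
w1 = [xi / e[zi] for xi, zi in zip(x, z)]
w0 = [(1 - xi) / (1 - e[zi]) for xi, zi in zip(x, z)]
ipw = (sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
       - sum(w * yi for w, yi in zip(w0, y)) / sum(w0))

print(f"naive: {naive:.2f}, weighted: {ipw:.2f}")
```

Under these assumptions the naive difference should be near 1.8 and the weighted estimate near 1.0. Weighting makes the distribution of Z the same in the treated and control groups, which is what removes the confounding.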
2. Design-based Control
Randomization (RCT):
- Assign treatment at random
- Confounders, measured or not, become independent of treatment (in expectation)
- No arrow points into X any more, so no back-door path remains to be blocked
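The effect of randomization can be sketched with a simulation in which treatment is assigned by coin flip. The background variable `z`, the effect size, and the sample size are assumptions chosen for illustration only.

```python
import random
from statistics import mean

rng = random.Random(3)
n = 40_000
true_effect = 1.0  # assumed treatment effect in this simulation

z = [rng.gauss(0, 1) for _ in range(n)]    # background variable that affects Y
x = [rng.randint(0, 1) for _ in range(n)]  # coin-flip assignment, ignores z
y = [true_effect * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

# Covariate balance: difference in mean z between the two arms
balance = (mean(zi for zi, xi in zip(z, x) if xi == 1)
           - mean(zi for zi, xi in zip(z, x) if xi == 0))

# Simple difference in mean outcomes, with no adjustment for z
diff = (mean(yi for yi, xi in zip(y, x) if xi == 1)
        - mean(yi for yi, xi in zip(y, x) if xi == 0))

print(f"balance in z: {balance:.3f}, difference in means: {diff:.2f}")
```

Because assignment does not depend on `z`, the two arms should be balanced on it up to sampling noise, and the unadjusted difference in means should be close to the assumed effect of 1.0.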
Natural Experiments:
- Instrumental Variables
- Regression Discontinuity
- Difference-in-Differences
3. Genetically Informed Designs
Twin Studies:
- Monozygotic twins: share genes + family environment
- Within-pair analysis removes genetic confounding
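The logic of within-pair analysis can be sketched as follows. The simulation assumes a single factor shared perfectly within each pair (standing in for genes plus family environment) and a linear effect of 0.5; both are simplifications invented for this illustration, as is the helper `slope`.

```python
import random
from statistics import mean


def slope(xs, ys):
    """OLS slope of ys on xs (with intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(4)
n_pairs = 20_000
true_effect = 0.5  # assumed effect of X on Y in this simulation

pairs = []
for _pair in range(n_pairs):
    shared = rng.gauss(0, 1)  # factor shared by both twins (unobserved)
    pair = []
    for _twin in range(2):
        x = shared + rng.gauss(0, 1)
        y = true_effect * x + 2.0 * shared + rng.gauss(0, 1)
        pair.append((x, y))
    pairs.append(pair)

# Naive analysis: pool all individuals and ignore the pairing
naive = slope([x for pair in pairs for x, _ in pair],
              [y for pair in pairs for _, y in pair])

# Within-pair analysis: differencing cancels the shared factor
dx = [a[0] - b[0] for a, b in pairs]
dy = [a[1] - b[1] for a, b in pairs]
within = slope(dx, dy)

print(f"naive: {naive:.2f}, within-pair: {within:.2f}")
```

Under these assumptions the pooled slope should be near 1.5, while the within-pair slope should recover roughly 0.5. Confounders that differ between co-twins are not removed by this design.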
Adoption Studies:
- Break the genetic link to remove genetic confounding
Measured vs Unmeasured Confounders
Measured Confounder
- Observable in the data
- Can be adjusted for via statistical control
- e.g., age, sex, education level
Unmeasured Confounder
- Unobservable (or not measured) in the data
- The causal effect cannot be identified by covariate adjustment alone
- Assess the impact with sensitivity analysis
Unmeasured U
  ↙     ↘
 X       Y
- If U is not measured, the estimated X→Y effect is biased
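The size of this bias can be written down in the simplest case. The expression below is the standard omitted-variable bias result, stated under an assumed linear model; the symbols β, γ, and ε are introduced here for illustration.

```latex
\begin{aligned}
Y &= \beta X + \gamma U + \varepsilon,
  \qquad \operatorname{Cov}(X, \varepsilon) = \operatorname{Cov}(U, \varepsilon) = 0 \\
\hat{\beta}_{\text{naive}} &\;\xrightarrow{\;p\;}\;
  \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}
  \;=\; \beta + \gamma\,\frac{\operatorname{Cov}(X, U)}{\operatorname{Var}(X)}
\end{aligned}
```

The bias term is the product of how strongly U affects Y (γ) and how strongly U is associated with X. Sensitivity analysis asks how large these two unknown quantities would have to be to change the conclusion.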
Confounder vs Collider vs Mediator
| Variable type | DAG structure | Whether to control |
|---|---|---|
| Confounder | X ← Z → Y | Must control |
| Collider | X → Z ← Y | Must not control |
| Mediator | X → Z → Y | Depends on the goal |
Rule of thumb: do not control for post-treatment variables
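The "must not control" entry for colliders can be illustrated with a small simulation. In this sketch X and Y are independent by construction, and Z is a common effect of both. The selection threshold `z > 1.0`, the noise level, and the helper `corr` are arbitrary assumptions.

```python
import random
from statistics import mean


def corr(xs, ys):
    """Pearson correlation, written out to keep the sketch dependency-free."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(5)
n = 20_000

x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]  # independent of x by construction
z = [xi + yi + rng.gauss(0, 0.5) for xi, yi in zip(x, y)]  # collider: X -> Z <- Y

# Association in the full sample
overall = corr(x, y)

# Association after conditioning on the collider (keeping only high-z cases)
selected = [(xi, yi) for xi, yi, zi in zip(x, y, z) if zi > 1.0]
conditioned = corr([p[0] for p in selected], [p[1] for p in selected])

print(f"overall: {overall:.2f}, conditioned on z > 1: {conditioned:.2f}")
```

The full-sample correlation should be near zero, while the correlation among the selected cases should be clearly negative. Conditioning on a collider creates an association that was not there, which is the opposite of what happens with a confounder.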
Examples
Example 1: Education and Income
   Intelligence
    ↙        ↘
Education → Income
- Confounder: Intelligence
- Spurious path: Education ← Intelligence → Income
- Solution: control for Intelligence
Example 2: Smoking and Lung Cancer (Historical)
    Genetics?
    ↙      ↘
Smoking   Lung Cancer
- Fisher’s argument: genetics could be a confounder
- Subsequent research established the causal effect of smoking
Example 3: Maternal Affection and Child Depression
     Shared Genes
      ↙       ↘
Maternal      Child
Affection     Depression
- Genetic confounding: shared genes between parent and child
- Solution: remove the genetic link with adoption studies
Measurement Error in Confounders
Measurement error in a confounder is a serious problem:
- If Z cannot be measured exactly, it is replaced by a noisy proxy (written Z* here)
- Residual confounding: the influence of Z is not fully removed
- False positive rate: the Type I error rate for the adjusted effect of X can approach 100% as the sample grows (Westfall & Yarkoni, 2016)
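Residual confounding from a noisy confounder can be sketched with a simulation in which X has no effect on Y at all. The measurement noise (a reliability of about 0.5), the variable `z_star`, and the helpers `cov` and `coef_on_x` are assumptions made for illustration.

```python
import random
from statistics import mean


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)


def coef_on_x(x, control, y):
    """Coefficient on x from an OLS regression of y on x and one control."""
    sxx, scc, sxc = cov(x, x), cov(control, control), cov(x, control)
    sxy, scy = cov(x, y), cov(control, y)
    return (sxy * scc - sxc * scy) / (sxx * scc - sxc ** 2)


rng = random.Random(6)
n = 20_000

z = [rng.gauss(0, 1) for _ in range(n)]       # true confounder
x = [zi + rng.gauss(0, 1) for zi in z]        # Z affects X
y = [zi + rng.gauss(0, 1) for zi in z]        # Z affects Y; X has NO effect on Y
z_star = [zi + rng.gauss(0, 1) for zi in z]   # noisy measurement of Z

with_true_z = coef_on_x(x, z, y)      # controlling for the true confounder
with_proxy = coef_on_x(x, z_star, y)  # controlling for the noisy proxy

print(f"true Z: {with_true_z:.2f}, noisy proxy: {with_proxy:.2f}")
```

Controlling for the true Z should give a coefficient near zero, whereas controlling for the noisy proxy should leave a coefficient of about 0.33 under these assumptions. Because that residual bias does not shrink with sample size, a large enough sample will almost always declare it statistically significant.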
Related Concepts
- DAG - Visualizing causal structure
- Back-door Criterion - Conditions for adjusting for confounding
- Collider - A variable that must not be controlled for
- Mediator - A variable on the causal pathway
- Propensity Score - A method for adjusting for confounding
- Unconfoundedness - The no-hidden-confounders assumption
References
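- Rohrer, J. M. (2018). Thinking clearly about correlations and causation: Graphical causal models for observational data (rohrerThinkingClearlyCorrelations) - Confounding and DAGs
- Pearl, J. (2009). Causality: Models, Reasoning, and Inference (2nd ed.)
- Westfall, J., & Yarkoni, T. (2016). Statistically controlling for confounding constructs is harder than you think - Measurement error in confounders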