Tae Hyun Kim (Lowell)

Collider

3 min read · #causal-inference #scm #dag

Definition

A collider is a variable affected by both the treatment (X) and the outcome (Y) (a common effect). In the structure X → C ← Y, C is a collider.

DAG representation:

  Treatment (X) → Collider (C) ← Outcome (Y)

Key properties:

  • Default state: blocks the association between X and Y
  • Conditioning on C: creates a spurious association between X and Y (collider bias)

Why “Collider”?

  • Two arrows “collide” at the same variable

Intuitive Understanding

Core idea:

Controlling for a collider makes an X-Y relationship that was never there appear to exist

Example: Attractiveness and Niceness on a dating app

   Attractive → Date ← Nice
  • Being attractive or being nice gets you selected as a date
  • Whole population: attractiveness and niceness are unrelated
  • Analyzing only the dates (conditioning on Date):
    • “A less attractive person must be nicer to be selected”
    • A spurious negative correlation arises!
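The dating example can be checked with a small simulation. This is a minimal sketch, not a model of any real app: the independent standard-normal traits, the cutoff of 1.0, and the rule "selected when the two traits sum above the cutoff" are all assumptions made here for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20_000

# Assumption: attractiveness and niceness are independent in the population.
attractive = [rng.gauss(0, 1) for _ in range(n)]
nice = [rng.gauss(0, 1) for _ in range(n)]

# Assumption: a person is selected as a date when the traits sum above a cutoff.
dated = [a + b > 1.0 for a, b in zip(attractive, nice)]

r_all = pearson(attractive, nice)
r_dated = pearson(
    [a for a, d in zip(attractive, dated) if d],
    [b for b, d in zip(nice, dated) if d],
)

print(f"whole population: r = {r_all:+.3f}")
print(f"dates only:       r = {r_dated:+.3f}")
```

Under these assumptions the first correlation should sit near zero and the second should be clearly negative, even though nothing in the data-generating code links the two traits.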

Collider Bias Examples

1. Publication Bias (Meta-analysis)

  Methodological Rigor → Publication ← Innovativeness
  • Analyzing only published papers (conditioning on Publication):
    • Even with low rigor, an innovative paper gets published
    • Even if not innovative, a rigorous paper gets published
  • Result: rigor and innovativeness appear negatively correlated
  • Reality: in the full set of papers, rigor and innovativeness may be unrelated, or even positively correlated

2. Berkson’s Paradox (Hospital Sample)

  Disease A → Hospitalization ← Disease B
  • Analyzing only the hospital sample:
    • Even without Disease A, one is hospitalized for Disease B
    • Even without Disease B, one is hospitalized for Disease A
  • Result: Disease A and B appear negatively correlated
  • Population: in fact unrelated
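A binary version of the hospital example can be simulated in the same spirit. The prevalences, the admission probability, and the rule that nobody is admitted without at least one disease are hypothetical values chosen only to make the effect visible.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 200_000

# Assumed values, for illustration only.
P_A, P_B = 0.10, 0.10   # independent disease prevalences
P_ADMIT = 0.50          # chance that each disease present leads to admission

def rate_of_b(rows, has_a):
    """Share with Disease B among rows whose Disease A status equals has_a."""
    group = [b for a, b in rows if a == has_a]
    return sum(group) / len(group)

population, hospital = [], []
for _ in range(n):
    a = rng.random() < P_A
    b = rng.random() < P_B
    # Each disease independently can trigger admission; no disease, no admission.
    admitted = (a and rng.random() < P_ADMIT) or (b and rng.random() < P_ADMIT)
    population.append((a, b))
    if admitted:
        hospital.append((a, b))

for name, rows in [("population", population), ("hospital", hospital)]:
    print(f"{name}: P(B | A) = {rate_of_b(rows, True):.3f}, "
          f"P(B | not A) = {rate_of_b(rows, False):.3f}")
```

In the population the two rates should be nearly equal. In the hospital sample, every patient without Disease A must have Disease B under this admission rule, so B looks far more common among patients without A.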

3. Nonresponse Bias (Survey)

  Variable X → Response ← Variable Y
  • Analyzing only respondents (conditioning on Response):
    • X and Y affect whether one responds
  • Result: the X-Y relationship is distorted

4. Attrition Bias (Longitudinal Study)

  Baseline X → Dropout ← Outcome Y
  • Analyzing only the remaining participants (non-dropout):
    • X and Y affect whether one drops out
  • Result: the X-Y relationship among those who remain is distorted (selection bias)

5. Sample Selection Effect

  Variable X → Sample Selection ← Variable Y
  • Analyzing only a particular sample:
    • e.g., only successful people, only college entrants
  • Result: the X-Y relationship differs from the population

Why Does Collider Bias Occur?

Mathematical Intuition

$C = f(X, Y) + \epsilon$

Conditioning on C: $P(Y \mid X, C=c) \neq P(Y \mid X)$

  • Fixing the value of C makes information about X informative about Y
  • “If X is large, then for C=c to hold, Y must be small” (assuming C increases with both X and Y)
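One concrete case, offered as a sketch rather than a general proof: assume f is linear, so that $C = X + Y + \epsilon$, with X, Y, and ε independent, jointly Gaussian, and each of variance 1. These distributional choices are assumptions added here for illustration. Then $\operatorname{Cov}(X, C) = \operatorname{Cov}(Y, C) = 1$ and $\operatorname{Var}(C) = 3$, and the Gaussian conditional covariance formula gives

$$
\operatorname{Cov}(X, Y \mid C) = \operatorname{Cov}(X, Y) - \frac{\operatorname{Cov}(X, C)\,\operatorname{Cov}(Y, C)}{\operatorname{Var}(C)} = 0 - \frac{1 \cdot 1}{3} = -\frac{1}{3}
$$

$$
\operatorname{Var}(X \mid C) = \operatorname{Var}(Y \mid C) = 1 - \frac{1}{3} = \frac{2}{3}, \qquad \operatorname{Corr}(X, Y \mid C) = \frac{-1/3}{2/3} = -\frac{1}{2}
$$

So X and Y are uncorrelated marginally, yet at any fixed value of C their correlation is -1/2 in this special case.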

Information Flow

Without conditioning on C:
    X → C ← Y     (path blocked at C, X and Y independent)

With conditioning on C:
    X → [C] ← Y     (path opened, dependent)
  • Conditioning opens a “channel of information”

Identifying Colliders

Determining from a DAG

A variable C is a collider when:

  1. $X \rightarrow C$ (X affects C)
  2. $Y \rightarrow C$ (Y affects C)

Temporal Clue

Rule of thumb: a post-treatment variable can be a collider

  • A variable that occurs after the treatment and the outcome
  • e.g., a final result, a selection variable

Caution: Not Every Post-treatment Variable Is a Collider

X → M → Y    (M is a Mediator, not a collider)
X → C ← Y    (C is a Collider)
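The distinction between the two structures above can be read directly off the edge list of a DAG. The helper below is a hypothetical illustration, not part of any library: `classify_middle` and the example edges are names invented here, and it only inspects the direct edges between the three nodes.

```python
def classify_middle(edges, x, m, y):
    """Classify the role of m relative to x and y from direct edges only.

    edges is a set of (parent, child) pairs describing a DAG.
    """
    x_to_m = (x, m) in edges
    y_to_m = (y, m) in edges
    m_to_x = (m, x) in edges
    m_to_y = (m, y) in edges

    if x_to_m and y_to_m:
        return "collider"      # x -> m <- y
    if x_to_m and m_to_y:
        return "mediator"      # x -> m -> y
    if m_to_x and m_to_y:
        return "confounder"    # x <- m -> y
    return "other"

# Hypothetical DAG containing all three patterns around X and Y.
edges = {
    ("X", "M"), ("M", "Y"),   # M mediates X -> Y
    ("X", "C"), ("Y", "C"),   # C is a common effect
    ("Z", "X"), ("Z", "Y"),   # Z is a common cause
}

for node in ["M", "C", "Z"]:
    print(node, "->", classify_middle(edges, "X", node, "Y"))
```

Both M and C come after the treatment in this toy graph, but only C has arrows arriving from both X and Y.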

Do NOT Control for Colliders

A Mistaken Practice

“Let’s control for as many variables as possible” → dangerous!
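To see what "controlling" for a collider does in a regression, the sketch below compares the slope of Y on X with and without C as a covariate. The data-generating process (independent standard normals, C = X + Y + noise, true effect of X on Y equal to zero) is an assumption chosen for illustration, and the two-predictor coefficient uses the standard closed-form OLS expression rather than a statistics library.

```python
import random

def cov(us, vs):
    """Population covariance of two equal-length sequences."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / n

rng = random.Random(2)  # fixed seed so the run is repeatable
n = 50_000

x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]            # X has no effect on Y
c = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]  # collider

# Regression of Y on X alone.
slope_unadjusted = cov(x, y) / cov(x, x)

# Coefficient on X in the regression of Y on X and C (closed-form OLS).
sxx, scc, sxc = cov(x, x), cov(c, c), cov(x, c)
sxy, scy = cov(x, y), cov(c, y)
slope_adjusted = (sxy * scc - sxc * scy) / (sxx * scc - sxc ** 2)

print(f"Y ~ X     : slope on X = {slope_unadjusted:+.3f}")
print(f"Y ~ X + C : slope on X = {slope_adjusted:+.3f}")
```

Under these assumptions the unadjusted slope should be close to the true value of zero, while adding C pushes the estimate toward a clearly negative value.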

The Correct Approach

  1. Draw the DAG and grasp the causal structure
  2. Identify colliders
  3. Exclude colliders from the controls
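The three steps above can be sketched as a small filter over a candidate control list. This is a simplified illustration with invented names (`pick_controls`, `descendants`, the example DAG): it treats only direct common effects of the treatment and outcome as colliders, also drops their descendants since those carry the same risk, and does not replace a full back-door analysis.

```python
def descendants(children, node):
    """All nodes reachable from node by following parent -> child edges."""
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def pick_controls(edges, treatment, outcome, candidates):
    """Return (kept, excluded): candidates minus colliders and their descendants."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)

    # Step 2: colliders are the common children of treatment and outcome.
    colliders = children.get(treatment, set()) & children.get(outcome, set())

    # Step 3: exclude colliders, plus anything downstream of them.
    excluded = set(colliders)
    for c in colliders:
        excluded |= descendants(children, c)

    kept = [v for v in candidates if v not in excluded]
    return kept, sorted(excluded)

# Step 1: a hypothetical DAG written as (parent, child) pairs.
edges = {("Z", "X"), ("Z", "Y"), ("X", "C"), ("Y", "C"), ("C", "D")}

kept, excluded = pick_controls(edges, "X", "Y", ["Z", "C", "D"])
print("keep:", kept)
print("exclude:", excluded)
```

Here the confounder Z stays in the control set while C and its child D are dropped.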

Extension: Descendants of a Collider

X → C ← Y
    ↓
    D
  • Controlling for D (a descendant of C) also induces collider bias
  • Because it transmits partial information about C
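The "partial information" point can be illustrated by comparing the partial correlation of X and Y given C with that given D. The linear model below (C = X + Y + noise, D = C + noise, all terms standard normal) is an assumption made for this sketch, and `partial_corr` is a helper written here using the usual first-order partial correlation formula.

```python
import random

def corr(us, vs):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    suu = sum((u - mu) ** 2 for u in us)
    svv = sum((v - mv) ** 2 for v in vs)
    return suv / (suu * svv) ** 0.5

def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after linearly adjusting both for zs."""
    rxy, rxz, ryz = corr(xs, ys), corr(xs, zs), corr(ys, zs)
    return (rxy - rxz * ryz) / ((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5

rng = random.Random(3)  # fixed seed so the run is repeatable
n = 50_000

x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]                  # independent of X
c = [xi + yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]  # collider
d = [ci + rng.gauss(0, 1) for ci in c]                   # noisy descendant of C

r_given_c = partial_corr(x, y, c)
r_given_d = partial_corr(x, y, d)

print(f"corr(X, Y | C) = {r_given_c:+.3f}")
print(f"corr(X, Y | D) = {r_given_d:+.3f}")
```

Under these assumptions both partial correlations should be negative, with the one given D smaller in magnitude because D is only a noisy copy of C.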

Collider vs Confounder

| Aspect | Confounder | Collider |
|---|---|---|
| DAG structure | X ← Z → Y | X → C ← Y |
| Role | Common cause | Common effect |
| Default state | Creates spurious association | Blocks association |
| Effect of control | Removes spurious association | Creates spurious association |
| Whether to control | Must control | Must not control |

Related Concepts

  • DAG - Visualizing causal structure
  • Confounder - Common cause (must control)
  • Mediator - A variable on the causal pathway
  • Back-door Criterion - Conditions for causal identification
  • Selection Bias - A form of collider bias
  • v-structure - Unshielded collider, key to distinguishing MECs

References

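  • Rohrer, J. M. (2018). Thinking clearly about correlations and causation: Graphical causal models for observational data - Explanation of collider bias
  • Berkson, J. (1946). Limitations of the application of fourfold table analysis to hospital data
  • Pearl, J. (2009). Causality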
