Tae Hyun Kim (Lowell)

Positivity (Overlap)

3 min read #causal-inference #potential-outcomes

Definition

The probability of receiving each treatment level lies strictly between 0 and 1 at every covariate value:

$$0 < P(W=w \mid X=x) < 1, \quad \forall w \in \{0, 1\}, \, \forall x \in \mathcal{X}$$

For binary treatment:

$$0 < e(x) < 1, \quad \text{where } e(x) = P(W=1 \mid X=x)$$

Intuitive Understanding

Core Idea

“Both the treated and control groups are observable across every combination of characteristics”

  • At any value of $X$, both treated and control outcomes can be estimated
  • Causal effects can be estimated without extrapolation

Common Support

The region where the covariate distributions of the treated and control groups overlap:

    Control distribution   Treatment distribution
           ___               ___
          /   \             /   \
         /     \           /     \
        /       \         /       \
    ___/    overlap region   \___
        <=================>
              Common Support

Positivity Violations

1. Deterministic Treatment

Treatment is deterministic at certain values of $X$:

Examples:

  • People over 65 are only ever offered program A
  • A new product is not launched in certain regions
  • Patients with contraindications cannot be prescribed the drug

2. Practical Positivity Violation

Both treatment levels are theoretically possible at a given covariate value, but one of them is not observed in the data, typically because of:

  • Small sample size
  • Rare covariate combinations
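
As a rough illustration of how such gaps can be spotted when the covariates are discrete, the sketch below counts treated and control units per stratum and flags strata where one arm is empty. The function name `strata_missing_an_arm` and the toy age-band data are hypothetical, not part of the original note.

```python
from collections import Counter

def strata_missing_an_arm(X, W):
    """Return {stratum: (n_control, n_treated)} for strata where one arm is empty."""
    counts = Counter(zip(X, W))
    flagged = {}
    for x in sorted(set(X)):
        n_control = counts[(x, 0)]
        n_treated = counts[(x, 1)]
        if n_control == 0 or n_treated == 0:
            flagged[x] = (n_control, n_treated)
    return flagged

# Hypothetical toy data: age band is the only covariate, W is the 0/1 treatment.
X = ["<40", "<40", "40-64", "40-64", "65+", "65+", "65+"]
W = [0, 1, 0, 1, 1, 1, 1]
print(strata_missing_an_arm(X, W))
```

An empty arm in the sample does not by itself tell you whether the violation is deterministic or merely practical; that distinction has to come from knowledge of how treatment was assigned.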

Propensity Score Perspective

$$e(x) \approx 0 \quad \text{or} \quad e(x) \approx 1$$

Extreme propensity scores → a sign of positivity violation


Impact of Violations

1. IPW Instability

In Inverse Propensity Weighting:

$$\text{Weight} = \frac{1}{e(x)} \quad \text{or} \quad \frac{1}{1-e(x)}$$

When $e(x) \to 0$, the treated weight $1/e(x)$ grows without bound; when $e(x) \to 1$, the control weight $1/(1-e(x))$ does.
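
To make the blow-up concrete, here is a small sketch that evaluates both weights at a few propensity values. The helper `ipw_weight` is a hypothetical name used only for illustration.

```python
def ipw_weight(e, w):
    """Inverse propensity weight for a unit with propensity e and treatment w (0 or 1)."""
    return 1.0 / e if w == 1 else 1.0 / (1.0 - e)

# A treated unit's weight grows as its propensity shrinks toward 0.
for e in (0.5, 0.1, 0.01, 0.001):
    treated = ipw_weight(e, 1)
    control = ipw_weight(e, 0)
    print(f"e(x)={e:<6} treated weight={treated:8.1f}  control weight={control:6.3f}")
```

A single treated unit with $e(x) = 0.001$ counts as much as roughly a thousand typical units, which is why a handful of such observations can dominate the estimate.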

2. Inestimability

In regions where $e(x) = 0$:

  • $E[Y(1) \mid X=x]$ is inestimable (no treated units)

In regions where $e(x) = 1$:

  • $E[Y(0) \mid X=x]$ is inestimable (no control units)

3. High Variance

The weaker the overlap, the higher the variance of the estimator, because a few units with very large weights dominate the estimate.
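
A small Monte Carlo sketch can illustrate this. It assumes a single normal covariate, a logistic propensity whose `slope` controls how weak the overlap is, a true ATE of 1, and an IPW estimator that uses the true propensities; all of these are illustrative choices, not taken from the original note.

```python
import numpy as np

def ipw_ate(y, w, e):
    """IPW estimate of the ATE using known propensities e."""
    return np.mean(w * y / e - (1 - w) * y / (1 - e))

def simulate_sd(slope, n=500, reps=300, seed=0):
    """Standard deviation of the IPW estimate across simulated datasets."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(reps):
        x = rng.normal(size=n)
        e = 1.0 / (1.0 + np.exp(-slope * x))  # larger slope pushes e toward 0 or 1
        w = rng.binomial(1, e)
        y = 1.0 * w + x + rng.normal(size=n)  # true ATE is 1 by construction
        estimates.append(ipw_ate(y, w, e))
    return float(np.std(estimates))

sd_good = simulate_sd(slope=0.5)  # propensities stay away from 0 and 1
sd_poor = simulate_sd(slope=3.0)  # many propensities close to 0 or 1
print(f"good overlap sd={sd_good:.3f}, poor overlap sd={sd_poor:.3f}")
```

Both settings target the same ATE with the same sample size; only the spread of the propensity scores differs, so the gap in the two standard deviations reflects the loss of overlap.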


Diagnostic Methods

1. Propensity Score Histogram

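# Compare PS distributions of treated/control groups
# Assumes ps (estimated propensity scores) and W (0/1 treatment) are NumPy arrays
import numpy as np
import matplotlib.pyplot as plt

bins = np.linspace(0, 1, 21)  # shared bins so the two histograms are comparable
plt.hist(ps[W==1], bins=bins, alpha=0.5, label='Treated')
plt.hist(ps[W==0], bins=bins, alpha=0.5, label='Control')
plt.xlabel('Propensity score')
plt.legend()
plt.show()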

Good overlap: the two distributions overlap substantially

Poor overlap: the two distributions are separated

2. Proportion of Extreme PS

# Share of units with PS below 0.01 or above 0.99 (ps is a NumPy array)
extreme_ps = (ps < 0.01) | (ps > 0.99)
print(f"Extreme PS: {extreme_ps.mean()*100:.1f}%")

3. Checking Common Support

Check the intersection of the PS ranges of the treated and control groups.
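
One simple way to do this, sketched below, is to take the larger of the two group minima and the smaller of the two group maxima. The function name `common_support` and the example scores are hypothetical.

```python
import numpy as np

def common_support(ps, W):
    """Return (low, high): the intersection of the treated and control PS ranges."""
    ps = np.asarray(ps, dtype=float)
    W = np.asarray(W)
    low = max(ps[W == 1].min(), ps[W == 0].min())
    high = min(ps[W == 1].max(), ps[W == 0].max())
    return low, high

# Hypothetical propensity scores for four control and four treated units.
ps = np.array([0.05, 0.20, 0.35, 0.60, 0.30, 0.55, 0.80, 0.95])
W = np.array([0, 0, 0, 0, 1, 1, 1, 1])

low, high = common_support(ps, W)
outside = (ps < low) | (ps > high)
print(f"common support: [{low:.2f}, {high:.2f}], units outside: {outside.sum()}")
```

If `low` comes out larger than `high`, the two ranges do not intersect at all. Note that a min/max range check is coarse: it can miss gaps in the interior of the range, so it complements the histogram rather than replacing it.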


Solutions

1. Trimming

Remove samples with extreme propensity scores, keeping only the units in:

$$\{i : \alpha < e(x_i) < 1-\alpha\}$$

Typically $\alpha = 0.01$ or $0.05$.

For details: Trimming

Advantage: stable estimation

Disadvantage: changes the estimand (ATE for the full population → ATE for the trimmed subpopulation)
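
A minimal sketch of the trimming rule as a boolean mask, assuming the propensity scores are already estimated and stored in an array. The function name `trim` and the example values are hypothetical.

```python
import numpy as np

def trim(ps, alpha=0.05):
    """Boolean mask keeping units with alpha < e(x_i) < 1 - alpha."""
    ps = np.asarray(ps, dtype=float)
    return (ps > alpha) & (ps < 1 - alpha)

# Hypothetical estimated propensity scores.
ps = np.array([0.004, 0.03, 0.20, 0.50, 0.85, 0.97, 0.999])
keep = trim(ps, alpha=0.05)
print(f"kept {keep.sum()} of {len(ps)} units")
```

The same mask would then be applied to the outcome, treatment, and covariate arrays before estimation, and the share of dropped units is worth reporting because it describes how far the trimmed population is from the original one.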

2. Overlap Weighting

Assign less weight to regions with extreme PS:

$$h(x) = e(x)(1-e(x))$$

For details: Overlap Weighting
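
As a sketch of how $h(x)$ turns into per-unit weights: dividing $h(x)$ by the probability of the arm a unit actually received gives $1 - e(x)$ for treated units and $e(x)$ for control units. This construction is the standard one for overlap weights but is not spelled out in this note, and the function name `overlap_weights` is hypothetical.

```python
import numpy as np

def overlap_weights(ps, W):
    """Overlap weights: 1 - e(x) for treated units, e(x) for control units."""
    ps = np.asarray(ps, dtype=float)
    W = np.asarray(W)
    return np.where(W == 1, 1.0 - ps, ps)

def ipw_weights(ps, W):
    """Inverse propensity weights, shown only for comparison."""
    ps = np.asarray(ps, dtype=float)
    W = np.asarray(W)
    return np.where(W == 1, 1.0 / ps, 1.0 / (1.0 - ps))

# Hypothetical scores: a treated unit with tiny e(x) and a control unit with e(x) near 1.
ps = np.array([0.01, 0.50, 0.99])
W = np.array([1, 1, 0])
print("overlap:", overlap_weights(ps, W))
print("IPW:    ", ipw_weights(ps, W))
```

Overlap weights always lie between 0 and 1, so no single unit can dominate the way it can under IPW. As with trimming, the price is a different estimand: the effect in the population where treated and control units overlap.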

3. Bounds Estimation

Provide bounds in regions of positivity violation:

$$\tau_{lb} \leq \text{ATE} \leq \tau_{ub}$$

This is the partial identification approach.
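
One simple version of this idea, sketched below, uses worst-case bounds. It assumes discrete strata, an outcome known to lie in `[y_min, y_max]`, and stratum means that are treated as known wherever an arm is observed. The function name `ate_bounds` and the numbers are hypothetical.

```python
def ate_bounds(strata, y_min, y_max):
    """Worst-case bounds on the ATE.

    strata: list of (share, mean_y_treated, mean_y_control); a mean is None
    when that arm is unobserved in the stratum (a positivity violation).
    """
    lower = upper = 0.0
    for share, m1, m0 in strata:
        # An unobserved arm mean can be anywhere in [y_min, y_max].
        m1_lo, m1_hi = (m1, m1) if m1 is not None else (y_min, y_max)
        m0_lo, m0_hi = (m0, m0) if m0 is not None else (y_min, y_max)
        lower += share * (m1_lo - m0_hi)
        upper += share * (m1_hi - m0_lo)
    return lower, upper

# Hypothetical example: 20% of the population has no control units at all.
strata = [(0.8, 0.6, 0.4), (0.2, 0.7, None)]
lb, ub = ate_bounds(strata, y_min=0.0, y_max=1.0)
print(f"{lb:.2f} <= ATE <= {ub:.2f}")
```

The width of the interval grows with the population share of the strata that violate positivity, which makes the cost of the violation explicit instead of hiding it behind a model.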

4. Extrapolation (caution required)

Model-based extrapolation:

  • Strongly depends on model assumptions
  • Sensitivity analysis is essential

Related Notes

  • Causal Assumptions Overview - consolidated overview of the three core assumptions
  • Strong Ignorability - Ignorability + Positivity
  • Propensity Score - $e(x) = P(W=1 \mid X=x)$
  • IPW - a method sensitive to positivity
  • Trimming - responding to positivity violations
  • Overlap Weighting - a robust weighting method

References

  • Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score
  • Yao, L., et al. (2021). A survey on causal inference (Section 2.3)
  • Crump, R. K., et al. (2009). Dealing with limited overlap in estimation of average treatment effects
