Positivity (Overlap) · Tae Hyun Kim (Lowell)

Definition

The probability of receiving each treatment level lies strictly between 0 and 1 at every covariate value:

$$0 < P(W=w \mid X=x) < 1, \quad \forall w \in \{0, 1\}, \, \forall x \in \mathcal{X}$$

For binary treatment:

$$0 < e(x) < 1, \quad \text{where } e(x) = P(W=1 \mid X=x)$$

Intuitive Understanding

Core Idea

“Both the treated and control groups are observable across every combination of characteristics”

At any value of $X$, both treated and control outcomes can be estimated
Causal effects can be estimated without extrapolation

Common Support

The region where the covariate distributions of the treated and control groups overlap:

    Control distribution   Treatment distribution
           ___               ___
          /   \             /   \
         /     \           /     \
        /       \         /       \
    ___/    overlap region   \___
        <=================>
              Common Support

Positivity Violations

1. Deterministic Treatment

Treatment is deterministic at certain values of $X$, that is, $e(x) = 0$ or $e(x) = 1$ there:

Examples:

People over 65 are always offered program A only
A new product is not launched in certain regions
Patients with contraindications cannot be prescribed the drug

2. Practical Positivity Violation

Theoretically possible but not observed in the data:

Small sample size
Rare covariate combinations

Propensity Score Perspective

$$e(x) \approx 0 \quad \text{or} \quad e(x) \approx 1$$

Extreme propensity scores → a sign of positivity violation

Impact of Violations

1. IPW Instability

In Inverse Propensity Weighting:

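$$\text{Weight} = \frac{1}{e(x)} \ \text{(treated units)} \quad \text{or} \quad \frac{1}{1-e(x)} \ \text{(control units)}$$

When $e(x) \to 0$, the weights of treated units explode; when $e(x) \to 1$, the weights of control units explode.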
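
As a rough illustration of how quickly these weights grow, the sketch below computes them with NumPy. The function name `ipw_weights` and the example propensity scores are made up for illustration; `ps` and `W` are assumed to be arrays of propensity scores and 0/1 treatment indicators.

```python
import numpy as np

def ipw_weights(ps, W):
    """Inverse propensity weights: 1/e(x) for treated, 1/(1-e(x)) for control.

    Hypothetical helper for illustration; assumes 0 < ps < 1 everywhere.
    """
    ps = np.asarray(ps, dtype=float)
    W = np.asarray(W)
    return np.where(W == 1, 1.0 / ps, 1.0 / (1.0 - ps))

# Made-up scores: four treated units whose e(x) moves toward 0.
ps = np.array([0.5, 0.2, 0.01, 0.001])
W = np.array([1, 1, 1, 1])
weights = ipw_weights(ps, W)
print(weights)

# Share of the total weight carried by the single most extreme unit.
print(f"Largest weight share: {weights.max() / weights.sum():.2f}")
```

In this toy example a single unit with $e(x) = 0.001$ carries most of the total weight, which is what makes the resulting estimate unstable.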

2. Inestimability

In regions where $e(x) = 0$ :

$E[Y(1) \mid X=x]$ is inestimable (no treated units)

In regions where $e(x) = 1$ :

$E[Y(0) \mid X=x]$ is inestimable (no control units)
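
For a discrete covariate, one simple way to spot such regions in a sample is to count treated and control units per covariate value. This is only a sketch: the helper name `cells_missing_an_arm` and the toy data are hypothetical, and an empty arm in the sample cannot by itself tell a deterministic violation apart from a practical one.

```python
import numpy as np

def cells_missing_an_arm(X, W):
    """Map each covariate value with an empty arm to (n_treated, n_control).

    Hypothetical helper; X is a 1-D array of discrete covariate values and
    W is a 0/1 treatment indicator of the same length.
    """
    X = np.asarray(X)
    W = np.asarray(W)
    missing = {}
    for x in np.unique(X):
        in_cell = X == x
        n_treated = int(np.sum(W[in_cell] == 1))
        n_control = int(np.sum(W[in_cell] == 0))
        if n_treated == 0 or n_control == 0:
            missing[x.item()] = (n_treated, n_control)
    return missing

# Toy data: everyone in the "over65" cell is treated, so no controls exist there.
X = np.array(["under65", "under65", "over65", "over65", "under65"])
W = np.array([1, 0, 1, 1, 0])
print(cells_missing_an_arm(X, W))
```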

3. High Variance

The weaker the overlap, the higher the variance of the estimator.
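
One way to see why, stated here as a standard result rather than something derived in this note, is the semiparametric efficiency bound for the ATE under strong ignorability. The symbols $\sigma_w^2(x) = \mathrm{Var}(Y(w) \mid X=x)$, $\tau(x) = E[Y(1) - Y(0) \mid X=x]$, and $\tau = E[\tau(X)]$ are notation introduced only for this illustration:

```latex
V_{\text{ATE}} = E\left[
    \frac{\sigma_1^2(X)}{e(X)}
  + \frac{\sigma_0^2(X)}{1 - e(X)}
  + \left(\tau(X) - \tau\right)^2
\right]
```

The propensity score enters through the denominators $e(X)$ and $1 - e(X)$, so the first two terms grow without bound as $e(x)$ approaches 0 or 1 on a region with positive probability.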

Diagnostic Methods

1. Propensity Score Histogram

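# Compare PS distributions of the treated and control groups.
# Assumes `ps` (estimated propensity scores) and `W` (0/1 treatment
# indicators) are NumPy arrays of the same length.
import matplotlib.pyplot as plt

# Use the same bins for both groups so the histograms are comparable.
plt.hist(ps[W == 1], bins=20, range=(0, 1), alpha=0.5, label='Treated')
plt.hist(ps[W == 0], bins=20, range=(0, 1), alpha=0.5, label='Control')
plt.xlabel('Propensity score')
plt.legend()
plt.show()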

Good overlap: the two distributions overlap substantially
Poor overlap: the two distributions are separated

2. Proportion of Extreme PS

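# Share of units whose PS falls below 0.01 or above 0.99
# (assumes `ps` is a NumPy array; the cutoffs are a convention, not a rule).
extreme_ps = (ps < 0.01) | (ps > 0.99)
print(f"Extreme PS: {extreme_ps.mean()*100:.1f}%")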

3. Checking Common Support

Check the intersection of the PS ranges of the treated and control groups.
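
A minimal sketch of this check, assuming `ps` and `W` are arrays of estimated propensity scores and 0/1 treatment indicators. The helper names and the toy values are hypothetical. The min/max rule used here is only one simple definition of the common support region and is sensitive to single outlying scores.

```python
import numpy as np

def common_support_range(ps, W):
    """Intersection of the PS ranges of the treated and control groups."""
    ps = np.asarray(ps, dtype=float)
    W = np.asarray(W)
    treated, control = ps[W == 1], ps[W == 0]
    lower = max(treated.min(), control.min())
    upper = min(treated.max(), control.max())
    return float(lower), float(upper)  # the range is empty if lower > upper

def outside_common_support(ps, W):
    """Boolean mask of units whose PS lies outside the common range."""
    lower, upper = common_support_range(ps, W)
    ps = np.asarray(ps, dtype=float)
    return (ps < lower) | (ps > upper)

# Toy data for illustration only.
ps = np.array([0.1, 0.3, 0.5, 0.6, 0.8, 0.95])
W = np.array([0, 0, 1, 0, 1, 1])
print(common_support_range(ps, W))
print(f"Outside common support: {outside_common_support(ps, W).mean()*100:.1f}%")
```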

Solutions

1. Trimming

Remove samples with extreme propensity scores, keeping only:

$$\{i : \alpha < e(x_i) < 1-\alpha\}$$

Typically $\alpha = 0.01$ or $0.05$.

For details: Trimming

Advantage: stable estimation
Disadvantage: changes the estimand (ATE over the full population → ATE over the trimmed subpopulation with $\alpha < e(x) < 1-\alpha$)
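
The trimming rule can be sketched as a boolean mask over the estimated propensity scores. The function name `trim_by_propensity` and the example scores are hypothetical; `alpha` corresponds to $\alpha$ in the rule above.

```python
import numpy as np

def trim_by_propensity(ps, alpha=0.05):
    """Return a mask that is True for units with alpha < e(x_i) < 1 - alpha."""
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must lie in [0, 0.5)")
    ps = np.asarray(ps, dtype=float)
    return (ps > alpha) & (ps < 1 - alpha)

# Made-up propensity scores for illustration.
ps = np.array([0.005, 0.03, 0.40, 0.70, 0.97, 0.999])
keep = trim_by_propensity(ps, alpha=0.05)
print(f"Kept {keep.sum()} of {keep.size} units")
```

The same mask would then be applied to the outcome, treatment, and covariate arrays before estimation. Reporting how many units were dropped makes the change of target population explicit.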

2. Overlap Weighting

Assign less weight to regions with extreme PS:

$$h(x) = e(x)(1-e(x))$$

For details: Overlap Weighting
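
As a sketch of how $h(x)$ turns into unit-level weights: dividing $h(x)$ by the probability of the treatment actually received gives $1 - e(x)$ for treated units and $e(x)$ for control units. The function names and toy data below are hypothetical, and the weighted mean difference is only one simple estimator built on these weights.

```python
import numpy as np

def overlap_weights(ps, W):
    """Overlap weights: 1 - e(x) for treated units, e(x) for control units."""
    ps = np.asarray(ps, dtype=float)
    W = np.asarray(W)
    return np.where(W == 1, 1.0 - ps, ps)

def weighted_mean_difference(Y, W, weights):
    """Difference of weighted outcome means between treated and control."""
    Y = np.asarray(Y, dtype=float)
    W = np.asarray(W)
    weights = np.asarray(weights, dtype=float)
    treated, control = W == 1, W == 0
    return float(np.average(Y[treated], weights=weights[treated])
                 - np.average(Y[control], weights=weights[control]))

# Toy data: the last two units have an extreme propensity score.
ps = np.array([0.5, 0.5, 0.99, 0.99])
W = np.array([1, 0, 1, 0])
Y = np.array([3.0, 1.0, 10.0, 0.0])
w = overlap_weights(ps, W)
print(w)  # every weight stays within [0, 1], unlike 1/e(x)
print(weighted_mean_difference(Y, W, w))
```

Because the weights are bounded, no single unit can dominate the estimate. The price is that the target shifts toward the population with the most overlap (often called the ATO) rather than the ATE.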

3. Bounds Estimation

Provide bounds in regions of positivity violation:

$$\tau_{lb} \leq \text{ATE} \leq \tau_{ub}$$

Partial identification approach.
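
One simple way to obtain such bounds is a worst-case construction. It is sketched below under assumptions that are not part of this note: a discrete covariate, an outcome known to lie in `[y_min, y_max]`, and ignorability within each covariate cell. Where a cell has no treated (or no control) units, the unknown mean outcome is replaced by its smallest and largest possible values. The function name `ate_bounds` and the toy data are hypothetical.

```python
import numpy as np

def ate_bounds(Y, W, X, y_min, y_max):
    """Worst-case bounds on the ATE when some cells of X lack one arm."""
    Y = np.asarray(Y, dtype=float)
    W = np.asarray(W)
    X = np.asarray(X)
    lower = upper = 0.0
    for x in np.unique(X):
        in_cell = X == x
        share = in_cell.mean()              # sample estimate of P(X = x)
        y1 = Y[in_cell & (W == 1)]
        y0 = Y[in_cell & (W == 0)]
        # Observed arm: use the sample mean. Missing arm: use [y_min, y_max].
        mu1_lo, mu1_hi = (y1.mean(), y1.mean()) if y1.size else (y_min, y_max)
        mu0_lo, mu0_hi = (y0.mean(), y0.mean()) if y0.size else (y_min, y_max)
        lower += share * (mu1_lo - mu0_hi)
        upper += share * (mu1_hi - mu0_lo)
    return float(lower), float(upper)

# Toy data: cell "b" has no treated units; the outcome is binary (0 or 1).
X = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
W = np.array([1, 1, 0, 0, 0, 0, 0, 0])
Y = np.array([1, 1, 0, 1, 0, 1, 0, 1])
print(ate_bounds(Y, W, X, y_min=0.0, y_max=1.0))
```

The width of the interval grows with the share of the population in cells that violate positivity, which shows how much of the ATE the data cannot pin down without further assumptions.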

4. Extrapolation (caution required)

Model-based extrapolation:

Strongly depends on model assumptions
Sensitivity analysis is essential

Related Notes

Causal Assumptions Overview - consolidated overview of the three core assumptions
Strong Ignorability - Ignorability + Positivity
Propensity Score - $e(x) = P(W=1 \mid X=x)$
IPW - a method sensitive to positivity
Trimming - responding to positivity violations
Overlap Weighting - a robust weighting method

References

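Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55.
Yao, L., et al. (2021). A survey on causal inference. ACM Transactions on Knowledge Discovery from Data, 15(5). Section 2.3.
Crump, R. K., et al. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika, 96(1), 187-199.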
