Propensity Score Matching (PSM) · Tae Hyun Kim (Lowell)

Definition

PSM pairs each treated unit $i$ with the control unit $j(i)$ whose propensity score is closest:

$$j(i) = \arg\min_{j: W_j=0} |e(X_i) - e(X_j)|$$

Here, $e(X) = P(W=1 \mid X)$ is the Propensity Score.

Intuitive Understanding

Why match on the PS?

Rosenbaum & Rubin (1983) showed that the propensity score is a balancing score:

$$W \perp\!\!\!\perp X \mid e(X)$$

Among units with the same PS, the distribution of $X$ is the same in the treated and control groups
→ matching is possible on the scalar $e(X)$ instead of the high-dimensional $X$
Dimensionality reduction: mitigates the curse of dimensionality
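The balancing property can be checked numerically. The following is a minimal simulation sketch, not part of the original note: the two binary covariates, the propensity values 0.2 / 0.5 / 0.8, and the sample size are all illustrative assumptions. It compares the imbalance in one covariate across the whole sample with the imbalance inside a single PS stratum.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two independent binary covariates (hypothetical data).
x1 = rng.integers(0, 2, size=n)
x2 = rng.integers(0, 2, size=n)

# Assumed true propensity score: depends on X only through x1 + x2,
# so X = (1, 0) and X = (0, 1) are different covariate values with the same e(X).
ps_by_sum = np.array([0.2, 0.5, 0.8])
e = ps_by_sum[x1 + x2]
w = rng.random(n) < e  # treatment indicator drawn with probability e(X)

# Over the whole sample, treated units have x1 = 1 more often than controls.
raw_gap = x1[w].mean() - x1[~w].mean()

# Inside the stratum e(X) = 0.5, x1 is balanced across the two groups.
s = e == 0.5
stratum_gap = x1[s & w].mean() - x1[s & ~w].mean()

print(f"gap in mean(x1), full sample:      {raw_gap:.3f}")
print(f"gap in mean(x1), e(X) = 0.5 only:  {stratum_gap:.3f}")
```

In this setup the full-sample gap is large while the within-stratum gap is close to zero, which is what $W \perp\!\!\!\perp X \mid e(X)$ predicts.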

Example

| Patient | Age | Sex | Medical history | PS |
|---|---|---|---|---|
| A (treated) | 45 | M | Hypertension | 0.72 |
| B (control) | 52 | F | Diabetes | 0.70 |

The $X$ values differ but the $e(X)$ values are similar → matching is possible.

Procedure

Step 1: Estimate the PS

$$\hat{e}(X_i) = \hat{P}(W_i = 1 \mid X_i)$$

Methods:

Logistic regression (common)
Random forest
Gradient boosting
Neural network
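As a rough sketch of Step 1, the snippet below estimates the PS with two of the listed methods using scikit-learn. The simulated `X`, `W`, and `true_ps` are assumptions made only so the example runs. Using out-of-fold predictions (`cross_val_predict`) for the boosted model is one possible way to limit overfitting of a flexible learner, not something the note prescribes.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Simulated data (hypothetical): 3 covariates, treatment depends on the first two.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
true_ps = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
W = (rng.random(n) < true_ps).astype(int)

# Logistic regression: column 1 of predict_proba is P(W = 1 | X).
ps_logit = LogisticRegression(max_iter=1000).fit(X, W).predict_proba(X)[:, 1]

# Gradient boosting with out-of-fold predictions (5-fold).
ps_gb = cross_val_predict(
    GradientBoostingClassifier(random_state=0), X, W, cv=5, method="predict_proba"
)[:, 1]

# Quick overlap check: estimated scores close to 0 or 1 make matching hard.
print("logit PS range:", ps_logit.min(), ps_logit.max())
print("boosted PS range:", ps_gb.min(), ps_gb.max())
```

Whichever model is used, the output that matters for matching is a single score per unit in $(0, 1)$.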

Step 2: Matching

For each treated individual, find the control with the most similar PS:

```text
For each treated unit i:
    Find the control j* that minimizes |e(X_i) - e(X_j)| over all controls j
    Match (i, j*)
```

Step 3: Effect Estimation

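$$\widehat{\text{ATT}} = \frac{1}{n_1} \sum_{i: W_i=1} (Y_i - Y_{j(i)})$$

Here $n_1$ is the number of treated units and $j(i)$ is the control matched to treated unit $i$.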
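A small worked example of the ATT formula, using made-up numbers (three treated units, four controls; all values are hypothetical):

```python
import numpy as np

# Hypothetical toy data.
ps_treated = np.array([0.72, 0.55, 0.30])
y_treated = np.array([10.0, 8.0, 5.0])
ps_control = np.array([0.70, 0.50, 0.35, 0.10])
y_control = np.array([7.0, 6.5, 4.0, 2.0])

# j(i): position of the control with the closest PS for each treated unit.
dist = np.abs(ps_treated[:, None] - ps_control[None, :])
j = dist.argmin(axis=1)

# ATT estimate: average of Y_i - Y_{j(i)} over the treated units.
att = np.mean(y_treated - y_control[j])
print(j, att)
```

The matched pairs are (0.72, 0.70), (0.55, 0.50), and (0.30, 0.35), so the estimate is $(3.0 + 1.5 + 1.0)/3 \approx 1.83$.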

Matching Options

1. Caliper

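A caliper caps the PS distance allowed within a matched pair; a treated unit with no control inside the caliper is left unmatched:

$$|e(X_i) - e(X_j)| < c$$

A common choice is $c = 0.2 \times \text{SD}(e(X))$. This rule is often applied to the logit of the PS rather than to the PS itself.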
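A minimal sketch of caliper matching, assuming 1:1 nearest-neighbor matching with replacement and the rule $c = 0.2 \times \text{SD}(e(X))$ computed on the PS itself. The function name `match_with_caliper` and the toy scores are illustrative, not taken from any library.

```python
import numpy as np

def match_with_caliper(ps, w, scale=0.2):
    """1:1 nearest-neighbor matching with replacement; pairs outside the caliper are dropped."""
    ps = np.asarray(ps, dtype=float)
    w = np.asarray(w).astype(bool)
    treated_idx = np.flatnonzero(w)
    control_idx = np.flatnonzero(~w)

    # Caliper width: scale * sample SD of the propensity score.
    c = scale * ps.std(ddof=1)

    # Distance from every treated unit to every control, then the nearest control.
    dist = np.abs(ps[treated_idx][:, None] - ps[control_idx][None, :])
    nearest = dist.argmin(axis=1)
    within = dist[np.arange(len(treated_idx)), nearest] < c

    return treated_idx[within], control_idx[nearest[within]], c

# Hypothetical scores: the first three units are treated.
ps = np.array([0.90, 0.62, 0.30, 0.60, 0.33, 0.10])
w = np.array([1, 1, 1, 0, 0, 0])
matched_treated, matched_controls, c = match_with_caliper(ps, w)
print(matched_treated, matched_controls, round(c, 4))
```

Here the treated unit with PS 0.90 has no control within the caliper and is discarded, which illustrates the trade-off: better match quality at the cost of a smaller (and possibly different) treated sample.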

2. 1:k Matching

Match k controls per treated unit:

k=1: standard one-to-one matching
k>1: lower variance, but possibly more bias because the additional matches are farther away in PS
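A sketch of 1:k matching under simple assumptions: matching with replacement, no caliper, and the counterfactual for each treated unit taken as the plain average of its k matched controls. The helper `att_one_to_k` and the data are hypothetical.

```python
import numpy as np

def att_one_to_k(ps, w, y, k=1):
    """ATT estimate from 1:k nearest-neighbor PS matching (with replacement)."""
    ps = np.asarray(ps, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w).astype(bool)
    ps_t, y_t = ps[w], y[w]
    ps_c, y_c = ps[~w], y[~w]

    # For each treated unit, positions of the k controls with the closest PS.
    dist = np.abs(ps_t[:, None] - ps_c[None, :])
    nearest_k = np.argsort(dist, axis=1)[:, :k]

    # Imputed untreated outcome: mean outcome of the k matched controls.
    y0_hat = y_c[nearest_k].mean(axis=1)
    return np.mean(y_t - y0_hat)

# Hypothetical data: two treated units followed by four controls.
ps = np.array([0.70, 0.40, 0.72, 0.65, 0.45, 0.30])
w = np.array([1, 1, 0, 0, 0, 0])
y = np.array([10.0, 6.0, 8.0, 7.0, 5.0, 4.0])

att_k1 = att_one_to_k(ps, w, y, k=1)
att_k2 = att_one_to_k(ps, w, y, k=2)
print(att_k1, att_k2)
```

With k=2 each estimate averages over a second, more distant control, so the result changes (1.5 versus 2.0 here); this is the bias side of the variance-bias trade-off.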

3. With/Without Replacement

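| Option | Advantages | Disadvantages |
|---|---|---|
| With replacement | Closer matches, lower bias | Higher variance; some controls are reused many times |
| Without replacement | Lower variance; each control is used at most once | Lower match quality |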
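A sketch of greedy matching without replacement. Processing treated units from highest to lowest PS is an assumption made here for illustration; the result of greedy matching generally depends on that order. The function name is hypothetical.

```python
import numpy as np

def greedy_match_without_replacement(ps, w):
    """Greedy 1:1 PS matching in which each control can be used at most once."""
    ps = np.asarray(ps, dtype=float)
    w = np.asarray(w).astype(bool)
    treated_idx = np.flatnonzero(w)
    available = list(np.flatnonzero(~w))  # controls not yet matched

    pairs = []
    # Visit treated units in order of decreasing PS (an assumed ordering).
    for i in treated_idx[np.argsort(-ps[treated_idx])]:
        if not available:
            break  # more treated units than controls
        j = min(available, key=lambda c: abs(ps[i] - ps[c]))
        pairs.append((int(i), int(j)))
        available.remove(j)  # this control cannot be reused
    return pairs

# Hypothetical scores: both treated units are closest to the control at index 2.
ps = np.array([0.70, 0.68, 0.69, 0.40, 0.20])
w = np.array([1, 1, 0, 0, 0])
pairs = greedy_match_without_replacement(ps, w)
print(pairs)
```

With replacement, both treated units would be paired with the control at 0.69. Without replacement, the second treated unit has to settle for the control at 0.40, a visibly worse match.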

Pros and Cons

Advantages

| Advantage | Description |
|---|---|
| Dimensionality reduction | High-dimensional X → scalar PS |
| Intuitive | "Comparing similar probabilities" |
| Transparency | Matched pairs can be inspected |
| Flexibility | Various matching options |

Disadvantages

| Disadvantage | Description |
|---|---|
| Dependence on the PS model | Biased when the PS is misspecified |
| Information loss | Unmatched samples are excluded |
| Variance | Can be less efficient than IPW |
| Match quality | Poor matches are possible |

Quality Assessment

Checking Balance

Compare covariate distributions after matching:

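$$\text{SMD}_k = \frac{\bar{X}_{k,T} - \bar{X}_{k,C}}{\sqrt{(s^2_{k,T} + s^2_{k,C})/2}}$$

Here $\bar{X}_{k,T}$ and $\bar{X}_{k,C}$ are the means of covariate $k$ in the treated and control groups, and $s^2_{k,T}$, $s^2_{k,C}$ are the corresponding variances.

Common criterion: $|\text{SMD}_k| < 0.1$ for every covariate $k$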
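A minimal sketch of the SMD computation for one covariate. The function name `smd` and the ages are illustrative; the sketch assumes unweighted samples, so for matching with replacement the matched sample would need weights or repeated rows.

```python
import numpy as np

def smd(x, w):
    """Standardized mean difference of covariate x between treated (w=1) and control (w=0)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w).astype(bool)
    x_t, x_c = x[w], x[~w]
    # Denominator: square root of the average of the two sample variances.
    pooled_sd = np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2)
    return (x_t.mean() - x_c.mean()) / pooled_sd

# Hypothetical ages: the first three units are treated.
age = np.array([50, 60, 70, 40, 50, 60])
w = np.array([1, 1, 1, 0, 0, 0])
age_smd = smd(age, w)
print(age_smd)
```

In this toy sample the treated group is 10 years older on average and both groups have an SD of 10, so the SMD is 1.0, far above the 0.1 threshold. Running the same function on the matched sample shows whether matching has reduced the imbalance.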

Visualization

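```python
import matplotlib.pyplot as plt

# Assumes NumPy arrays ps, W (full sample) and ps_matched, W_matched (matched sample)
plt.figure(figsize=(10, 4))

# PS distributions before matching
plt.subplot(1, 2, 1)
plt.hist(ps[W == 1], alpha=0.5, label='Treated')
plt.hist(ps[W == 0], alpha=0.5, label='Control')
plt.xlabel('Propensity score')
plt.title('Before Matching')
plt.legend()

# PS distributions after matching
plt.subplot(1, 2, 2)
plt.hist(ps_matched[W_matched == 1], alpha=0.5, label='Treated')
plt.hist(ps_matched[W_matched == 0], alpha=0.5, label='Control')
plt.xlabel('Propensity score')
plt.title('After Matching')
plt.legend()

plt.tight_layout()
plt.show()
```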

Implementation

R (MatchIt)

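```r
library(MatchIt)

# PSM with 1:1 nearest neighbor; PS estimated by logistic regression
# (distance = "glm" with link = "logit" is the current form of distance = "logit")
m.out <- matchit(treat ~ x1 + x2 + x3,
                 data = df,
                 method = "nearest",
                 distance = "glm",
                 link = "logit",
                 caliper = 0.2)  # in SD units of the distance measure by default

# Check balance
summary(m.out)

# Matched data
matched_data <- match.data(m.out)

# Estimate ATT: the coefficient on treat
fit <- lm(y ~ treat, data = matched_data, weights = weights)
coef(fit)["treat"]
```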

Python

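```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# Assumes NumPy arrays: X (n, p) covariates, W (n,) 0/1 treatment, Y (n,) outcome

# Estimate the PS (note: scikit-learn applies L2 regularization by default)
lr = LogisticRegression()
lr.fit(X, W)
ps = lr.predict_proba(X)[:, 1]

# 1:1 nearest-neighbor matching on the PS (with replacement, no caliper)
treated_idx = np.where(W == 1)[0]
control_idx = np.where(W == 0)[0]

nn = NearestNeighbors(n_neighbors=1)
nn.fit(ps[control_idx].reshape(-1, 1))
distances, matches = nn.kneighbors(ps[treated_idx].reshape(-1, 1))

# Estimate ATT: mean outcome difference across matched pairs
matched_controls = control_idx[matches.flatten()]
att = np.mean(Y[treated_idx] - Y[matched_controls])
```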

Related Notes

Matching Methods Overview - consolidated overview of matching methods
Propensity Score - the core tool
Nearest Neighbor Matching - general NNM
ATT - the main estimand of PSM
Selection Bias - the problem being addressed

References

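Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55.
yaoSurveyCausalInference2021 - Section 3.3
Caliendo, M., & Kopeinig, S. (2008). Some practical guidance for the implementation of propensity score matching. Journal of Economic Surveys, 22(1), 31-72.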
