Propensity Score Matching (PSM)
Definition
Matching treated and control individuals with similar propensity scores.
Here, $e(X) = P(W = 1 \mid X)$ is the propensity score (PS), the probability of receiving treatment $W$ given covariates $X$.
Intuitive Understanding
Why match on the PS?
Rosenbaum & Rubin (1983):
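- If two units have the same PS, they have the same probability of treatment, and conditional on the PS the covariate distribution is the same for treated and control units (the balancing property)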
- → matching is possible on the scalar $e(X)$ instead of the high-dimensional $X$
- Dimensionality reduction: mitigates the curse of dimensionality
Example
| Patient | Age | Sex | Medical history | PS |
|---|---|---|---|---|
| A (treated) | 45 | M | Hypertension | 0.72 |
| B (control) | 52 | F | Diabetes | 0.70 |
The covariate values ($X$) differ but the PS values are similar → matching is possible.
Procedure
Step 1: Estimate the PS
Methods:
- Logistic regression (common)
- Random forest
- Gradient boosting
- Neural network
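As an illustration of Step 1, the sketch below fits a logistic-regression PS model by plain gradient ascent, using only the Python standard library. The function name `fit_logistic_ps`, the learning rate, the iteration count, and the toy data are all assumptions made for this example; in practice a library implementation would normally be used.

```python
import math


def fit_logistic_ps(X, W, lr=0.1, n_iter=2000):
    """Fit P(W=1 | X) with logistic regression; return a function x -> e(x)."""
    n, d = len(X), len(X[0])
    beta = [0.0] * (d + 1)  # intercept followed by one coefficient per covariate

    def predict(x):
        z = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
        return 1.0 / (1.0 + math.exp(-z))

    for _ in range(n_iter):
        grad = [0.0] * (d + 1)
        for x, w in zip(X, W):
            err = w - predict(x)  # gradient of the log-likelihood w.r.t. z
            grad[0] += err
            for k in range(d):
                grad[k + 1] += err * x[k]
        # Gradient ascent step on the average log-likelihood
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return predict


# Hypothetical toy data: one covariate, treatment more likely for larger values
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
W = [0, 0, 1, 0, 1, 1]
e = fit_logistic_ps(X, W)
ps = [e(x) for x in X]
```

The resulting `ps` values lie strictly between 0 and 1 and increase with the covariate, which is what the matching step needs as input.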
Step 2: Matching
For each treated individual, find the control with the most similar PS:
    For each treated unit i:
        find the control j* that minimizes |e(X_i) - e(X_j)|
        match (i, j*)
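The matching loop above can be sketched in a few lines of plain Python. The function name `nearest_neighbor_match` and the PS values are hypothetical; this version matches with replacement and applies no caliper.

```python
def nearest_neighbor_match(ps_treated, ps_control):
    """For each treated unit, return the index of the control with the closest PS.

    Matching is with replacement: the same control may be selected repeatedly.
    """
    matches = []
    for e_i in ps_treated:
        # argmin over controls j of |e(X_i) - e(X_j)|
        j_star = min(range(len(ps_control)), key=lambda j: abs(e_i - ps_control[j]))
        matches.append(j_star)
    return matches


ps_treated = [0.72, 0.40]        # hypothetical PS values of treated units
ps_control = [0.70, 0.35, 0.90]  # hypothetical PS values of control units
matches = nearest_neighbor_match(ps_treated, ps_control)
print(matches)  # [0, 1]
```

This brute-force search costs one pass over the controls per treated unit, which is fine for small samples; larger problems usually rely on an indexed nearest-neighbor search.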
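Step 3: Effect Estimation
Average the outcome differences over the matched pairs to estimate the ATT:
$$\hat{\tau}_{ATT} = \frac{1}{N_T} \sum_{i:\, W_i = 1} \left( Y_i - Y_{j^*(i)} \right)$$
where $N_T$ is the number of matched treated units and $j^*(i)$ is the control matched to treated unit $i$.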
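A minimal sketch of the estimation step for 1:1 matching, assuming the matches are already available as a list of control indices. The function name `att_from_matches` and the outcome values are hypothetical.

```python
def att_from_matches(y_treated, y_control, matches):
    """Mean of (treated outcome - matched control outcome) over treated units."""
    diffs = [y_treated[i] - y_control[j] for i, j in enumerate(matches)]
    return sum(diffs) / len(diffs)


y_treated = [10.0, 8.0]      # hypothetical outcomes of treated units
y_control = [7.0, 6.0, 9.0]  # hypothetical outcomes of control units
matches = [0, 1]             # treated unit i is paired with control matches[i]
att = att_from_matches(y_treated, y_control, matches)
print(att)  # 2.5
```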
Matching Options
1. Caliper
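Distance constraint:
$$\left| e(X_i) - e(X_j) \right| < \delta$$
Typically $\delta$ is set to about 0.2 standard deviations of the PS (often of its logit).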
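A caliper can be added to nearest-neighbor matching by discarding any pair whose PS distance is too large. The sketch below assumes an absolute caliper on the PS scale; the function name `match_with_caliper` and the values are hypothetical.

```python
def match_with_caliper(ps_treated, ps_control, caliper):
    """Nearest-neighbor matching; a treated unit gets None if no control
    lies within the caliper distance."""
    matches = []
    for e_i in ps_treated:
        j_star = min(range(len(ps_control)), key=lambda j: abs(e_i - ps_control[j]))
        within = abs(e_i - ps_control[j_star]) < caliper
        matches.append(j_star if within else None)
    return matches


ps_treated = [0.72, 0.95]  # hypothetical PS values
ps_control = [0.70, 0.35]
matches = match_with_caliper(ps_treated, ps_control, caliper=0.05)
print(matches)  # [0, None]
```

Treated units that receive `None` are dropped from the analysis, so a tight caliper trades sample size for match quality and changes the population the estimate refers to.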
2. 1:k Matching
Match k controls per treated unit:
- k=1: standard
- k>1: reduced variance, possible increased bias
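For 1:k matching, each treated outcome is compared with the average outcome of its k nearest controls. The following is a sketch under that assumption; the function names `k_nearest_controls` and `att_one_to_k` and all numbers are hypothetical.

```python
def k_nearest_controls(ps_treated, ps_control, k):
    """For each treated unit, return the indices of the k controls closest in PS."""
    matches = []
    for e_i in ps_treated:
        order = sorted(range(len(ps_control)), key=lambda j: abs(e_i - ps_control[j]))
        matches.append(order[:k])
    return matches


def att_one_to_k(y_treated, y_control, matches):
    """Average of treated outcome minus the mean outcome of its matched controls."""
    diffs = []
    for i, js in enumerate(matches):
        mean_control = sum(y_control[j] for j in js) / len(js)
        diffs.append(y_treated[i] - mean_control)
    return sum(diffs) / len(diffs)


ps_treated = [0.60]
ps_control = [0.58, 0.65, 0.20]
y_treated = [10.0]
y_control = [7.0, 9.0, 1.0]

matches = k_nearest_controls(ps_treated, ps_control, k=2)
att = att_one_to_k(y_treated, y_control, matches)
print(matches, att)  # [[0, 1]] 2.0
```

Averaging over more controls smooths the comparison (lower variance), but the second and later neighbors are farther away in PS, which is where the extra bias can come from.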
3. With/Without Replacement
| Option | Advantages | Disadvantages |
|---|---|---|
| With replacement | Better matches, lower bias | Higher variance, some controls reused many times |
| Without replacement | Lower variance, each control used at most once | Lower match quality |
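To make the trade-off concrete, here is a sketch of greedy matching without replacement, where each control is removed from the pool once it has been used. The function name `greedy_match_without_replacement` and the PS values are hypothetical, and greedy matching is only one of several ways to match without replacement.

```python
def greedy_match_without_replacement(ps_treated, ps_control):
    """Match treated units in order; each control can be used at most once.

    Returns None for treated units left over when the controls run out.
    """
    available = set(range(len(ps_control)))
    matches = []
    for e_i in ps_treated:
        if not available:
            matches.append(None)
            continue
        j_star = min(sorted(available), key=lambda j: abs(e_i - ps_control[j]))
        matches.append(j_star)
        available.remove(j_star)
    return matches


ps_treated = [0.70, 0.72]  # both are closest to the control with PS 0.71
ps_control = [0.71, 0.50]
matches = greedy_match_without_replacement(ps_treated, ps_control)
print(matches)  # [0, 1]
```

With replacement, both treated units would be paired with control 0 at a distance of 0.01. Without replacement, the second treated unit has to accept control 1 at a distance of 0.22, which illustrates the lower match quality noted in the table. The greedy result also depends on the order in which treated units are processed.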
Pros and Cons
Advantages
| Advantage | Description |
|---|---|
| Dimensionality reduction | High-dimensional X → scalar PS |
| Intuitive | "Comparing similar probabilities" |
| Transparency | Matched pairs can be inspected |
| Flexibility | Various matching options |
Disadvantages
| Disadvantage | Description |
|---|---|
| Dependence on the PS model | Biased when the PS is misspecified |
| Information loss | Unmatched samples are excluded |
| Variance | Can be less efficient than IPW |
| Match quality | Poor matches are possible |
Quality Assessment
Checking Balance
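Compare covariate distributions after matching, for example with the standardized mean difference (SMD) of each covariate:
$$\text{SMD} = \frac{\bar{X}_T - \bar{X}_C}{\sqrt{(s_T^2 + s_C^2)/2}}$$
Criterion: $|\text{SMD}| < 0.1$ is a commonly used threshold for adequate balance.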
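One widely used balance diagnostic is the standardized mean difference (SMD) of each covariate between the matched groups, with an absolute value below 0.1 often taken as acceptable. The sketch below computes it with the standard library; the function name and the ages are hypothetical, and the pooled standard deviation shown is one common convention among several.

```python
from statistics import mean, variance


def standardized_mean_difference(x_treated, x_control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((variance(x_treated) + variance(x_control)) / 2) ** 0.5
    return (mean(x_treated) - mean(x_control)) / pooled_sd


# Hypothetical ages in the matched treated and control groups
age_treated = [45.0, 50.0, 55.0]
age_control = [44.0, 51.0, 56.0]
smd = standardized_mean_difference(age_treated, age_control)
balanced = abs(smd) < 0.1
```

In a real analysis the same check would be repeated for every covariate, before and after matching, so that any improvement in balance is visible.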
Visualization
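import matplotlib.pyplot as plt
# ps, W: propensity scores and treatment indicators for the full sample (NumPy arrays)
# ps_matched, W_matched: the same quantities for the matched sample
plt.figure(figsize=(10, 4))
# PS distributions before matching
plt.subplot(1, 2, 1)
plt.hist(ps[W == 1], alpha=0.5, label='Treated')
plt.hist(ps[W == 0], alpha=0.5, label='Control')
plt.title('Before Matching')
plt.legend()
# PS distributions after matching
plt.subplot(1, 2, 2)
plt.hist(ps_matched[W_matched == 1], alpha=0.5, label='Treated')
plt.hist(ps_matched[W_matched == 0], alpha=0.5, label='Control')
plt.title('After Matching')
plt.legend()
plt.tight_layout()
plt.show()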
Implementation
R (MatchIt)
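library(MatchIt)
# PSM with 1:1 nearest neighbor
# (newer MatchIt versions prefer distance = "glm", link = "logit")
m.out <- matchit(treat ~ x1 + x2 + x3,
                 data = df,
                 method = "nearest",
                 distance = "logit",
                 caliper = 0.2)  # in standard deviations of the distance measure
# Check balance
summary(m.out)
# Matched data
matched_data <- match.data(m.out)
# Estimate ATT (the coefficient on treat)
fit <- lm(y ~ treat, data = matched_data, weights = weights)
summary(fit)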
Python
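import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors
# X: covariate matrix, W: binary treatment indicator, Y: outcome (NumPy arrays)
# Estimate the PS
lr = LogisticRegression()
lr.fit(X, W)
ps = lr.predict_proba(X)[:, 1]
# 1:1 nearest-neighbor matching on the PS (with replacement, no caliper)
treated_idx = np.where(W == 1)[0]
control_idx = np.where(W == 0)[0]
nn = NearestNeighbors(n_neighbors=1)
nn.fit(ps[control_idx].reshape(-1, 1))
distances, matches = nn.kneighbors(ps[treated_idx].reshape(-1, 1))
# Estimate ATT: mean outcome difference between treated units and their matched controls
matched_control_idx = control_idx[matches.flatten()]
att = np.mean(Y[treated_idx] - Y[matched_control_idx])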
Related Concepts
- Matching Methods Overview - consolidated overview of matching methods
- Propensity Score - the core tool
- Nearest Neighbor Matching - general NNM
- ATT - the main estimand of PSM
- Selection Bias - the problem being addressed
References
- Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55.
- yaoSurveyCausalInference2021 - Section 3.3
- Caliendo, M., & Kopeinig, S. (2008). Some practical guidance for the implementation of propensity score matching. Journal of Economic Surveys, 22(1), 31-72.