Tae Hyun Kim (Lowell)

Propensity Score Matching (PSM)

#causal-inference #matching #psm

Definition

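Matching each treated individual with the control individual(s) whose propensity scores are most similar. In 1:1 nearest-neighbor matching, the match for treated unit $i$ is: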

$$j(i) = \arg\min_{j: W_j=0} |e(X_i) - e(X_j)|$$

Here, $e(X) = P(W=1 \mid X)$ is the propensity score (PS).


Intuitive Understanding

Why match on the PS?

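Rosenbaum & Rubin (1983) showed that the PS is a balancing score:

$$W \perp\!\!\!\perp X \mid e(X)$$
  • Among units with the same PS, the covariate distribution is the same in the treated and control groups
  • If treatment is unconfounded given $X$, it is also unconfounded given $e(X)$
  • → matching is possible on the scalar $e(X)$ instead of the high-dimensional $X$
  • Dimensionality reduction: mitigates the curse of dimensionality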
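A small simulation can make the balancing property concrete. This is only an illustration under assumed conditions: two standard-normal covariates and a hypothetical logistic treatment model whose true $e(X)$ is known. It compares the raw treated-minus-control difference in the first covariate with the average difference inside 20 quantile bins of $e(X)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical setup: two standard-normal covariates and a known logistic treatment model
X = rng.normal(size=(n, 2))
e = 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))  # true e(X)
W = rng.binomial(1, e)

# Raw imbalance: treated units have systematically larger X[:, 0]
raw_diff = X[W == 1, 0].mean() - X[W == 0, 0].mean()

# Stratify on e(X) into 20 quantile bins and compare X[:, 0] within each bin
edges = np.quantile(e, np.linspace(0, 1, 21))
bins = np.digitize(e, edges[1:-1])  # bin index 0..19
within = []
for b in range(20):
    treated_b = (bins == b) & (W == 1)
    control_b = (bins == b) & (W == 0)
    within.append(X[treated_b, 0].mean() - X[control_b, 0].mean())
strat_diff = float(np.mean(within))

print(f"raw difference:           {raw_diff:.3f}")
print(f"within-PS-bin difference: {strat_diff:.3f}")
```

The raw difference is large, while the within-bin difference is close to zero; the small remainder comes from units in the same bin not having exactly equal $e(X)$.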

Example

| Patient | Age | Sex | Medical history | PS |
|---|---|---|---|---|
| A (treated) | 45 | M | Hypertension | 0.72 |
| B (control) | 52 | F | Diabetes | 0.70 |

The $X$ values differ, but the $e(X)$ values are similar → A and B can be matched.


Procedure

Step 1: Estimate the PS

$$\hat{e}(X_i) = P(W_i = 1 \mid X_i)$$

Methods:

  • Logistic regression (common)
  • Random forest
  • Gradient boosting
  • Neural network
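A minimal scikit-learn sketch of Step 1 fits two of the listed methods and keeps column 1 of `predict_proba`, i.e. $\hat{P}(W=1 \mid X)$. The data are simulated from a hypothetical logistic treatment model (`true_e`), and the hyperparameters are illustrative, not tuned.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2_000
X = rng.normal(size=(n, 3))
# Hypothetical treatment model, used only to simulate W
true_e = 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.7 * X[:, 1])))
W = rng.binomial(1, true_e)

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "gbm": GradientBoostingClassifier(max_depth=2, n_estimators=200),
}
ps_hat = {}
for name, model in models.items():
    model.fit(X, W)
    # classes_ is [0, 1], so column 1 is the estimated P(W = 1 | X)
    ps_hat[name] = model.predict_proba(X)[:, 1]

for name, ps in ps_hat.items():
    corr = np.corrcoef(ps, true_e)[0, 1]
    print(f"{name}: range [{ps.min():.3f}, {ps.max():.3f}], corr with true e = {corr:.3f}")
```

Flexible models such as gradient boosting can push estimates close to 0 or 1; such units have few comparable counterparts in the other group, which later shows up as poor matches.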

Step 2: Matching

For each treated individual, find the control with the most similar PS:

For each treated unit i:
    Find control j* with |e(Xi) - e(Xj)| minimized
    Match (i, j*)
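The pseudocode above translates directly into NumPy. This is a sketch of 1:1 matching with replacement (a control can be reused); `match_nearest` is a hypothetical helper name, and the PS values are made up, with the first two mirroring patients A and B from the example table.

```python
import numpy as np

def match_nearest(ps, W):
    """1:1 nearest-neighbor matching on the PS, with replacement.
    Returns {treated index i: matched control index j*}."""
    treated = np.flatnonzero(W == 1)
    controls = np.flatnonzero(W == 0)
    pairs = {}
    for i in treated:
        # |e(X_i) - e(X_j)| for every control j
        dist = np.abs(ps[i] - ps[controls])
        pairs[int(i)] = int(controls[np.argmin(dist)])
    return pairs

ps = np.array([0.72, 0.70, 0.30, 0.35, 0.66, 0.28])
W = np.array([1, 0, 1, 0, 0, 0])
print(match_nearest(ps, W))  # {0: 1, 2: 5}
```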

Step 3: Effect Estimation

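$$\widehat{\text{ATT}} = \frac{1}{n_1} \sum_{i: W_i=1} \left(Y_i - Y_{j(i)}\right)$$

where $n_1$ is the number of matched treated units and $j(i)$ is the control matched to treated unit $i$.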
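A toy calculation of the ATT formula, using hypothetical outcomes for three treated units and their matched controls:

```python
import numpy as np

# Hypothetical outcomes: Y_i for treated units and Y_j(i) for their matched controls
y_treated = np.array([10.0, 12.0, 9.0])
y_matched_control = np.array([8.0, 11.0, 9.0])

n1 = len(y_treated)
att_hat = np.sum(y_treated - y_matched_control) / n1  # (2 + 1 + 0) / 3
print(att_hat)  # 1.0
```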

Matching Options

1. Caliper

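Distance constraint: a match is accepted only if

$$|e(X_i) - e(X_j)| < c$$

Treated units with no control inside the caliper are dropped. Typically $c = 0.2 \times \text{SD}(e(X))$; a widely used variant applies the same 0.2-SD rule to the logit of the PS, $\log\frac{e(X)}{1-e(X)}$.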
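A sketch of caliper matching, assuming the caliper is applied on the logit of the PS with $c = 0.2 \times$ its SD over the full sample (other choices, such as a pooled within-group SD, are also used). `caliper_match` is a hypothetical helper; treated units with no control inside the caliper are returned separately instead of being forced into a bad match.

```python
import numpy as np

def caliper_match(ps, W, c_sd=0.2):
    """1:1 nearest-neighbor matching (with replacement) on the logit PS,
    discarding treated units whose best match lies outside the caliper."""
    lps = np.log(ps / (1 - ps))     # logit of the PS
    c = c_sd * lps.std(ddof=1)      # caliper width: 0.2 * SD(logit PS), full sample
    treated = np.flatnonzero(W == 1)
    controls = np.flatnonzero(W == 0)
    pairs, dropped = {}, []
    for i in treated:
        dist = np.abs(lps[i] - lps[controls])
        k = np.argmin(dist)
        if dist[k] < c:
            pairs[int(i)] = int(controls[k])
        else:
            dropped.append(int(i))  # no control inside the caliper
    return pairs, dropped

ps = np.array([0.72, 0.70, 0.95, 0.30, 0.35])
W = np.array([1, 0, 1, 0, 0])
pairs, dropped = caliper_match(ps, W)
print(pairs, dropped)  # {0: 1} [2]
```

Treated unit 2 (PS 0.95) has no control with a similar score, so it is excluded; note that this changes the population the ATT describes.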

2. 1:k Matching

Match k controls per treated unit:

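  • $k = 1$: standard (pair matching)
  • $k > 1$: lower variance (more control outcomes are averaged per treated unit), but possibly more bias, because the 2nd to $k$-th nearest controls are poorer matches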
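A 1:k sketch using scikit-learn's `NearestNeighbors`: the counterfactual for each treated unit is the mean outcome of its k nearest controls (with replacement). `att_one_to_k` and the toy data are hypothetical, and `k` must not exceed the number of controls.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def att_one_to_k(ps, W, Y, k=3):
    """ATT from 1:k nearest-neighbor matching on the PS, with replacement."""
    treated = np.flatnonzero(W == 1)
    controls = np.flatnonzero(W == 0)
    nn = NearestNeighbors(n_neighbors=k).fit(ps[controls].reshape(-1, 1))
    _, idx = nn.kneighbors(ps[treated].reshape(-1, 1))  # (n1, k) positions in `controls`
    y_cf = Y[controls[idx]].mean(axis=1)                # mean of the k matched controls
    return np.mean(Y[treated] - y_cf)

ps = np.array([0.50, 0.45, 0.50, 0.55, 0.90])
W = np.array([1, 0, 0, 0, 0])
Y = np.array([5.0, 1.0, 2.0, 6.0, 10.0])
print(att_one_to_k(ps, W, Y, k=1))  # 3.0
print(att_one_to_k(ps, W, Y, k=3))  # 2.0
```

With k = 3, the controls at PS 0.45 and 0.55 also enter the average: more outcomes are averaged, but they are less similar than the exact match at 0.50.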

3. With/Without Replacement

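| Option | Advantages | Disadvantages |
|---|---|---|
| With replacement | Better matches, lower bias | Higher variance; some controls are reused many times |
| Without replacement | Lower variance; each control is used at most once | Lower match quality; greedy matching results depend on the order in which treated units are processed |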
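A greedy sketch of matching without replacement (hypothetical helper, toy data) shows the order dependence: two treated units compete for the single good control, and whichever is processed first gets it.

```python
import numpy as np

def greedy_match_without_replacement(ps, W, order=None):
    """Greedy 1:1 nearest-neighbor matching; each control is used at most once."""
    treated = np.flatnonzero(W == 1)
    if order is not None:
        treated = treated[order]  # process treated units in a custom order
    available = set(np.flatnonzero(W == 0).tolist())
    pairs = {}
    for i in treated:
        if not available:
            break  # ran out of controls
        cand = np.array(sorted(available))
        j = int(cand[np.argmin(np.abs(ps[i] - ps[cand]))])
        pairs[int(i)] = j
        available.remove(j)  # remove the control so it cannot be reused
    return pairs

ps = np.array([0.60, 0.62, 0.61, 0.20])
W = np.array([1, 1, 0, 0])
print(greedy_match_without_replacement(ps, W))                # {0: 2, 1: 3}
print(greedy_match_without_replacement(ps, W, order=[1, 0]))  # {1: 2, 0: 3}
```

With replacement, both treated units would be matched to control 2.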

Pros and Cons

Advantages

| Advantage | Description |
|---|---|
| Dimensionality reduction | High-dimensional $X$ → scalar PS |
| Intuitive | "Comparing similar probabilities" |
| Transparency | Matched pairs can be inspected |
| Flexibility | Various matching options |

Disadvantages

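| Disadvantage | Description |
|---|---|
| Dependence on the PS model | Biased when the PS model is misspecified |
| Information loss | Unmatched samples are excluded |
| Variance | Can be less efficient than IPW |
| Match quality | Poor matches are possible, especially where few controls have similar PS values |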

Quality Assessment

Checking Balance

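Compare covariate distributions after matching using the standardized mean difference (SMD) for each covariate $k$, where $\bar{X}_{k,T}, \bar{X}_{k,C}$ are the treated and control means and $s^2_{k,T}, s^2_{k,C}$ the corresponding variances:

$$\text{SMD}_k = \frac{\bar{X}_{k,T} - \bar{X}_{k,C}}{\sqrt{(s^2_{k,T} + s^2_{k,C})/2}}$$

Criterion (common rule of thumb): $|\text{SMD}_k| < 0.1$ for every covariate.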
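The SMD formula in NumPy, using sample variances (`ddof=1`) for $s^2$. `smd` is a hypothetical helper and the ages are toy values:

```python
import numpy as np

def smd(x_treated, x_control):
    """Standardized mean difference with pooled SD sqrt((s_T^2 + s_C^2) / 2)."""
    diff = np.mean(x_treated) - np.mean(x_control)
    pooled_sd = np.sqrt((np.var(x_treated, ddof=1) + np.var(x_control, ddof=1)) / 2)
    return diff / pooled_sd

age_t = np.array([45.0, 50.0, 55.0])
age_c = np.array([40.0, 45.0, 50.0])
print(round(smd(age_t, age_c), 3))  # 1.0
```

An SMD of 1.0 fails the 0.1 criterion by a wide margin; in practice the SMD is computed for every covariate, before and after matching.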

Visualization

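import matplotlib.pyplot as plt

# Assumes ps, W are the PS and treatment indicator for the full sample,
# and ps_matched, W_matched are the same for the matched sample (NumPy arrays)
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True)

# PS distributions before matching
axes[0].hist(ps[W == 1], bins=30, alpha=0.5, label='Treated')
axes[0].hist(ps[W == 0], bins=30, alpha=0.5, label='Control')
axes[0].set_title('Before Matching')

# PS distributions after matching
axes[1].hist(ps_matched[W_matched == 1], bins=30, alpha=0.5, label='Treated')
axes[1].hist(ps_matched[W_matched == 0], bins=30, alpha=0.5, label='Control')
axes[1].set_title('After Matching')

for ax in axes:
    ax.set_xlabel('Propensity score')
    ax.legend()
plt.tight_layout()
plt.show()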

Implementation

R (MatchIt)

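library(MatchIt)

# PSM with 1:1 nearest-neighbor matching on a logistic-regression PS.
# distance = "glm" (logit link by default) is the MatchIt >= 4 spelling of "logit".
# The caliper is in SD units of the distance by default, so 0.2 means 0.2 * SD(PS).
m.out <- matchit(treat ~ x1 + x2 + x3,
                 data = df,
                 method = "nearest",
                 distance = "glm",
                 caliper = 0.2)

# Check balance (standardized mean differences before and after matching)
summary(m.out)

# Matched data (includes a `weights` column)
matched_data <- match.data(m.out)

# Estimate ATT: the coefficient on treat
lm(y ~ treat, data = matched_data, weights = weights)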

Python

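import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

# Assumes X (n x p covariates), W (0/1 treatment) and Y (outcome) are NumPy arrays

# Estimate the PS with logistic regression
lr = LogisticRegression(max_iter=1000)
lr.fit(X, W)
ps = lr.predict_proba(X)[:, 1]  # estimated P(W = 1 | X)

# Matching: 1:1 nearest neighbor on the PS, with replacement
treated_idx = np.where(W == 1)[0]
control_idx = np.where(W == 0)[0]

nn = NearestNeighbors(n_neighbors=1)
nn.fit(ps[control_idx].reshape(-1, 1))
distances, matches = nn.kneighbors(ps[treated_idx].reshape(-1, 1))

# Estimate ATT: mean of Y_i - Y_j(i) over treated units
att = np.mean(Y[treated_idx] - Y[control_idx[matches.flatten()]])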
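The Python steps can be checked end to end on simulated data where the answer is known. The data-generating process here is an assumption: a hypothetical treatment model in which the first covariate raises and the second lowers the treatment probability, and an outcome with a constant treatment effect of 2 (so the true ATT is 2). Both covariates also affect $Y$, so the naive difference in means is confounded.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(42)
n = 5_000
X = rng.normal(size=(n, 3))
# Hypothetical treatment and outcome models (true treatment effect = 2)
e_true = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.6 * X[:, 1])))
W = rng.binomial(1, e_true)
Y = 2.0 * W + 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)

# Naive comparison ignores confounding
naive = Y[W == 1].mean() - Y[W == 0].mean()

# PSM: estimated PS, 1:1 nearest neighbor with replacement
ps = LogisticRegression(max_iter=1000).fit(X, W).predict_proba(X)[:, 1]
treated_idx = np.flatnonzero(W == 1)
control_idx = np.flatnonzero(W == 0)
nn = NearestNeighbors(n_neighbors=1).fit(ps[control_idx].reshape(-1, 1))
_, matches = nn.kneighbors(ps[treated_idx].reshape(-1, 1))
att = np.mean(Y[treated_idx] - Y[control_idx[matches.ravel()]])

print(f"naive difference: {naive:.2f}")
print(f"PSM ATT estimate: {att:.2f}")
```

The naive difference overstates the effect, while the matched estimate should land near 2; the remaining gap reflects sampling noise and imperfect matches.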

Related

  • Matching Methods Overview - consolidated overview of matching methods
  • Propensity Score - the core tool
  • Nearest Neighbor Matching - general NNM
  • ATT - the main estimand of PSM
  • Selection Bias - the problem being addressed

References

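  • Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55.
  • Yao, L., et al. (2021). A survey on causal inference. ACM Transactions on Knowledge Discovery from Data. Section 3.3.
  • Caliendo, M., & Kopeinig, S. (2008). Some practical guidance for the implementation of propensity score matching. Journal of Economic Surveys, 22(1), 31–72.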
