Tae Hyun Kim (Lowell)

Doubly Robust Estimator

Definition

The Doubly Robust (DR) Estimator combines an outcome-regression model and a propensity-score model, and remains consistent as long as at least one of the two is correctly specified.

DR Estimator for the ATE:

$$\hat{\tau}_{DR} = \frac{1}{n}\sum_{i=1}^n \hat{\varphi}(Z_i)$$

where $Z = (X, A, Y)$, $\hat{\mu}_a(X)$ estimates $E[Y \mid X, A=a]$, $\hat{\pi}(X)$ estimates $P(A=1 \mid X)$, and the pseudo-outcome (the uncentered efficient influence function) is:

$$\hat{\varphi}(Z) = \underbrace{\hat{\mu}_1(X) - \hat{\mu}_0(X)}_{\text{outcome regression}} + \underbrace{\frac{A(Y - \hat{\mu}_1(X))}{\hat{\pi}(X)} - \frac{(1-A)(Y - \hat{\mu}_0(X))}{1-\hat{\pi}(X)}}_{\text{IPW augmentation}}$$

or, in an equivalent form:

$$\hat{\varphi}(Z) = \hat{\mu}_1(X) - \hat{\mu}_0(X) + \frac{A - \hat{\pi}(X)}{\hat{\pi}(X)(1-\hat{\pi}(X))}\left(Y - A\hat{\mu}_1(X) - (1-A)\hat{\mu}_0(X)\right)$$
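As a minimal sketch of the two formulas above, the following Python computes the pseudo-outcome for a single unit in both forms and averages it into an ATE estimate. The function names and the toy numbers are illustrative assumptions, and the nuisance values (`mu1`, `mu0`, `pi`) are taken as given rather than fitted.

```python
def dr_pseudo_outcome(y, a, mu1, mu0, pi):
    """Pseudo-outcome: outcome-regression contrast plus IPW-weighted residuals."""
    return mu1 - mu0 + a * (y - mu1) / pi - (1 - a) * (y - mu0) / (1 - pi)


def dr_pseudo_outcome_alt(y, a, mu1, mu0, pi):
    """Equivalent form with the single weight (A - pi) / (pi * (1 - pi))."""
    weight = (a - pi) / (pi * (1 - pi))
    return mu1 - mu0 + weight * (y - a * mu1 - (1 - a) * mu0)


def dr_ate(rows):
    """Average the pseudo-outcomes over rows of (y, a, mu1, mu0, pi)."""
    return sum(dr_pseudo_outcome(*row) for row in rows) / len(rows)


# Hypothetical toy data: (y, a, mu1_hat, mu0_hat, pi_hat) per unit.
toy_rows = [
    (3.0, 1, 2.0, 1.0, 0.5),
    (1.5, 0, 2.0, 1.0, 0.25),
    (2.5, 1, 2.5, 1.5, 0.8),
    (0.5, 0, 1.5, 0.5, 0.4),
]
toy_ate = dr_ate(toy_rows)
```

For a treated unit only the first residual term is active, and for a control unit only the second, which is why the two forms agree row by row.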

Intuitive Understanding

Three estimation strategies, the third combining the first two:

  1. Outcome Regression (OR): $\hat{\tau}_{OR} = \frac{1}{n}\sum_i \left[\hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)\right]$
  2. Inverse Propensity Weighting (IPW): $\hat{\tau}_{IPW} = \frac{1}{n}\sum_i \left[\frac{A_i Y_i}{\hat{\pi}(X_i)} - \frac{(1-A_i)Y_i}{1-\hat{\pi}(X_i)}\right]$
  3. Doubly Robust: OR + IPW correction term

OR only:        biased if μ̂ is wrong
IPW only:       biased if π̂ is wrong
DR:             consistent if either μ̂ or π̂ is correct!

Why “Doubly Robust”?

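Here $\mu_0$ and $\pi_0$ denote the true nuisance functions (the subscript 0 marks the truth, not the control arm).

  • $\hat{\mu} = \mu_0$ (outcome model correct): the residuals $Y - \hat{\mu}_a(X)$ have conditional mean zero, so the expectation of the augmentation term is 0
  • $\hat{\pi} = \pi_0$ (propensity model correct): the inverse-probability-weighted residuals recover the outcome model's error in expectation, so they cancel its bias
  • Even if both are wrong, the bias is of the order of the product of the two errors, $O(\|\hat{\mu} - \mu_0\| \cdot \|\hat{\pi} - \pi_0\|)$, so it is small whenever either error is small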

Key Properties

Double Robustness Property

Theorem: If at least one of the following two conditions holds, $\hat{\tau}_{DR}$ is consistent:

  1. The outcome model is correctly specified: $\hat{\mu}_a(x) \xrightarrow{p} E[Y \mid X=x, A=a]$
  2. The propensity model is correctly specified: $\hat{\pi}(x) \xrightarrow{p} P(A=1 \mid X=x)$
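The theorem can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not a proof: the data-generating process, the deliberately wrong nuisance functions, and all names are invented for illustration, and the nuisance functions are plugged in directly rather than estimated. The true ATE in this setup is 2.

```python
import math
import random


def pi_true(x):
    return 1.0 / (1.0 + math.exp(-x))


def pi_wrong(x):
    return 0.5  # misspecified: ignores x entirely


def mu_true(a, x):
    return x + 2.0 * a


def mu_wrong(a, x):
    return 0.5 * a  # misspecified: ignores x and understates the effect


def simulate(n, seed=0):
    """Draw (x, a, y) with confounding through x; the true ATE is 2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        a = 1 if rng.random() < pi_true(x) else 0
        y = mu_true(a, x) + rng.gauss(0.0, 1.0)
        data.append((x, a, y))
    return data


def or_ate(data, mu):
    return sum(mu(1, x) - mu(0, x) for x, _, _ in data) / len(data)


def ipw_ate(data, pi):
    return sum(a * y / pi(x) - (1 - a) * y / (1 - pi(x)) for x, a, y in data) / len(data)


def dr_ate(data, mu, pi):
    total = 0.0
    for x, a, y in data:
        total += (mu(1, x) - mu(0, x)
                  + a * (y - mu(1, x)) / pi(x)
                  - (1 - a) * (y - mu(0, x)) / (1 - pi(x)))
    return total / len(data)


data = simulate(20000)
results = {
    "OR, wrong mu": or_ate(data, mu_wrong),
    "IPW, wrong pi": ipw_ate(data, pi_wrong),
    "DR, mu right, pi wrong": dr_ate(data, mu_true, pi_wrong),
    "DR, mu wrong, pi right": dr_ate(data, mu_wrong, pi_true),
    "DR, both wrong": dr_ate(data, mu_wrong, pi_wrong),
}
for name, value in results.items():
    print(f"{name:24s} {value: .3f}")
```

In this toy setup the two DR rows with one correct nuisance function should land near 2, while the OR-only, IPW-only, and both-wrong rows stay visibly biased.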

Semiparametric Efficiency

When both nuisance models are consistently estimated, the DR estimator is semiparametrically efficient:

  • Constructed from the efficient influence function
  • Attains the semiparametric efficiency bound
  • Has the lowest asymptotic variance among regular asymptotically linear estimators

$$\text{Var}(\hat{\tau}_{DR}) = \frac{1}{n}E[\varphi(Z; \tau_0, \eta_0)^2] + o(n^{-1})$$
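The variance formula suggests a plug-in standard error: the sample variance of the estimated pseudo-outcomes divided by n. The sketch below assumes the pseudo-outcomes have already been computed and that the conditions for asymptotic normality hold. The function name and the toy values are hypothetical.

```python
import math
import statistics


def dr_ate_with_ci(pseudo_outcomes, z=1.96):
    """Point estimate, standard error, and normal-approximation interval."""
    n = len(pseudo_outcomes)
    estimate = statistics.fmean(pseudo_outcomes)
    # The sample variance centers the pseudo-outcomes at the estimate, which
    # makes it a plug-in for E[phi(Z; tau_0, eta_0)^2].
    std_error = math.sqrt(statistics.variance(pseudo_outcomes) / n)
    return estimate, std_error, (estimate - z * std_error, estimate + z * std_error)


# Hypothetical pseudo-outcome values, for illustration only.
phi_hat = [1.0, 2.0, 3.0, 4.0]
estimate, std_error, ci = dr_ate_with_ci(phi_hat)
```

The default `z=1.96` corresponds to a nominal 95% interval. With misspecified or slowly converging nuisance models the interval may not have the stated coverage.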

Rate Doubly Robust

$\sqrt{n}$-consistent under a product-rate condition:

$$\|\hat{\mu} - \mu_0\| \cdot \|\hat{\pi} - \pi_0\| = o_P(n^{-1/2})$$

e.g., it suffices that each nuisance estimator converges faster than $n^{-1/4}$
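A short worked example of the rate arithmetic, under the assumption (introduced here for illustration) that the two nuisance errors decay polynomially with exponents $a$ and $b$:

$$\|\hat{\mu} - \mu_0\| = O_P(n^{-a}), \quad \|\hat{\pi} - \pi_0\| = O_P(n^{-b}) \;\Longrightarrow\; \|\hat{\mu} - \mu_0\| \cdot \|\hat{\pi} - \pi_0\| = O_P(n^{-(a+b)})$$

The product is $o_P(n^{-1/2})$ whenever $a + b > 1/2$. Only the sum matters, so a slow outcome rate can be offset by a fast propensity rate: $a = 1/5$ and $b = 2/5$ give $a + b = 3/5 > 1/2$, even though $a < 1/4$.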

Mathematical Derivation

Efficient Influence Function

The efficient influence function for the ATE $\tau = E[Y(1) - Y(0)]$, where $\eta = (\mu_1, \mu_0, \pi)$ collects the nuisance functions:

$$\varphi(Z; \tau, \eta) = \mu_1(X) - \mu_0(X) - \tau + \frac{A(Y - \mu_1(X))}{\pi(X)} - \frac{(1-A)(Y - \mu_0(X))}{1 - \pi(X)}$$

Properties:

  • $E[\varphi(Z; \tau_0, \eta_0)] = 0$
  • $E[\varphi(Z; \tau_0, \eta_0)^2] =$ semiparametric variance bound
  • Neyman orthogonal: $\partial_\eta E[\varphi]\,|_{\eta_0} = 0$

Bias Analysis

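Treat the nuisance estimates $\hat{\mu}_a$ and $\hat{\pi}$ as fixed (e.g., fitted on a separate sample), and let $\mu_1$, $\mu_0$ be the true arm-specific outcome regressions and $\pi_0$ the true propensity score. Conditioning on $X$ gives $E[A(Y - \hat{\mu}_1) \mid X] = \pi_0(\mu_1 - \hat{\mu}_1)$ and $E[(1-A)(Y - \hat{\mu}_0) \mid X] = (1-\pi_0)(\mu_0 - \hat{\mu}_0)$, so

$$E[\hat{\tau}_{DR}] - \tau_0 = E\left[\frac{(\hat{\pi} - \pi_0)(\hat{\mu}_1 - \mu_1)}{\hat{\pi}} + \frac{(\hat{\pi} - \pi_0)(\hat{\mu}_0 - \mu_0)}{1-\hat{\pi}}\right]$$

→ a product-of-errors form: each term vanishes if either $\hat{\pi} = \pi_0$ or $\hat{\mu}_a = \mu_a$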

Comparison: OR vs IPW vs DR

| Aspect | Outcome Regression | IPW | Doubly Robust |
| --- | --- | --- | --- |
| Model needed | $\mu_a(x)$ | $\pi(x)$ | Both |
| Consistency | If $\hat{\mu}$ correct | If $\hat{\pi}$ correct | If either correct |
| Efficiency | Not efficient | Not efficient | Semiparametrically efficient |
| Variance | Low if $\hat{\mu}$ good | High with extreme $\hat{\pi}$ | Best of both |
| With ML | Regularization bias | Variance issues | Robust to both |

Extensions

CATE Estimation

DR-Learner: regress the DR pseudo-outcome on $X$:

$$\hat{\tau}(x) = E_n[\hat{\varphi}(Z) \mid X = x]$$
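A minimal sketch of the second-stage regression, under loud assumptions: the data-generating process is invented, the true nuisance functions are plugged in as an oracle (in practice they would be estimated, typically on a separate sample split), and the second stage is a simple least-squares line. The true CATE in this toy setup is $\tau(x) = 1 + x$.

```python
import random


def pi(x):
    return 0.5 + 0.3 * x  # propensity, stays within [0.2, 0.8] for x in [-1, 1]


def mu(a, x):
    return x + a * (1.0 + x)  # outcome regression; CATE is 1 + x


def simulate(n, seed=1):
    """Draw (x, a, y) from the toy model above."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        a = 1 if rng.random() < pi(x) else 0
        y = mu(a, x) + rng.gauss(0.0, 1.0)
        rows.append((x, a, y))
    return rows


def pseudo_outcome(x, a, y):
    """DR pseudo-outcome using the (oracle) nuisance functions."""
    return (mu(1, x) - mu(0, x)
            + a * (y - mu(1, x)) / pi(x)
            - (1 - a) * (y - mu(0, x)) / (1 - pi(x)))


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


rows = simulate(20000)
xs = [x for x, _, _ in rows]
phis = [pseudo_outcome(x, a, y) for x, a, y in rows]
intercept, slope = fit_line(xs, phis)


def cate_hat(x):
    """Estimated CATE from the second-stage regression."""
    return intercept + slope * x
```

Any regression method could replace `fit_line` in the second stage. The line is used here only because the toy CATE is linear.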

ATT Estimation

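$$\hat{\tau}_{ATT} = \frac{1}{n_1}\sum_{i=1}^{n}\left[A_i\,(Y_i - \hat{\mu}_0(X_i)) - \frac{(1-A_i)\,\hat{\pi}(X_i)}{1-\hat{\pi}(X_i)}(Y_i - \hat{\mu}_0(X_i))\right]$$

where $n_1 = \sum_i A_i$. The sum runs over all units: treated units contribute their residuals from the control-outcome model, and control units contribute odds-weighted residuals.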
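A sketch of a doubly robust ATT estimator in its standard form, which sums over all units: treated units contribute residuals $Y - \hat{\mu}_0(X)$ and control units contribute residuals weighted by the odds $\hat{\pi}/(1-\hat{\pi})$. The data-generating process, function names, and the deliberately wrong nuisance functions are assumptions made for illustration. In this toy setup the true ATT is 1.2 and the ATE is 1.

```python
import random


def pi_true(x):
    return 0.5 + 0.3 * x  # stays within [0.2, 0.8] for x in [-1, 1]


def mu0_true(x):
    return x  # control-arm outcome regression


def simulate(n, seed=2):
    """Draw (x, a, y); the effect 1 + x is larger where treatment is likelier."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        a = 1 if rng.random() < pi_true(x) else 0
        y = mu0_true(x) + a * (1.0 + x) + rng.gauss(0.0, 1.0)
        rows.append((x, a, y))
    return rows


def dr_att(rows, mu0_hat, pi_hat):
    """Doubly robust ATT: only the control outcome model and propensity are needed."""
    n_treated = sum(a for _, a, _ in rows)
    total = 0.0
    for x, a, y in rows:
        odds = pi_hat(x) / (1.0 - pi_hat(x))
        total += (a - (1 - a) * odds) * (y - mu0_hat(x))
    return total / n_treated


rows = simulate(20000)
att_both_right = dr_att(rows, mu0_true, pi_true)
att_wrong_mu0 = dr_att(rows, lambda x: 0.0, pi_true)   # wrong outcome model
att_wrong_pi = dr_att(rows, mu0_true, lambda x: 0.5)   # wrong propensity model
```

All three estimates should land near 1.2 in this setup, since each call has at least one correct nuisance function.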

Longitudinal Settings

Time-varying treatments, combined with g-computation

Related Concepts

  • Pseudo-outcome - the core component of the DR estimator
  • DR-Learner - DR extension for CATE
  • Influence Function - the theoretical foundation of DR
  • Neyman-Orthogonal Score - the orthogonality property
  • Propensity Score - treatment assignment probability
  • Double-Debiased ML - related framework

Historical Context

  • Robins, Rotnitzky, Zhao (1994): first proposal of a doubly robust estimator
  • Scharfstein, Rotnitzky, Robins (1999): connection to semiparametric theory
  • Bang & Robins (2005): popularized the term “Doubly Robust Estimation”
  • Chernozhukov et al. (2018): combination with ML (DML)

Implementation

Python (econml):

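from econml.dr import LinearDRLearner

# Y: outcome, T: discrete treatment, X: effect modifiers, W: additional confounders
dr = LinearDRLearner()  # default nuisance models; see model_propensity / model_regression
dr.fit(Y, T, X=X, W=W)
ate = dr.ate(X)         # average of the estimated CATE over the rows of X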

R (AIPW package):

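library(AIPW)
library(SuperLearner)  # provides the SL.* learner wrappers
# Y: outcome, A: binary exposure, W: covariates used by both nuisance models
AIPW_SL <- AIPW$new(Y = Y, A = A, W = W,
                    Q.SL.library = c("SL.glm", "SL.ranger"),  # outcome model learners
                    g.SL.library = c("SL.glm", "SL.ranger"))  # propensity model learners
AIPW_SL$fit()      # fit the nuisance models
AIPW_SL$summary()  # print effect estimates with standard errors and confidence intervals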

References

  • Robins, Rotnitzky, Zhao (1994) - Original DR estimator
  • Bang & Robins (2005) - “Doubly Robust Estimation in Missing Data and Causal Inference Models”
  • Chernozhukov et al. (2018) - DML framework
  • Kennedy (2023) - Optimal DR for CATE
