Doubly Robust Estimator

Definition

The Doubly Robust (DR) estimator, also known as augmented inverse probability weighting (AIPW), combines an outcome-regression model and a propensity-score model. It remains consistent as long as at least one of the two is correctly specified.

DR Estimator for the ATE: $\hat{\tau}_{DR} = \frac{1}{n}\sum_{i=1}^n \hat{\varphi}(Z_i)$

where $Z = (X, A, Y)$ collects the covariates, the binary treatment, and the outcome, and the pseudo-outcome (the uncentered efficient influence function, evaluated at the estimated nuisances) is: $\hat{\varphi}(Z) = \underbrace{\hat{\mu}_1(X) - \hat{\mu}_0(X)}_{\text{outcome regression}} + \underbrace{\frac{A(Y - \hat{\mu}_1(X))}{\hat{\pi}(X)} - \frac{(1-A)(Y - \hat{\mu}_0(X))}{1-\hat{\pi}(X)}}_{\text{IPW augmentation}}$

or, in an equivalent form: $\hat{\varphi}(Z) = \hat{\mu}_1(X) - \hat{\mu}_0(X) + \frac{A - \hat{\pi}(X)}{\hat{\pi}(X)(1-\hat{\pi}(X))}(Y - A\hat{\mu}_1(X) - (1-A)\hat{\mu}_0(X))$
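A minimal NumPy sketch of the two forms, assuming the nuisance predictions are already available as arrays. The function names `dr_pseudo_outcome` and `dr_pseudo_outcome_alt` and the toy numbers are illustrative and not taken from any library.

```python
import numpy as np

def dr_pseudo_outcome(y, a, mu1_hat, mu0_hat, pi_hat):
    """AIPW pseudo-outcome: outcome-regression contrast plus IPW-weighted residuals."""
    y, a = np.asarray(y, dtype=float), np.asarray(a, dtype=float)
    regression = mu1_hat - mu0_hat
    augmentation = a * (y - mu1_hat) / pi_hat - (1 - a) * (y - mu0_hat) / (1 - pi_hat)
    return regression + augmentation

def dr_pseudo_outcome_alt(y, a, mu1_hat, mu0_hat, pi_hat):
    """Same quantity written with the single residual Y - mu_hat_A(X)."""
    y, a = np.asarray(y, dtype=float), np.asarray(a, dtype=float)
    residual = y - a * mu1_hat - (1 - a) * mu0_hat
    weight = (a - pi_hat) / (pi_hat * (1 - pi_hat))
    return mu1_hat - mu0_hat + weight * residual

# Toy inputs (made up for illustration only).
y = np.array([1.0, 0.0, 2.0, 1.5])
a = np.array([1, 0, 1, 0])
mu1_hat = np.array([1.2, 0.8, 1.9, 1.6])
mu0_hat = np.array([0.7, 0.1, 1.0, 1.2])
pi_hat = np.array([0.6, 0.3, 0.8, 0.5])

phi = dr_pseudo_outcome(y, a, mu1_hat, mu0_hat, pi_hat)
phi_alt = dr_pseudo_outcome_alt(y, a, mu1_hat, mu0_hat, pi_hat)
tau_dr = phi.mean()  # the DR estimate of the ATE is the sample mean of phi
```

The two functions agree because for $A = 1$ the weight reduces to $1/\hat{\pi}$ and for $A = 0$ to $-1/(1-\hat{\pi})$.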

Intuitive Understanding

Comparing three estimation strategies:

Outcome Regression (OR): $\hat{\tau}_{OR} = \frac{1}{n}\sum_i [\hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)]$
Inverse Propensity Weighting (IPW): $\hat{\tau}_{IPW} = \frac{1}{n}\sum_i \left[\frac{A_i Y_i}{\hat{\pi}(X_i)} - \frac{(1-A_i)Y_i}{1-\hat{\pi}(X_i)}\right]$
Doubly Robust: the OR estimate plus an inverse-propensity-weighted correction built from the residuals $Y_i - \hat{\mu}_{A_i}(X_i)$

OR only:        biased if μ̂ is wrong
IPW only:       biased if π̂ is wrong
DR:             consistent if either μ̂ OR π̂ is correct!

Why “Doubly Robust”?

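$\hat{\mu} = \mu_0$ (outcome model correct; here the subscript 0 marks the true nuisance function, not the control arm): the residuals $Y - \hat{\mu}_A(X)$ have conditional mean zero, so the expectation of the augmentation term is 0 whatever $\hat{\pi}$ is
$\hat{\pi} = \pi_0$ (propensity model correct): the weighted residuals have conditional mean $\mu_a(X) - \hat{\mu}_a(X)$, so the augmentation term cancels the bias of the outcome regression
Even if both are wrong, the bias is of the order of the product of the two errors: $O(||\hat{\mu} - \mu_0|| \cdot ||\hat{\pi} - \pi_0||)$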
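The three cases above can be illustrated with a small simulation. This is only a sketch: the data-generating process, the "wrong" nuisance functions, and the helper names are all made up for illustration, and known (oracle or deliberately wrong) nuisance functions are plugged in rather than fitted models.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def expit(v):
    return 1.0 / (1.0 + np.exp(-v))

# Hypothetical data-generating process; true ATE = E[2 + X] = 2 since E[X] = 0.
x = rng.uniform(-2.0, 2.0, size=n)
pi_true = expit(x)
mu0_true = x**2
mu1_true = x**2 + 2.0 + x
a = rng.binomial(1, pi_true)
y = np.where(a == 1, mu1_true, mu0_true) + rng.normal(size=n)
tau_true = 2.0

# Deliberately wrong nuisance functions.
pi_bad = np.full(n, 0.5)
mu0_bad = 0.5 * x
mu1_bad = np.ones(n)

def or_estimate(mu1, mu0):
    return np.mean(mu1 - mu0)

def ipw_estimate(pi):
    return np.mean(a * y / pi - (1 - a) * y / (1 - pi))

def dr_estimate(mu1, mu0, pi):
    phi = mu1 - mu0 + a * (y - mu1) / pi - (1 - a) * (y - mu0) / (1 - pi)
    return np.mean(phi)

results = {
    "OR, mu wrong": or_estimate(mu1_bad, mu0_bad),
    "IPW, pi wrong": ipw_estimate(pi_bad),
    "DR, both right": dr_estimate(mu1_true, mu0_true, pi_true),
    "DR, mu wrong": dr_estimate(mu1_bad, mu0_bad, pi_true),
    "DR, pi wrong": dr_estimate(mu1_true, mu0_true, pi_bad),
    "DR, both wrong": dr_estimate(mu1_bad, mu0_bad, pi_bad),
}
for name, est in results.items():
    print(f"{name:15s} estimate = {est:.3f}  (truth = {tau_true})")
```

In this setup the DR estimate should stay near 2 whenever at least one nuisance is correct, while OR and IPW with a wrong model drift away from it. The "both wrong" row is not protected by the double robustness property.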

Key Properties

Double Robustness Property

Theorem: If either of the following two conditions holds, $\hat{\tau}_{DR}$ is consistent:

The outcome model is correctly specified: $\hat{\mu}_a(x) \xrightarrow{p} E[Y|X=x, A=a]$
The propensity model is correctly specified: $\hat{\pi}(x) \xrightarrow{p} P(A=1|X=x)$

Semiparametric Efficiency

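When both nuisance functions are estimated consistently, the DR estimator is semiparametrically efficient:

Constructed based on the efficient influence function
Attains the semiparametric efficiency bound
Has the lowest asymptotic variance among regular estimators that do not restrict the outcome or propensity model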

$\text{Var}(\hat{\tau}_{DR}) = \frac{1}{n}E[\varphi(Z; \tau_0, \eta_0)^2] + o(n^{-1})$
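In practice the variance formula suggests a plug-in standard error: the sample variance of the pseudo-outcomes divided by $n$. A minimal sketch, assuming the pseudo-outcomes $\hat{\varphi}(Z_i)$ have already been computed; the function name `dr_ate_with_ci` and the input values are hypothetical.

```python
import numpy as np

def dr_ate_with_ci(phi, z=1.96):
    """Point estimate, standard error, and Wald interval from pseudo-outcomes phi_i."""
    phi = np.asarray(phi, dtype=float)
    n = phi.size
    tau_hat = phi.mean()
    # The sample variance of phi estimates E[(phi - tau)^2]; dividing by n gives Var(tau_hat).
    se = phi.std(ddof=1) / np.sqrt(n)
    return tau_hat, se, (tau_hat - z * se, tau_hat + z * se)

phi = np.array([1.0, 3.0, 2.0, 0.0, 4.0])  # made-up pseudo-outcomes
tau_hat, se, (lo, hi) = dr_ate_with_ci(phi)
```

The default `z=1.96` gives an approximate 95% interval, which relies on the asymptotic normality of $\hat{\tau}_{DR}$.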

Rate Doubly Robust

$\sqrt{n}$-consistent under a product-rate condition: $||\hat{\mu} - \mu_0|| \cdot ||\hat{\pi} - \pi_0|| = o_P(n^{-1/2})$

e.g., it is sufficient that each nuisance estimator converges faster than $n^{-1/4}$
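A short worked calculation of the product-rate condition, assuming polynomial rates with exponents $a$ and $b$ (symbols introduced here only for illustration):

```latex
\|\hat{\mu} - \mu_0\| = O_P(n^{-a}), \qquad \|\hat{\pi} - \pi_0\| = O_P(n^{-b})
\quad\Longrightarrow\quad
\|\hat{\mu} - \mu_0\| \cdot \|\hat{\pi} - \pi_0\| = O_P\!\left(n^{-(a+b)}\right) = o_P(n^{-1/2})
\quad \text{whenever } a + b > \tfrac{1}{2}.

% Examples:
%   a = b = 0.3          ->  product rate n^{-0.6}  (both slower than parametric, still enough)
%   a = 0.5, b = 0.1     ->  product rate n^{-0.6}  (a fast outcome model compensates a slow propensity model)
%   a = b = 0.25 exactly ->  product rate n^{-0.5}  (boundary case: O_P rather than o_P, not enough on its own)
```

Only the sum of the exponents matters, so a slow estimator for one nuisance can be offset by a fast estimator for the other.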

Mathematical Derivation

Efficient Influence Function

The efficient influence function for the ATE $\tau = E[Y(1) - Y(0)]$, with nuisance functions $\eta = (\mu_0, \mu_1, \pi)$: $\varphi(Z; \tau, \eta) = \mu_1(X) - \mu_0(X) - \tau + \frac{A(Y - \mu_1(X))}{\pi(X)} - \frac{(1-A)(Y - \mu_0(X))}{1 - \pi(X)}$

Properties:

$E[\varphi(Z; \tau_0, \eta_0)] = 0$
$E[\varphi(Z; \tau_0, \eta_0)^2] =$ semiparametric variance bound
Neyman orthogonal: $\partial_\eta E[\varphi]|_{\eta_0} = 0$
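Neyman orthogonality can be checked numerically on a toy population where expectations are exact sums. The sketch below perturbs the nuisances along arbitrary directions and takes a central finite difference of the mean score. It also does the same for the plain IPW score, which is not orthogonal with respect to $\pi$. All numbers and function names are made up for illustration.

```python
# Discrete toy population (all numbers are made up for illustration).
p_x = [0.2, 0.5, 0.3]   # P(X = x) for x in {0, 1, 2}
pi0 = [0.3, 0.5, 0.7]   # true propensity P(A=1 | X=x)
mu1 = [2.0, 3.0, 5.0]   # true E[Y | X=x, A=1]
mu0 = [1.0, 1.5, 2.0]   # true E[Y | X=x, A=0]
tau0 = sum(p * (m1 - m0) for p, m1, m0 in zip(p_x, mu1, mu0))

# Arbitrary perturbation directions for the nuisance functions.
h_mu1, h_mu0, h_pi = [1.0, -1.0, 0.5], [-0.5, 1.0, 1.0], [0.1, -0.2, 0.1]

def mean_aipw_score(t):
    """Exact E[phi(Z; tau0, eta0 + t*h)], computed by summing over x."""
    total = 0.0
    for k in range(3):
        m1 = mu1[k] + t * h_mu1[k]
        m0 = mu0[k] + t * h_mu0[k]
        pi = pi0[k] + t * h_pi[k]
        # E[A (Y - m1) | X] = pi0 * (mu1 - m1), and similarly for the control arm.
        cond_mean = (m1 - m0
                     + pi0[k] * (mu1[k] - m1) / pi
                     - (1 - pi0[k]) * (mu0[k] - m0) / (1 - pi))
        total += p_x[k] * cond_mean
    return total - tau0

def mean_ipw_score(t):
    """Same calculation for the plain IPW score, perturbing only pi."""
    total = 0.0
    for k in range(3):
        pi = pi0[k] + t * h_pi[k]
        total += p_x[k] * (pi0[k] * mu1[k] / pi - (1 - pi0[k]) * mu0[k] / (1 - pi))
    return total - tau0

t = 1e-3
aipw_derivative = (mean_aipw_score(t) - mean_aipw_score(-t)) / (2 * t)
ipw_derivative = (mean_ipw_score(t) - mean_ipw_score(-t)) / (2 * t)
print(aipw_derivative, ipw_derivative)
```

The AIPW derivative should be numerically indistinguishable from zero, while the IPW derivative is not. This is why first-order errors in the nuisances do not propagate to the DR estimate.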

Bias Analysis

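Treating the nuisance estimates as fixed, with $\mu_1, \mu_0$ the true arm-specific regressions and $\pi_0$ the true propensity:

$E[\hat{\tau}_{DR}] - \tau_0 = E\left[\frac{(\hat{\pi} - \pi_0)(\hat{\mu}_1 - \mu_1)}{\hat{\pi}} + \frac{(\hat{\pi} - \pi_0)(\hat{\mu}_0 - \mu_0)}{1-\hat{\pi}}\right]$

→ a product-of-errors form: each term multiplies the propensity error by an outcome-model error, so the bias vanishes when either one is zero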
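The product structure can be verified exactly on a small discrete population. The sketch below computes $E[\hat{\varphi}] - \tau_0$ directly and compares it with the product-of-errors expression $E\left[(\hat{\pi} - \pi_0)\left\{\frac{\hat{\mu}_1 - \mu_1}{\hat{\pi}} + \frac{\hat{\mu}_0 - \mu_0}{1-\hat{\pi}}\right\}\right]$. All numbers and function names are made up for illustration.

```python
p_x = [0.2, 0.5, 0.3]                                                    # P(X = x)
pi0, mu1, mu0 = [0.3, 0.5, 0.7], [2.0, 3.0, 5.0], [1.0, 1.5, 2.0]        # truth
pi_h, mu1_h, mu0_h = [0.4, 0.4, 0.6], [2.5, 2.0, 5.5], [0.5, 2.0, 2.5]   # wrong estimates

def dr_bias(pi_hat, m1_hat, m0_hat):
    """Exact E[phi_hat] - tau0, using E[A (Y - m1_hat) | X] = pi0 * (mu1 - m1_hat)."""
    bias = 0.0
    for k in range(3):
        cond = (m1_hat[k] - m0_hat[k]
                + pi0[k] * (mu1[k] - m1_hat[k]) / pi_hat[k]
                - (1 - pi0[k]) * (mu0[k] - m0_hat[k]) / (1 - pi_hat[k]))
        bias += p_x[k] * (cond - (mu1[k] - mu0[k]))
    return bias

def product_form(pi_hat, m1_hat, m0_hat):
    """Propensity error times outcome-model error, summed over both arms."""
    return sum(
        p * (ph - pt) * ((m1h - m1t) / ph + (m0h - m0t) / (1 - ph))
        for p, ph, pt, m1h, m1t, m0h, m0t
        in zip(p_x, pi_hat, pi0, m1_hat, mu1, m0_hat, mu0)
    )

both_wrong = dr_bias(pi_h, mu1_h, mu0_h)   # nonzero: both nuisances are off
mu_right = dr_bias(pi_h, mu1, mu0)         # zero: outcome model correct
pi_right = dr_bias(pi0, mu1_h, mu0_h)      # zero: propensity model correct
print(both_wrong, product_form(pi_h, mu1_h, mu0_h), mu_right, pi_right)
```

The first two printed values coincide, and the last two are zero up to floating-point error.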

Comparison: OR vs IPW vs DR

Aspect	Outcome Regression	IPW	Doubly Robust
Model needed	$\mu_a(x)$	$\pi(x)$	Both
Consistency	If $\hat{\mu}$ correct	If $\hat{\pi}$ correct	If either correct
Efficiency	Not efficient	Not efficient	Semiparametrically efficient
Variance	Low if $\hat{\mu}$ good	High with extreme $\hat{\pi}$	Attains the efficiency bound if both are good; still inflated by extreme $\hat{\pi}$
With ML	Regularization bias	Variance issues	Only second-order (product) bias

Related Concepts

Pseudo-outcome - the core component of the DR estimator
DR-Learner - DR extension for CATE
Influence Function - the theoretical foundation of DR
Neyman-Orthogonal Score - the orthogonality property
Propensity Score - treatment assignment probability
Double-Debiased ML - related framework

Historical Context

Robins, Rotnitzky, Zhao (1994): first proposal of a doubly robust estimator
Scharfstein, Rotnitzky, Robins (1999): connection to semiparametric theory
Bang & Robins (2005): popularized the term “Doubly Robust Estimation”
Chernozhukov et al. (2018): combination with ML (DML)

Implementation

Python (econml):

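from econml.dr import LinearDRLearner

# Y: outcome, T: discrete treatment, X: effect modifiers, W: additional controls
dr = LinearDRLearner()  # default nuisance models
dr.fit(Y, T, X=X, W=W)
ate = dr.ate(X)  # average treatment effect over the rows of X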
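For readers who want to see the mechanics without a library, here is a from-scratch sketch of a cross-fitted AIPW estimate using only NumPy. It is not how econml is implemented. The helper names, the linear/logistic nuisance models, the propensity clipping threshold, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept; returns a prediction function."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

def fit_logistic(X, a, n_iter=25):
    """Logistic regression by Newton-Raphson; returns a probability function."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (a - p)
        hess = (Xd * (p * (1 - p))[:, None]).T @ Xd
        beta = beta + np.linalg.solve(hess, grad)
    return lambda Xn: 1.0 / (1.0 + np.exp(-np.column_stack([np.ones(len(Xn)), Xn]) @ beta))

def cross_fit_aipw(X, a, y, n_folds=2, clip=0.01, seed=0):
    """Cross-fitted AIPW estimate of the ATE and its standard error."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    phi = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Nuisances are fit on the other folds and evaluated on the held-out fold.
        mu1 = fit_linear(X[train & (a == 1)], y[train & (a == 1)])(X[test])
        mu0 = fit_linear(X[train & (a == 0)], y[train & (a == 0)])(X[test])
        pi = np.clip(fit_logistic(X[train], a[train])(X[test]), clip, 1 - clip)
        phi[test] = (mu1 - mu0
                     + a[test] * (y[test] - mu1) / pi
                     - (1 - a[test]) * (y[test] - mu0) / (1 - pi))
    return phi.mean(), phi.std(ddof=1) / np.sqrt(n)

# Simulated data (hypothetical DGP with true ATE = 1.5).
rng = np.random.default_rng(1)
n = 20_000
X = rng.normal(size=(n, 2))
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.25 * X[:, 1]))))
y = 1.5 * a + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

tau_hat, se = cross_fit_aipw(X, a, y)
print(f"ATE estimate: {tau_hat:.3f} +/- {1.96 * se:.3f}")
```

Fitting the nuisances on one part of the sample and evaluating the pseudo-outcome on the other is the sample-splitting idea used in the DML framework. Swapping `fit_linear` and `fit_logistic` for flexible learners would follow the same pattern.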

R (AIPW package):

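library(AIPW)
library(SuperLearner)  # provides the SL.* learners named below

# Y: outcome, A: binary exposure, W: covariates
AIPW_SL <- AIPW$new(Y = Y, A = A, W = W,
                    Q.SL.library = c("SL.glm", "SL.ranger"),  # outcome model learners
                    g.SL.library = c("SL.glm", "SL.ranger"))  # propensity model learners
AIPW_SL$fit()      # fit the nuisance models
AIPW_SL$summary()  # AIPW estimates with standard errors and confidence intervals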

References

Robins, Rotnitzky, Zhao (1994) - Original DR estimator
Bang & Robins (2005) - “Doubly Robust Estimation in Missing Data and Causal Inference Models”
Chernozhukov et al. (2018) - DML framework
Kennedy (2023) - Optimal DR for CATE