Tae Hyun Kim (Lowell)

Doubly Robust Estimator

Definition

The doubly robust (DR) estimator combines an outcome regression model with a propensity score model and remains consistent as long as at least one of the two is correctly specified.

DR estimator for the ATE, with $Z_i = (X_i, A_i, Y_i)$:

$$\hat{\tau}_{DR} = \frac{1}{n}\sum_{i=1}^n \hat{\varphi}(Z_i)$$

Here the pseudo-outcome (the uncentered efficient influence function) is:

$$\hat{\varphi}(Z) = \underbrace{\hat{\mu}_1(X) - \hat{\mu}_0(X)}_{\text{outcome regression}} + \underbrace{\frac{A(Y - \hat{\mu}_1(X))}{\hat{\pi}(X)} - \frac{(1-A)(Y - \hat{\mu}_0(X))}{1-\hat{\pi}(X)}}_{\text{IPW augmentation}}$$

Or, in an equivalent form:

$$\hat{\varphi}(Z) = \hat{\mu}_1(X) - \hat{\mu}_0(X) + \frac{A - \hat{\pi}(X)}{\hat{\pi}(X)(1-\hat{\pi}(X))}\left(Y - A\hat{\mu}_1(X) - (1-A)\hat{\mu}_0(X)\right)$$
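The pseudo-outcome formulas above translate almost line by line into code. The following is a minimal sketch, assuming the nuisance predictions (`mu1`, `mu0`, `pi`) have already been computed for each unit; the function names and the toy numbers are illustrative only and do not come from any library.

```python
from statistics import fmean


def dr_pseudo_outcome(y, a, mu1, mu0, pi):
    """DR pseudo-outcome for one unit: OR difference plus IPW augmentation."""
    outcome_regression = mu1 - mu0
    augmentation = a * (y - mu1) / pi - (1 - a) * (y - mu0) / (1 - pi)
    return outcome_regression + augmentation


def dr_pseudo_outcome_alt(y, a, mu1, mu0, pi):
    """Equivalent form: one residual weighted by (a - pi) / (pi * (1 - pi))."""
    residual = y - a * mu1 - (1 - a) * mu0
    return mu1 - mu0 + (a - pi) / (pi * (1 - pi)) * residual


def dr_ate(ys, treatments, mu1s, mu0s, pis):
    """DR estimate of the ATE: the sample mean of the pseudo-outcomes."""
    return fmean(
        dr_pseudo_outcome(y, a, m1, m0, p)
        for y, a, m1, m0, p in zip(ys, treatments, mu1s, mu0s, pis)
    )


# Toy inputs, made up for illustration: outcomes, treatments, fitted nuisances.
ys = [3.0, 1.0, 2.0, 0.5]
treatments = [1, 0, 1, 0]
mu1s = [2.5, 2.0, 2.2, 1.5]
mu0s = [1.0, 1.5, 0.8, 0.4]
pis = [0.5, 0.5, 0.6, 0.3]

tau_hat = dr_ate(ys, treatments, mu1s, mu0s, pis)
```

Both functions return the same value for every unit, which is a quick way to check the algebra behind the equivalent form.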

Intuitive Understanding

Three estimation strategies, the third combining the first two:

  1. Outcome Regression (OR): $\hat{\tau}_{OR} = \frac{1}{n}\sum_i [\hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)]$
  2. Inverse Propensity Weighting (IPW): $\hat{\tau}_{IPW} = \frac{1}{n}\sum_i \left[\frac{A_i Y_i}{\hat{\pi}(X_i)} - \frac{(1-A_i)Y_i}{1-\hat{\pi}(X_i)}\right]$
  3. Doubly Robust: OR + IPW correction term

OR only:   biased if μ̂ is wrong
IPW only:  biased if π̂ is wrong
DR:        consistent if at least one of μ̂ or π̂ is correct

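Why “Doubly Robust”?

Here the subscript 0 on $\mu_0$ and $\pi_0$ marks the true function, not the control arm.

  • $\hat{\mu} = \mu_0$ (outcome model correct): the augmentation term has expectation zero
  • $\hat{\pi} = \pi_0$ (propensity model correct): the weighting is exact, so the augmentation term cancels the bias of the outcome model
  • Even when both are estimated with error, the bias is of the order of the product of the two errors: $O(\|\hat{\mu} - \mu_0\| \cdot \|\hat{\pi} - \pi_0\|)$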
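A small simulation can make the double robustness argument concrete. The sketch below is an illustration under made-up assumptions: a single confounder, a logistic propensity, a linear outcome with a true ATE of 2, and nuisance functions that are plugged in as known (right or deliberately wrong) rather than fitted. All names are hypothetical.

```python
import math
import random
from statistics import fmean

TRUE_ATE = 2.0


def true_pi(x):
    """True propensity score P(A=1 | X=x)."""
    return 1.0 / (1.0 + math.exp(-x))


def true_mu(a, x):
    """True outcome regression E[Y | X=x, A=a]."""
    return x + TRUE_ATE * a


def wrong_mu(a, x):
    """Misspecified outcome model: ignores covariate and treatment."""
    return 0.0


def wrong_pi(x):
    """Misspecified propensity model: ignores confounding."""
    return 0.5


def simulate(n, seed=0):
    """Draw (x, a, y) triples from the assumed data-generating process."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        a = 1 if rng.random() < true_pi(x) else 0
        y = true_mu(a, x) + rng.gauss(0.0, 1.0)
        rows.append((x, a, y))
    return rows


def dr_estimate(rows, mu, pi):
    """Mean of the DR pseudo-outcomes for the given nuisance functions."""
    def phi(x, a, y):
        m1, m0, p = mu(1, x), mu(0, x), pi(x)
        return m1 - m0 + a * (y - m1) / p - (1 - a) * (y - m0) / (1 - p)
    return fmean(phi(x, a, y) for x, a, y in rows)


rows = simulate(20_000)
estimates = {
    "mu right, pi wrong": dr_estimate(rows, true_mu, wrong_pi),
    "mu wrong, pi right": dr_estimate(rows, wrong_mu, true_pi),
    "both wrong": dr_estimate(rows, wrong_mu, wrong_pi),
}
```

In this setup the first two estimates should land close to the true ATE, while the third should be visibly biased, because neither error is zero and the product of errors no longer vanishes.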

Key Properties

Double Robustness Property

Theorem: $\hat{\tau}_{DR}$ is consistent if at least one of the following two conditions holds:

  1. The outcome model is correctly specified: $\hat{\mu}_a(x) \xrightarrow{p} E[Y \mid X=x, A=a]$
  2. The propensity model is correctly specified: $\hat{\pi}(x) \xrightarrow{p} P(A=1 \mid X=x)$

Semiparametric Efficiency

When both nuisance models are correctly specified, the DR estimator is semiparametrically efficient:

  • It is built from the efficient influence function
  • It attains the semiparametric efficiency bound
  • It has the lowest asymptotic variance among regular estimators

$$\text{Var}(\hat{\tau}_{DR}) = \frac{1}{n}E[\varphi(Z; \tau_0, \eta_0)^2] + o(n^{-1})$$
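The variance expression above suggests a plug-in standard error: the sample variance of the estimated pseudo-outcomes divided by $n$. The following is a minimal sketch of a Wald-type confidence interval built that way, assuming the pseudo-outcomes have already been computed; the function name is hypothetical and the input values are toy numbers.

```python
import math
from statistics import NormalDist, fmean, stdev


def dr_confidence_interval(pseudo_outcomes, level=0.95):
    """Point estimate, plug-in standard error, and normal-based interval."""
    n = len(pseudo_outcomes)
    estimate = fmean(pseudo_outcomes)
    # Sample standard deviation of the pseudo-outcomes estimates sqrt(E[phi^2]),
    # since centering at the estimate plays the role of subtracting tau.
    std_error = stdev(pseudo_outcomes) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate, std_error, (estimate - z * std_error, estimate + z * std_error)


estimate, std_error, (lower, upper) = dr_confidence_interval([1.0, 2.0, 3.0, 4.0, 5.0])
```

The interval is only asymptotically valid, and only when the nuisance estimates are good enough for the remainder term to be negligible.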

Rate Doubly Robust

Under the product rate condition, the estimator is $\sqrt{n}$-consistent:

$$\|\hat{\mu} - \mu_0\| \cdot \|\hat{\pi} - \pi_0\| = o_P(n^{-1/2})$$

Example: it suffices for each nuisance estimator to converge faster than $n^{-1/4}$.
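As a small worked example of the arithmetic behind the product rate condition, suppose (purely as an illustrative assumption) that the two nuisance errors shrink polynomially with exponents $a$ and $b$:

```latex
\|\hat{\mu} - \mu_0\| = O_P(n^{-a}), \quad
\|\hat{\pi} - \pi_0\| = O_P(n^{-b})
\;\Longrightarrow\;
\|\hat{\mu} - \mu_0\| \cdot \|\hat{\pi} - \pi_0\| = O_P(n^{-(a+b)})
```

The product is $o_P(n^{-1/2})$ whenever $a + b > 1/2$. For instance, $a = b = 0.3$ gives $n^{-0.6}$, and $a = 0.4$, $b = 0.15$ gives $n^{-0.55}$, so a slow propensity rate can be offset by a faster outcome regression rate. At exactly $a = b = 1/4$ the product is only $O_P(n^{-1/2})$, so in the symmetric case each rate needs to be slightly faster than $n^{-1/4}$.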

Mathematical Derivation

Efficient Influence Function

The efficient influence function of the ATE $\tau = E[Y(1) - Y(0)]$ is:

$$\varphi(Z; \tau, \eta) = \mu_1(X) - \mu_0(X) - \tau + \frac{A(Y - \mu_1(X))}{\pi(X)} - \frac{(1-A)(Y - \mu_0(X))}{1 - \pi(X)}$$

Properties:

  • $E[\varphi(Z; \tau_0, \eta_0)] = 0$
  • $E[\varphi(Z; \tau_0, \eta_0)^2]$ equals the semiparametric variance bound
  • Neyman orthogonal: $\partial_\eta E[\varphi]\big|_{\eta_0} = 0$

Bias Analysis

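Treating the nuisance estimates as fixed functions (for example, fit on an independent sample), and writing $\mu_a$ and $\pi_0$ for the true outcome regression and propensity score:

$$E[\hat{\tau}_{DR}] - \tau_0 = E\left[\frac{(\hat{\pi} - \pi_0)(\hat{\mu}_1 - \mu_1)}{\hat{\pi}} + \frac{(\hat{\pi} - \pi_0)(\hat{\mu}_0 - \mu_0)}{1-\hat{\pi}}\right]$$

Each term is a product of a propensity error and an outcome-model error, so the bias vanishes when either nuisance is correct.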
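As a sketch of where the product structure comes from, the treated-arm part of the pseudo-outcome can be averaged conditionally on $X$. The derivation below assumes $\hat{\mu}_1$ and $\hat{\pi}$ are fixed functions, writes $\mu_1$ and $\pi_0$ for the truth, and uses $E[AY \mid X] = \pi_0(X)\mu_1(X)$:

```latex
\begin{aligned}
E\left[\hat{\mu}_1(X) + \frac{A(Y - \hat{\mu}_1(X))}{\hat{\pi}(X)} \,\middle|\, X\right] - \mu_1(X)
&= \hat{\mu}_1 - \mu_1 + \frac{\pi_0(\mu_1 - \hat{\mu}_1)}{\hat{\pi}} \\
&= (\hat{\mu}_1 - \mu_1)\left(1 - \frac{\pi_0}{\hat{\pi}}\right) \\
&= \frac{(\hat{\pi} - \pi_0)(\hat{\mu}_1 - \mu_1)}{\hat{\pi}}
\end{aligned}
```

The control arm works the same way, with $1 - A$ and $1 - \pi$ in place of $A$ and $\pi$.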

Comparison: OR vs IPW vs DR

| Aspect | Outcome Regression | IPW | Doubly Robust |
|---|---|---|---|
| Model needed | $\mu_a(x)$ | $\pi(x)$ | Both |
| Consistency | If $\hat{\mu}$ correct | If $\hat{\pi}$ correct | If either correct |
| Efficiency | Not efficient | Not efficient | Semiparametrically efficient |
| Variance | Low if $\hat{\mu}$ good | High with extreme $\hat{\pi}$ | Best of both |
| With ML | Regularization bias | Variance issues | Robust to both |

Extensions

CATE Estimation

DR-Learner: regress the DR pseudo-outcome on $X$, where $\hat{E}_n$ denotes a regression fit on the sample:

$$\hat{\tau}(x) = \hat{E}_n[\hat{\varphi}(Z) \mid X = x]$$
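The second-stage regression of the DR-Learner can be sketched with any regressor. The example below is a minimal illustration that assumes a single scalar covariate and uses ordinary least squares as a stand-in for the final-stage learner; the function name and the toy data are hypothetical. In practice the pseudo-outcomes would come from nuisance models fit on a separate fold.

```python
from statistics import fmean


def fit_linear_cate(xs, pseudo_outcomes):
    """OLS of pseudo-outcomes on a scalar covariate; returns x -> tau_hat(x)."""
    x_bar, phi_bar = fmean(xs), fmean(pseudo_outcomes)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (p - phi_bar) for x, p in zip(xs, pseudo_outcomes))
    slope = sxy / sxx
    intercept = phi_bar - slope * x_bar
    return lambda x: intercept + slope * x


# Toy pseudo-outcomes that follow phi = 1 + 2x exactly (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
phis = [1.0, 3.0, 5.0, 7.0]
cate_hat = fit_linear_cate(xs, phis)
```

Calling `cate_hat(x)` then gives the estimated conditional effect at a new covariate value. A flexible learner would replace the linear fit when the effect is not expected to be linear in $X$.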

ATT Estimation

With $n_1 = \sum_i A_i$ the number of treated units, and the sum running over all units:

$$\hat{\tau}_{ATT} = \frac{1}{n_1}\sum_{i=1}^n\left[A_i\,(Y_i - \hat{\mu}_0(X_i)) - \frac{(1-A_i)\,\hat{\pi}(X_i)}{1-\hat{\pi}(X_i)}(Y_i - \hat{\mu}_0(X_i))\right]$$
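A minimal sketch of a doubly robust ATT estimate follows. It assumes the commonly used form in which the sum runs over all units: treated units contribute their residual $Y - \hat{\mu}_0(X)$, and control units contribute the same residual weighted by the propensity odds $\hat{\pi}/(1-\hat{\pi})$ with a negative sign. The function name and inputs are illustrative.

```python
def dr_att(ys, treatments, mu0s, pis):
    """DR estimate of the ATT from outcomes, treatments, and fitted nuisances."""
    n_treated = sum(treatments)
    total = 0.0
    for y, a, m0, p in zip(ys, treatments, mu0s, pis):
        residual = y - m0
        # Treated units enter directly; controls are reweighted by the odds.
        total += a * residual - (1 - a) * p / (1 - p) * residual
    return total / n_treated


# Toy example: one treated unit and one control unit.
att_hat = dr_att(ys=[3.0, 1.0], treatments=[1, 0], mu0s=[1.0, 0.5], pis=[0.5, 0.5])
```

Only the control-arm regression $\hat{\mu}_0$ is needed, because the treated outcomes are observed directly for the population of interest.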

Longitudinal Settings

Time-varying treatments; DR estimation is combined with g-computation.

Related Concepts

  • Pseudo-outcome - core component of the DR estimator
  • DR-Learner - DR extension for CATE
  • Influence Function - theoretical foundation of DR
  • Neyman-Orthogonal Score - orthogonality property
  • Propensity Score - treatment assignment probability
  • Double-Debiased ML - related framework

Historical Context

  • Robins, Rotnitzky, Zhao (1994): first proposed the doubly robust estimator
  • Scharfstein, Rotnitzky, Robins (1999): connection to semiparametric theory
  • Bang & Robins (2005): used the name “Doubly Robust Estimation” in their paper title
  • Chernozhukov et al. (2018): combination with ML (DML)

Implementation

Python (econml):

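from econml.dr import LinearDRLearner

# Y: outcome, T: discrete treatment, X: effect modifiers, W: additional controls
dr = LinearDRLearner()
dr.fit(Y, T, X=X, W=W)
ate = dr.ate(X)  # average treatment effect over the rows of X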

R (AIPW package):

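library(AIPW)
library(SuperLearner)  # provides the SL.* learners; SL.ranger also needs the ranger package

# Y: outcome, A: binary exposure, W: covariates
AIPW_SL <- AIPW$new(Y = Y, A = A, W = W,
                    Q.SL.library = c("SL.glm", "SL.ranger"),  # outcome model learners
                    g.SL.library = c("SL.glm", "SL.ranger"))  # propensity model learners
AIPW_SL$fit()
AIPW_SL$summary()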

References

  • Robins, Rotnitzky, Zhao (1994) - Original DR estimator
  • Bang & Robins (2005) - “Doubly Robust Estimation in Missing Data and Causal Inference Models”
  • Chernozhukov et al. (2018) - DML framework
  • Kennedy (2023) - Optimal DR for CATE
