DR-Learner · Tae Hyun Kim (Lowell)

Definition

The DR-Learner is a two-stage doubly robust estimator of the CATE: it first builds a pseudo-outcome from estimated nuisance functions, then regresses that pseudo-outcome on the covariates.

Stage 1: Nuisance 추정

Propensity score: $\hat{\pi}(x) = P(A = 1 | X = x)$
Outcome regression: $\hat{\mu}_a(x) = E[Y | X = x, A = a]$ for $a \in \{0, 1\}$

Stage 2: Pseudo-outcome regression $\hat{\tau}_{DR}(x) = \hat{E}_n[\hat{\varphi}(Z) | X = x]$

where the pseudo-outcome is: $\hat{\varphi}(Z) = \underbrace{\hat{\mu}_1(X) - \hat{\mu}_0(X)}_{\text{plug-in}} + \underbrace{\frac{A - \hat{\pi}(X)}{\hat{\pi}(X)(1-\hat{\pi}(X))}(Y - A\hat{\mu}_1(X) - (1-A)\hat{\mu}_0(X))}_{\text{augmentation/correction}}$
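
The pseudo-outcome formula above can be evaluated directly once the nuisance estimates are available. The following is a minimal NumPy sketch; the function name `dr_pseudo_outcome` and the two example units are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

def dr_pseudo_outcome(y, a, pi_hat, mu1_hat, mu0_hat):
    """DR pseudo-outcome: plug-in difference plus a weighted residual correction."""
    y, a = np.asarray(y, float), np.asarray(a, float)
    plug_in = mu1_hat - mu0_hat
    weight = (a - pi_hat) / (pi_hat * (1.0 - pi_hat))
    residual = y - a * mu1_hat - (1.0 - a) * mu0_hat
    return plug_in + weight * residual

# Two hypothetical units: the first treated, the second control
y = np.array([3.0, 1.0])
a = np.array([1, 0])
pi_hat = np.array([0.5, 0.25])
mu1_hat = np.array([2.0, 2.0])
mu0_hat = np.array([1.0, 1.5])

phi = dr_pseudo_outcome(y, a, pi_hat, mu1_hat, mu0_hat)
print(phi)
```

For a treated unit the weight reduces to $1/\hat{\pi}(X)$, and for a control unit to $-1/(1-\hat{\pi}(X))$, so the correction is an inverse-probability-weighted residual from the arm that was actually observed.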

Intuitive Understanding

Key ideas:

Compute a doubly robust pseudo-outcome (built from the efficient influence function of the ATE)
Smooth/regress this pseudo-outcome on $X$
Exploit the structure of the CATE (smoothness, sparsity) separately from that of the nuisance functions

Step 1:  Estimate π̂(x), μ̂₁(x), μ̂₀(x) using any ML method
              ↓
Step 2:  Compute pseudo-outcome φ̂(Z) for each observation
              ↓
Step 3:  Regress φ̂ on X to get τ̂(x)

Why "Doubly Robust"?

If either $\hat{\pi}$ or $\hat{\mu}$ is correct, the bias vanishes
Even if both are wrong, only the product of the two errors remains: $O(||\hat{\pi} - \pi_0|| \cdot ||\hat{\mu} - \mu_0||)$
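
The double robustness property can be checked numerically. The sketch below uses a hypothetical data-generating process (true ATE of 1) and averages the pseudo-outcome under four combinations of correct and deliberately wrong nuisance functions. It looks at the average rather than the full CATE only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process with a known ATE of 1
x = rng.uniform(-1.0, 1.0, n)
pi_true = 1.0 / (1.0 + np.exp(-x))
a = rng.binomial(1, pi_true)
mu0_true = x
mu1_true = 1.0 + 2.0 * x          # CATE is 1 + x, so the ATE is 1
y = np.where(a == 1, mu1_true, mu0_true) + rng.normal(size=n)

def mean_pseudo_outcome(pi, mu1, mu0):
    phi = (mu1 - mu0) + (a - pi) / (pi * (1 - pi)) * (y - a * mu1 - (1 - a) * mu0)
    return phi.mean()

# Deliberately misspecified nuisances
pi_wrong = np.full(n, 0.5)
mu_wrong = np.zeros(n)

estimates = {
    "both correct": mean_pseudo_outcome(pi_true, mu1_true, mu0_true),
    "only pi correct": mean_pseudo_outcome(pi_true, mu_wrong, mu_wrong),
    "only mu correct": mean_pseudo_outcome(pi_wrong, mu1_true, mu0_true),
    "both wrong": mean_pseudo_outcome(pi_wrong, mu_wrong, mu_wrong),
}
for name, value in estimates.items():
    print(f"{name:>16}: {value:.3f}")
```

In this setup the first three averages should land close to 1, while the last one is visibly biased because both errors are large and their product does not vanish.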

Key Properties

Double Robustness

The bias term depends only on the product of the propensity score error and the outcome regression error: $\text{Bias} = O(||\hat{\pi} - \pi_0|| \cdot ||\hat{\mu} - \mu_0||)$

Rate Adaptation

Adapts to the smoothness $\gamma$ of the CATE
Decoupled from the smoothness $\alpha, \beta$ of the individual nuisance functions
Can achieve a faster rate than the plug-in estimator

Oracle Efficiency

The oracle rate is achieved under the following condition: $\sqrt{\alpha\beta} \geq \frac{d/2}{\sqrt{1 + \frac{d}{\gamma}}\sqrt{1 + \frac{d}{2s}}}$

where:

$\alpha$ : propensity score smoothness
$\beta$ : outcome regression smoothness
$\gamma$ : CATE smoothness
$s$ : harmonic mean smoothness
$d$ : covariate dimension
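
To get a feel for the condition, both sides of the inequality can be evaluated for concrete values. The helper below is a hypothetical convenience function that takes the inequality exactly as written above; $s$ is passed in explicitly rather than derived, and the smoothness values are made up for illustration.

```python
import math

def dr_oracle_condition(alpha, beta, gamma, s, d):
    """Evaluate both sides of the inequality as stated above.

    Returns (lhs, rhs, holds).
    """
    lhs = math.sqrt(alpha * beta)
    rhs = (d / 2) / (math.sqrt(1 + d / gamma) * math.sqrt(1 + d / (2 * s)))
    return lhs, rhs, lhs >= rhs

# Hypothetical smoothness values in d = 20 dimensions
smooth = dr_oracle_condition(alpha=1.0, beta=1.0, gamma=1.0, s=1.0, d=20)
rough = dr_oracle_condition(alpha=0.25, beta=0.25, gamma=1.0, s=0.25, d=20)
print(smooth)
print(rough)
```

With these inputs the smoother nuisance functions satisfy the inequality and the rougher ones do not, which matches the reading that the condition asks for enough nuisance smoothness relative to the dimension.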

Algorithm

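# DR-Learner algorithm (scikit-learn sketch; the model choices are illustrative)
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def dr_learner(X, A, Y, n_folds=5):
    n = len(Y)
    pi_hat, mu1_hat, mu0_hat = np.zeros(n), np.zeros(n), np.zeros(n)

    # Stage 1: cross-fitted nuisance estimates, predicted for every unit
    for train, test in KFold(n_folds, shuffle=True, random_state=0).split(X):
        Xtr, Atr, Ytr = X[train], A[train], Y[train]
        pi_hat[test] = LogisticRegression().fit(Xtr, Atr).predict_proba(X[test])[:, 1]
        # Fit each outcome model on its own arm, but predict for all held-out units
        mu1_hat[test] = GradientBoostingRegressor().fit(Xtr[Atr == 1], Ytr[Atr == 1]).predict(X[test])
        mu0_hat[test] = GradientBoostingRegressor().fit(Xtr[Atr == 0], Ytr[Atr == 0]).predict(X[test])

    # Stage 2: compute pseudo-outcomes ...
    phi_hat = (mu1_hat - mu0_hat) + \
              (A - pi_hat) / (pi_hat * (1 - pi_hat)) * \
              (Y - A * mu1_hat - (1 - A) * mu0_hat)

    # ... and regress them on X
    tau_hat = GradientBoostingRegressor().fit(X, phi_hat)

    return tau_hat  # fitted model; tau_hat.predict(X_new) gives CATE estimates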

Comparison with Other Learners

Method	Key Idea	Pros	Cons
T-Learner	Separate models per treatment	Simple	No sharing across groups
S-Learner	Single model with A as feature	Shares info	May miss heterogeneity
X-Learner	Two-stage imputation	Good for imbalance	Complex
R-Learner	Residualize then regress	Orthogonality	Requires product rate
DR-Learner	DR pseudo-outcome regression	Double robustness, rate adaptation	Stability condition needed
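
The "No sharing" and "May miss heterogeneity" entries in the table can be made concrete with a small simulation. The sketch below assumes a hypothetical randomized experiment with true CATE $\tau(x) = 1 + x$ and uses plain linear models as base learners. The S-Learner is deliberately given no interaction term, which is a simplifying assumption and not a property of S-Learners in general.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
x = rng.uniform(-1.0, 1.0, n)
a = rng.binomial(1, 0.5, n)                 # randomized treatment for simplicity
y = x + a * (1.0 + x) + rng.normal(size=n)  # true CATE: tau(x) = 1 + x

def ols(features, target):
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return coef

ones = np.ones(n)
F = np.column_stack([ones, x])

# T-Learner: one linear outcome model per arm, then take the difference
b1 = ols(F[a == 1], y[a == 1])
b0 = ols(F[a == 0], y[a == 0])
tau_t = F @ (b1 - b0)

# S-Learner: a single linear model with A as one more feature (no interaction)
b_s = ols(np.column_stack([ones, x, a]), y)
tau_s = np.full(n, b_s[2])   # prediction at a=1 minus prediction at a=0

tau_true = 1.0 + x
rmse_t = np.sqrt(np.mean((tau_t - tau_true) ** 2))
rmse_s = np.sqrt(np.mean((tau_s - tau_true) ** 2))
print("T-Learner RMSE:", rmse_t)
print("S-Learner RMSE:", rmse_s)
```

Here the S-Learner can only return a constant effect, so its error stays large however much data is available. With a flexible base learner it could pick up some of the heterogeneity, though regularization may still shrink it.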

Theoretical Guarantee

Main Error Bound (Theorem 2): $\hat{\tau}_{DR}(x) - \tilde{\tau}(x) = \hat{E}_n[\hat{b}(X) | X = x] + o_P(\sqrt{R_n^*(x)})$

where:

$\tilde{\tau}(x)$ : oracle estimator (uses the true pseudo-outcome)
$\hat{b}(x)$ : bias from nuisance estimation
$R_n^*(x)$ : oracle variance

A stability condition is required: the second-stage regression estimator must be stable under perturbations of its input, the pseudo-outcome.
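
The gap $\hat{\tau}_{DR}(x) - \tilde{\tau}(x)$ in the error bound can be inspected in a simulation, where the true nuisance functions are known. The sketch below makes several simplifying assumptions: a hypothetical data-generating process, correctly specified linear and logistic nuisance models, a single sample split instead of cross-fitting, and ordinary least squares as the second-stage regression. The helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40_000
x = rng.uniform(-1.0, 1.0, n)
pi_true = 1.0 / (1.0 + np.exp(-x))
a = rng.binomial(1, pi_true)
mu0_true, mu1_true = x, 1.0 + 2.0 * x        # true CATE: tau(x) = 1 + x
y = np.where(a == 1, mu1_true, mu0_true) + rng.normal(size=n)
F = np.column_stack([np.ones(n), x])          # features [1, x]

def ols(features, target):
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return coef

def fit_logistic(features, labels, n_iter=25):
    """Logistic regression by Newton's method (IRLS)."""
    beta = np.zeros(features.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-features @ beta))
        hessian = features.T @ (features * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hessian, features.T @ (labels - p))
    return beta

def pseudo_outcome(y, a, pi, mu1, mu0):
    return (mu1 - mu0) + (a - pi) / (pi * (1 - pi)) * (y - a * mu1 - (1 - a) * mu0)

# Sample splitting: nuisances on the first half, second stage on the other half
half = n // 2
tr, te = slice(0, half), slice(half, n)
Ftr, atr, ytr = F[tr], a[tr], y[tr]
pi_hat = 1.0 / (1.0 + np.exp(-F[te] @ fit_logistic(Ftr, atr)))
mu1_hat = F[te] @ ols(Ftr[atr == 1], ytr[atr == 1])
mu0_hat = F[te] @ ols(Ftr[atr == 0], ytr[atr == 0])

phi_hat = pseudo_outcome(y[te], a[te], pi_hat, mu1_hat, mu0_hat)
phi_oracle = pseudo_outcome(y[te], a[te], pi_true[te], mu1_true[te], mu0_true[te])

coef_feasible = ols(F[te], phi_hat)       # tau_hat_DR(x) = c0 + c1 * x
coef_oracle = ols(F[te], phi_oracle)      # tau_tilde(x), uses the true nuisances
print("feasible:", coef_feasible)
print("oracle:  ", coef_oracle)
print("gap:     ", coef_feasible - coef_oracle)
```

Because the nuisance models are well specified here, the feasible and oracle fits should nearly coincide, and both should be close to the true coefficients (1, 1). The remaining error is dominated by the oracle's own sampling noise.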

Related Concepts

Pseudo-outcome - core building block of the DR-Learner
Doubly Robust Estimator - theoretical foundation
CATE - estimation target
Oracle Efficiency - theoretical goal
Cross-fitting - guards against overfitting
R-Learner - related method

Comparison: DR-Learner vs R-Learner

Aspect	DR-Learner	R-Learner
Pseudo-outcome	$\hat{\mu}_1 - \hat{\mu}_0 + \text{correction}$	$(Y - \hat{\mu})(A - \hat{\pi})/\text{var}$
Rate condition	Product rate	Product rate
Oracle condition	$\sqrt{\alpha\beta} \geq \ldots$	Weaker for lp-R-Learner
Implementation	Simpler	More complex (lp version)
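
The difference between the two second stages can be sketched side by side. The example below assumes a hypothetical data-generating process and plugs in the true nuisance functions, so that only the form of the final regression differs: the DR-Learner runs an ordinary regression of the pseudo-outcome on $X$, while the R-Learner is written here as least-squares minimization of the R-loss with a linear CATE model. Here `m` denotes $E[Y | X = x]$, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40_000
x = rng.uniform(-1.0, 1.0, n)
pi = 1.0 / (1.0 + np.exp(-x))               # true propensity, assumed known here
a = rng.binomial(1, pi)
mu0, mu1 = x, 1.0 + 2.0 * x                 # true CATE: tau(x) = 1 + x
m = pi * mu1 + (1 - pi) * mu0               # m(x) = E[Y | X = x]
y = np.where(a == 1, mu1, mu0) + rng.normal(size=n)
F = np.column_stack([np.ones(n), x])

def ols(features, target):
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return coef

# DR-Learner: ordinary regression of the DR pseudo-outcome on X
phi = (mu1 - mu0) + (a - pi) / (pi * (1 - pi)) * (y - a * mu1 - (1 - a) * mu0)
coef_dr = ols(F, phi)

# R-Learner: minimize sum(((Y - m) - (A - pi) * tau(X))^2) over linear tau,
# i.e. OLS of the outcome residual on the treatment-residual-scaled features
coef_r = ols(F * (a - pi)[:, None], y - m)

print("DR-Learner coefficients:", coef_dr)
print("R-Learner coefficients: ", coef_r)
```

Both fits target the same linear CATE, so in this setup both coefficient vectors should be close to (1, 1).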

Applications

Medicine: Heterogeneous treatment effects in clinical trials
Policy: Subgroup-specific policy effects
Marketing: Personalized treatment response
Social Science: Causal effect heterogeneity

Implementation

Python (econml):

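from econml.dr import DRLearner
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression

# T is the treatment (A in the notation above); W holds optional extra controls
dr = DRLearner(model_propensity=LogisticRegression(),
               model_regression=RandomForestRegressor(),
               model_final=RandomForestRegressor())
dr.fit(Y, T, X=X, W=W)
cate = dr.effect(X_test)  # CATE estimates at the rows of X_test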

R (grf):

library(grf)
# grf's causal_forest has similar doubly robust properties
# Note: in grf, W is the treatment vector (unlike econml, where W holds controls)
cf <- causal_forest(X, Y, W)
tau_hat <- predict(cf)$predictions

References

kennedyOptimalDoublyRobust2023 - DR-Learner theory and oracle efficiency
chernozhukovDoubleDebiasedMachine2018 - DML framework
nieQuasiOracleEstimationHeterogeneous2020 - R-Learner
