Tae Hyun Kim (Lowell)

DR-Learner

Definition

The DR-Learner is a two-stage doubly robust estimator of the conditional average treatment effect (CATE) that regresses a pseudo-outcome on the covariates.

Stage 1: Nuisance estimation

  • Propensity score: $\hat{\pi}(x)$, an estimate of $\pi(x) = P(A = 1 \mid X = x)$
  • Outcome regression: $\hat{\mu}_a(x)$, an estimate of $\mu_a(x) = E[Y \mid X = x, A = a]$ for $a \in \{0, 1\}$

Stage 2: Pseudo-outcome regression

$$\hat{\tau}_{DR}(x) = \hat{E}_n[\hat{\varphi}(Z) \mid X = x]$$

where $Z = (X, A, Y)$ is one observation, $\hat{E}_n[\cdot \mid X = x]$ denotes a regression of the given quantity on the covariates, and the pseudo-outcome is:

$$\hat{\varphi}(Z) = \underbrace{\hat{\mu}_1(X) - \hat{\mu}_0(X)}_{\text{plug-in}} + \underbrace{\frac{A - \hat{\pi}(X)}{\hat{\pi}(X)\,(1-\hat{\pi}(X))}\big(Y - A\hat{\mu}_1(X) - (1-A)\hat{\mu}_0(X)\big)}_{\text{augmentation/correction}}$$
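
The pseudo-outcome can be computed directly from the nuisance predictions. Below is a minimal NumPy sketch, assuming the nuisance values are already available as arrays; `dr_pseudo_outcome` is a hypothetical helper name. The toy data plug in the *true* nuisance functions of an assumed data-generating process with $\tau(x) = 1 + x$, so the average pseudo-outcome should land close to the ATE of 1.

```python
import numpy as np

def dr_pseudo_outcome(A, Y, pi_hat, mu1_hat, mu0_hat):
    """Doubly robust pseudo-outcome phi_hat(Z) for each observation (length-n arrays)."""
    plug_in = mu1_hat - mu0_hat
    mu_obs = A * mu1_hat + (1 - A) * mu0_hat         # prediction for the arm actually observed
    weight = (A - pi_hat) / (pi_hat * (1 - pi_hat))  # 1/pi_hat if A=1, -1/(1-pi_hat) if A=0
    return plug_in + weight * (Y - mu_obs)

# Toy data (assumed): X ~ U(-1, 1), logistic propensity, CATE tau(x) = 1 + x, so ATE = 1
rng = np.random.default_rng(0)
n = 200_000
X = rng.uniform(-1, 1, n)
pi = 1 / (1 + np.exp(-X))          # true propensity score
A = rng.binomial(1, pi)
tau = 1 + X
mu0 = X                            # true outcome regressions
mu1 = mu0 + tau
Y = np.where(A == 1, mu1, mu0) + rng.normal(0, 1, n)

phi = dr_pseudo_outcome(A, Y, pi, mu1, mu0)
print("mean pseudo-outcome:", phi.mean())
```

Regressing `phi` on `X` (Stage 2) would then target $\tau(x)$ rather than only its average.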

Intuitive Understanding

Core idea:

  1. Compute the doubly robust pseudo-outcome (the uncentered efficient influence function of the ATE)
  2. Smooth/regress this pseudo-outcome on $X$
  3. Exploit structure in the CATE itself (smoothness, sparsity), separately from the structure of the nuisance functions

Step 1: Estimate π̂(x), μ̂₁(x), μ̂₀(x) using any ML method

Step 2: Compute the pseudo-outcome φ̂(Z) for each observation

Step 3: Regress φ̂ on X to get τ̂(x)

Why “Doubly Robust”?

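  • If either $\hat{\pi}$ or $\hat{\mu}$ is consistent, the nuisance-induced bias vanishes asymptotically
  • If both are estimated with error, only the product of their errors remains, $O(\|\hat{\pi} - \pi_0\| \cdot \|\hat{\mu} - \mu_0\|)$, which can be small even when each error converges slowly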
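
A small simulation can illustrate the double robustness claim. The data-generating functions below are assumptions chosen for this sketch, and `mean_pseudo_outcome` is a hypothetical helper. Averaging the pseudo-outcome targets the ATE (here 1). A deliberately wrong outcome model ($\hat{\mu} \equiv 0$) or a deliberately wrong propensity model (constant $\hat{\pi}$) on its own still gives an estimate near 1. Getting both wrong does not.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
X = rng.uniform(-1, 1, n)
pi = 1 / (1 + np.exp(-2 * X))          # true propensity score (confounded treatment)
A = rng.binomial(1, pi)
mu0 = 2 * X                            # true outcome regressions; tau(x) = 1 + x, ATE = 1
mu1 = mu0 + 1 + X
Y = np.where(A == 1, mu1, mu0) + rng.normal(0, 1, n)

def mean_pseudo_outcome(pi_hat, mu1_hat, mu0_hat):
    """Average DR pseudo-outcome, i.e. an AIPW-style estimate of the ATE."""
    resid = Y - A * mu1_hat - (1 - A) * mu0_hat
    return np.mean(mu1_hat - mu0_hat + (A - pi_hat) / (pi_hat * (1 - pi_hat)) * resid)

zeros = np.zeros(n)                    # misspecified outcome model
const = np.full(n, A.mean())           # misspecified propensity model (ignores X)

ate_mu_wrong = mean_pseudo_outcome(pi, zeros, zeros)      # pi correct, mu wrong
ate_pi_wrong = mean_pseudo_outcome(const, mu1, mu0)       # pi wrong, mu correct
ate_both_wrong = mean_pseudo_outcome(const, zeros, zeros) # both wrong: biased
print(ate_mu_wrong, ate_pi_wrong, ate_both_wrong)
```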

Key Properties

Double Robustness

The bias term depends only on the product of the propensity-score and outcome-regression errors:

$$\text{Bias} = O(\|\hat{\pi} - \pi_0\| \cdot \|\hat{\mu} - \mu_0\|)$$
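
To see why the product structure matters, suppose (as an illustrative assumption) that each nuisance estimator converges only at the slower-than-parametric rate $n^{-1/4}$. The bias bound then still shrinks at the parametric rate:

$$\|\hat{\pi} - \pi_0\| = O_P(n^{-1/4}),\ \ \|\hat{\mu} - \mu_0\| = O_P(n^{-1/4}) \implies \text{Bias} = O_P(n^{-1/4} \cdot n^{-1/4}) = O_P(n^{-1/2})$$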

Rate Adaptation

  • Adapts to the smoothness $\gamma$ of the CATE
  • Decoupled from the smoothness $\alpha, \beta$ of the individual nuisance functions
  • Can achieve a faster rate than the plug-in estimator $\hat{\mu}_1 - \hat{\mu}_0$ when the CATE is smoother than the outcome regressions
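
The following worked illustration assumes Hölder-type smoothness classes. The oracle rate for a $\gamma$-smooth CATE in $d$ dimensions is $n^{-1/(2 + d/\gamma)}$. The plug-in $\hat{\mu}_1 - \hat{\mu}_0$ is typically limited by the outcome-regression smoothness $\beta$, at $n^{-1/(2 + d/\beta)}$. With the illustrative values $d = 4$, $\beta = 2$, $\gamma = 8$:

$$\underbrace{n^{-1/(2 + 4/2)} = n^{-1/4}}_{\text{plug-in}} \quad \text{vs.} \quad \underbrace{n^{-1/(2 + 4/8)} = n^{-2/5}}_{\text{oracle}}$$

A CATE that is much smoother than the outcome regressions therefore leaves room for a faster second-stage rate, provided the nuisance product term is also small enough.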

Oracle Efficiency

Achieves the oracle rate under the following condition:

$$\sqrt{\alpha\beta} \geq \frac{d/2}{\sqrt{1 + \frac{d}{\gamma}}\,\sqrt{1 + \frac{d}{2s}}}$$

where:

  • $\alpha$: propensity score smoothness
  • $\beta$: outcome regression smoothness
  • $\gamma$: CATE smoothness
  • $s$: harmonic mean smoothness
  • $d$: covariate dimension
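
The condition can be evaluated numerically for given smoothness values. This is a minimal sketch that encodes the inequality exactly as stated in this note. `dr_oracle_condition` is a hypothetical name, and $s$ is taken as a user-supplied input rather than computed from $\alpha$ and $\beta$.

```python
import math

def dr_oracle_condition(alpha, beta, gamma, s, d):
    """Evaluate the oracle-efficiency condition as written in this note:
    sqrt(alpha*beta) >= (d/2) / (sqrt(1 + d/gamma) * sqrt(1 + d/(2*s))).
    Returns (holds, lhs, rhs)."""
    lhs = math.sqrt(alpha * beta)
    rhs = (d / 2) / (math.sqrt(1 + d / gamma) * math.sqrt(1 + d / (2 * s)))
    return lhs >= rhs, lhs, rhs

# Example: d = 2, gamma = 2, s = 1 gives rhs = 1 / (sqrt(2) * sqrt(2)) = 0.5
ok, lhs, rhs = dr_oracle_condition(alpha=1, beta=1, gamma=2, s=1, d=2)
print(ok, lhs, rhs)
```

With the same $d, \gamma, s$ but $\alpha = \beta = 0.25$, the left side drops to $0.25$ and the condition fails.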

Algorithm

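```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

# DR-Learner Algorithm (X: 2-D array, A: 0/1 array, Y: outcome array)
def dr_learner(X, A, Y, n_folds=5, eps=0.01):
    # Random forests are illustrative choices; any sklearn classifier/regressor works
    clf = RandomForestClassifier(min_samples_leaf=20, random_state=0)
    reg = RandomForestRegressor(min_samples_leaf=20, random_state=0)
    n = len(Y)
    pi_hat, mu1_hat, mu0_hat = np.zeros(n), np.zeros(n), np.zeros(n)

    # Step 1: cross-fitted nuisance estimation; each fold is predicted by models
    # trained on the other folds. mu1/mu0 are fit on one arm but predicted for all units.
    for train, test in KFold(n_folds, shuffle=True, random_state=0).split(X):
        pi_hat[test] = clone(clf).fit(X[train], A[train]).predict_proba(X[test])[:, 1]
        tr1, tr0 = train[A[train] == 1], train[A[train] == 0]
        mu1_hat[test] = clone(reg).fit(X[tr1], Y[tr1]).predict(X[test])
        mu0_hat[test] = clone(reg).fit(X[tr0], Y[tr0]).predict(X[test])
    pi_hat = np.clip(pi_hat, eps, 1 - eps)  # keep inverse-propensity weights bounded

    # Step 2: compute pseudo-outcomes
    phi_hat = (mu1_hat - mu0_hat) + \
              (A - pi_hat) / (pi_hat * (1 - pi_hat)) * \
              (Y - A * mu1_hat - (1 - A) * mu0_hat)

    # Step 3: regress pseudo-outcome on X; tau_hat.predict(x) estimates the CATE
    tau_hat = clone(reg).fit(X, phi_hat)

    return tau_hat
```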

Comparison with Other Learners

| Method | Key Idea | Pros | Cons |
|---|---|---|---|
| T-Learner | Separate models per treatment arm | Simple | No information sharing across groups |
| S-Learner | Single model with $A$ as a feature | Shares information across groups | May miss heterogeneity |
| X-Learner | Two-stage imputation | Good for treatment-group imbalance | Complex |
| R-Learner | Residualize, then regress | Orthogonality | Requires product rate |
| DR-Learner | DR pseudo-outcome regression | Double robustness, rate adaptation | Stability condition needed |

Theoretical Guarantee

Main Error Bound (Theorem 2):

$$\hat{\tau}_{DR}(x) - \tilde{\tau}(x) = \hat{E}_n[\hat{b}(X) \mid X = x] + o_P\left(\sqrt{R_n^*(x)}\right)$$

where:

  • $\tilde{\tau}(x)$: the oracle estimator (regressing the true pseudo-outcome on $X$)
  • $\hat{b}(x)$: bias from nuisance estimation
  • $R_n^*(x)$: oracle risk, i.e. the mean squared error $E[(\tilde{\tau}(x) - \tau(x))^2]$ of the oracle estimator

Stability Condition required: The second-stage regression estimator must be stable with respect to input perturbations.

  • Pseudo-outcome - the core component of the DR-Learner
  • Doubly Robust Estimator - theoretical foundation
  • CATE - the estimation target
  • Oracle Efficiency - the theoretical goal
  • Cross-fitting - prevents overfitting bias by evaluating nuisance estimates on held-out folds
  • R-Learner - related methodology

Comparison: DR-Learner vs R-Learner

| Aspect | DR-Learner | R-Learner |
|---|---|---|
| Pseudo-outcome | $\hat{\mu}_1 - \hat{\mu}_0 + \text{correction}$ | $(Y - \hat{\mu})(A - \hat{\pi})/\text{var}$ |
| Rate condition | Product rate | Product rate |
| Oracle condition | $\sqrt{\alpha\beta} \geq \ldots$ | Weaker for lp-R-Learner |
| Implementation | Simpler | More complex (lp version) |
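
The two pseudo-outcomes can be compared in the simplest case. This sketch assumes a constant treatment effect and uses the true nuisances of a simulated process. Here $\hat{\mu}$ in the R-Learner column means $E[Y \mid X]$, not an arm-specific regression. With a constant effect, the R-Learner reduces to a residual-on-residual ratio, $\sum (A - \pi)(Y - m) / \sum (A - \pi)^2$, while the DR-Learner reduces to the mean of its pseudo-outcome. Both should recover the assumed effect of 2.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
X = rng.uniform(-1, 1, n)
pi = 1 / (1 + np.exp(-X))                 # true propensity score
A = rng.binomial(1, pi)
tau = 2.0                                 # constant treatment effect (assumed for this sketch)
mu0 = np.sin(3 * X)
mu1 = mu0 + tau
Y = mu0 + tau * A + rng.normal(0, 1, n)
m = mu0 + tau * pi                        # true E[Y | X], used by the R-Learner

# R-Learner with constant CATE: residual-on-residual regression
tau_r = np.sum((A - pi) * (Y - m)) / np.sum((A - pi) ** 2)

# DR-Learner with constant CATE: average DR pseudo-outcome
phi = mu1 - mu0 + (A - pi) / (pi * (1 - pi)) * (Y - A * mu1 - (1 - A) * mu0)
tau_dr = phi.mean()
print(tau_r, tau_dr)
```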

Applications

  • Medicine: Heterogeneous treatment effects in clinical trials
  • Policy: Subgroup-specific policy effects
  • Marketing: Personalized treatment response
  • Social Science: Causal effect heterogeneity

Implementation

Python (econml):

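```python
from econml.dr import DRLearner
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression

# Y: outcome, T: binary treatment, X: effect modifiers, W: additional controls
dr = DRLearner(model_propensity=LogisticRegression(),
               model_regression=RandomForestRegressor(),
               model_final=RandomForestRegressor())
dr.fit(Y, T, X=X, W=W)
cate = dr.effect(X_test)  # estimated CATE for each row of X_test
```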

R (grf):

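```r
library(grf)
# In grf, W is the binary treatment (in econml, W denotes additional controls).
# causal_forest uses Robinson-style residualization (closer to the R-Learner), not a DR pseudo-outcome.
cf <- causal_forest(X, Y, W)
tau_hat <- predict(cf)$predictions
# DR-Learner-style alternative: regress the forest's doubly robust scores on X
phi_hat <- get_scores(cf)
dr_fit <- regression_forest(X, phi_hat)
```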

References

  • kennedyOptimalDoublyRobust2023 - DR-Learner theory and oracle efficiency
  • chernozhukovDoubleDebiasedMachine2018 - DML framework
  • nieQuasiOracleEstimationHeterogeneous2020 - R-Learner
