DR-Learner · Tae Hyun Kim (Lowell)

Definition

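The DR-Learner is a two-stage doubly robust estimator of the conditional average treatment effect (CATE), $\tau(x) = E[Y(1) - Y(0) | X = x]$. It first estimates nuisance functions, then regresses a pseudo-outcome on the covariates.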

Stage 1: Nuisance estimation

Propensity score: $\hat{\pi}(x)$ estimates $P(A = 1 | X = x)$
Outcome regression: $\hat{\mu}_a(x)$ estimates $E[Y | X = x, A = a]$ for $a \in \{0, 1\}$

Stage 2: Pseudo-outcome regression $\hat{\tau}_{DR}(x) = \hat{E}_n[\hat{\varphi}(Z) | X = x]$

where the pseudo-outcome is: $\hat{\varphi}(Z) = \underbrace{\hat{\mu}_1(X) - \hat{\mu}_0(X)}_{\text{plug-in}} + \underbrace{\frac{A - \hat{\pi}(X)}{\hat{\pi}(X)(1-\hat{\pi}(X))}(Y - A\hat{\mu}_1(X) - (1-A)\hat{\mu}_0(X))}_{\text{augmentation/correction}}$
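
As a minimal sketch, the pseudo-outcome can be computed directly from arrays of nuisance predictions. The function name `dr_pseudo_outcome` and the toy numbers are illustrative assumptions, not part of the definition. The last lines check that the single-formula version agrees with the more familiar AIPW form, $\hat{\mu}_1 - \hat{\mu}_0 + A(Y - \hat{\mu}_1)/\hat{\pi} - (1-A)(Y - \hat{\mu}_0)/(1-\hat{\pi})$.

```python
import numpy as np

def dr_pseudo_outcome(y, a, pi_hat, mu1_hat, mu0_hat):
    """Doubly robust pseudo-outcome, written term by term as in the formula above."""
    plug_in = mu1_hat - mu0_hat
    mu_obs = a * mu1_hat + (1 - a) * mu0_hat          # prediction for the observed arm
    weight = (a - pi_hat) / (pi_hat * (1 - pi_hat))   # 1/pi if treated, -1/(1-pi) if control
    return plug_in + weight * (y - mu_obs)

# Toy inputs (made-up numbers, for illustration only)
y = np.array([3.0, 1.0, 2.5, 0.5])
a = np.array([1, 0, 1, 0])
pi_hat = np.array([0.6, 0.3, 0.5, 0.2])
mu1_hat = np.array([2.5, 2.0, 2.0, 1.5])
mu0_hat = np.array([1.0, 1.2, 1.0, 0.4])

phi = dr_pseudo_outcome(y, a, pi_hat, mu1_hat, mu0_hat)

# Equivalent AIPW form: plug-in plus inverse-probability-weighted residuals
phi_aipw = (mu1_hat - mu0_hat
            + a * (y - mu1_hat) / pi_hat
            - (1 - a) * (y - mu0_hat) / (1 - pi_hat))
print(np.allclose(phi, phi_aipw))
```

The weight $(A - \hat{\pi})/(\hat{\pi}(1-\hat{\pi}))$ is just a compact way of writing $1/\hat{\pi}$ for treated units and $-1/(1-\hat{\pi})$ for controls.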

Intuitive Understanding

Core idea:

Compute the doubly robust pseudo-outcome (the uncentered efficient influence function for the ATE)
Smooth/regress this pseudo-outcome on $X$
Let the second-stage regression exploit the structure of the CATE itself (smoothness, sparsity), separately from the nuisance functions

Stage 1:   Estimate π̂(x), μ̂₁(x), μ̂₀(x) using any ML method
              ↓
Stage 2a:  Compute pseudo-outcome φ̂(Z) for each observation
              ↓
Stage 2b:  Regress φ̂ on X to get τ̂(x)

Why “Doubly Robust”?

If either $\hat{\pi}$ or $\hat{\mu}$ is correct, the bias vanishes
Even if both are wrong, only the product of errors remains: $O(||\hat{\pi} - \pi_0|| \cdot ||\hat{\mu} - \mu_0||)$, where $\pi_0$ and $\mu_0$ denote the true nuisance functions
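
A small simulation can illustrate the claim. The sketch below averages the pseudo-outcome (which estimates the ATE) under three scenarios; the data-generating process, the deliberately wrong models (`zeros`, `half`), and the helper name are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Simulated data with known truth (all functional forms are assumptions of this sketch)
x = rng.uniform(-1, 1, n)
pi_true = 1 / (1 + np.exp(-x))        # true propensity score
a = rng.binomial(1, pi_true)
mu0_true = x
mu1_true = 1 + 2 * x                  # so tau(x) = 1 + x and the ATE is 1
y = np.where(a == 1, mu1_true, mu0_true) + rng.normal(size=n)

def mean_pseudo_outcome(pi_hat, mu1_hat, mu0_hat):
    """Average of the DR pseudo-outcome, i.e. a doubly robust ATE estimate."""
    resid = y - a * mu1_hat - (1 - a) * mu0_hat
    phi = (mu1_hat - mu0_hat) + (a - pi_hat) / (pi_hat * (1 - pi_hat)) * resid
    return phi.mean()

zeros = np.zeros(n)        # deliberately wrong outcome model
half = np.full(n, 0.5)     # deliberately wrong propensity model

results = {
    "pi correct, mu wrong": mean_pseudo_outcome(pi_true, zeros, zeros),
    "pi wrong, mu correct": mean_pseudo_outcome(half, mu1_true, mu0_true),
    "both wrong": mean_pseudo_outcome(half, zeros, zeros),
}
for label, estimate in results.items():
    print(f"{label}: {estimate:.3f} (true ATE: 1.000)")
```

In this setup the first two estimates land close to the true ATE, while the third is visibly biased. The same logic applies pointwise once the pseudo-outcome is regressed on $X$.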

Key Properties

Double Robustness

The bias term depends only on the product of the propensity-score and outcome-regression errors: $\text{Bias} = O(||\hat{\pi} - \pi_0|| \cdot ||\hat{\mu} - \mu_0||)$
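
To see where the product comes from, treat the nuisance estimates as fixed (as they are when fitted on a separate sample) and take the conditional expectation of the pseudo-outcome. A short calculation, sketched here with $\pi_0(x)$ for the true propensity score and $\mu_a(x)$ (no hat) for the true outcome regressions, gives:

$$E[\hat{\varphi}(Z) \mid X = x] - \tau(x) = \{\pi_0(x) - \hat{\pi}(x)\} \left\{ \frac{\mu_1(x) - \hat{\mu}_1(x)}{\hat{\pi}(x)} + \frac{\mu_0(x) - \hat{\mu}_0(x)}{1 - \hat{\pi}(x)} \right\}$$

Each term multiplies a propensity-score error by an outcome-regression error, so the expression is zero when either nuisance is exactly right. When $\hat{\pi}$ is bounded away from 0 and 1, the Cauchy-Schwarz inequality bounds its norm by the product of the two error norms.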

Rate Adaptation

Adapts to the smoothness $\gamma$ of the CATE
Decoupled from the smoothness $\alpha, \beta$ of the individual nuisance functions
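Can achieve a faster rate than the plug-in estimator $\hat{\mu}_1(x) - \hat{\mu}_0(x)$, whose error is tied to the outcome-regression smoothness $\beta$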

Oracle Efficiency

Achieves the oracle rate under the following condition: $\sqrt{\alpha\beta} \geq \frac{d/2}{\sqrt{1 + \frac{d}{\gamma}\left(1 + \frac{d}{2s}\right)}}$

where:

$\alpha$ : propensity score smoothness
$\beta$ : outcome regression smoothness
$\gamma$ : CATE smoothness
$s$ : harmonic mean of the nuisance smoothness levels, $s = 2\alpha\beta/(\alpha + \beta)$
$d$ : covariate dimension
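
A rough way to explore this condition numerically is to compare rate exponents directly. The sketch below assumes the nuisance estimators attain the usual minimax squared-error rates $n^{-2\alpha/(2\alpha+d)}$ and $n^{-2\beta/(2\beta+d)}$, and that the oracle rate is $n^{-2\gamma/(2\gamma+d)}$; the function names are hypothetical. Under those assumptions, oracle efficiency amounts to the product of the nuisance rates being no slower than the oracle rate.

```python
def mse_exponent(smoothness: float, d: int) -> float:
    """Exponent r in the minimax squared-error rate n^(-2r), with r = s / (2s + d)."""
    return smoothness / (2 * smoothness + d)

def nuisance_product_is_negligible(alpha: float, beta: float, gamma: float, d: int) -> bool:
    """True if the product of nuisance errors shrinks at least as fast as the oracle error."""
    return mse_exponent(alpha, d) + mse_exponent(beta, d) >= mse_exponent(gamma, d)

# (alpha, beta, gamma, d) combinations chosen only for illustration
examples = [(5.0, 5.0, 2.0, 10), (0.5, 0.5, 2.0, 10), (8.0, 1.0, 2.0, 10)]
for alpha, beta, gamma, d in examples:
    ok = nuisance_product_is_negligible(alpha, beta, gamma, d)
    print(f"alpha={alpha}, beta={beta}, gamma={gamma}, d={d}: oracle rate attainable = {ok}")
```

The third example shows the decoupling at work: a rough outcome regression can be compensated by a smooth propensity score, because only the sum of the two exponents matters.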

Algorithm

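# DR-Learner algorithm (sketch; the scikit-learn models are illustrative choices)
# Expects NumPy arrays: X of shape (n, p), binary A in {0, 1}, and outcome Y
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

def dr_learner(X, A, Y, n_folds=5, clip=0.01):
    # Stage 1: cross-fitted nuisance estimates, predicted out-of-fold for every unit
    n = len(Y)
    pi_hat, mu1_hat, mu0_hat = np.zeros(n), np.zeros(n), np.zeros(n)
    for train, test in KFold(n_folds, shuffle=True, random_state=0).split(X):
        X_tr, A_tr, Y_tr = X[train], A[train], Y[train]
        pi_hat[test] = RandomForestClassifier().fit(X_tr, A_tr).predict_proba(X[test])[:, 1]
        # Fit each outcome model on one arm only, but predict for all held-out units
        mu1_hat[test] = RandomForestRegressor().fit(X_tr[A_tr == 1], Y_tr[A_tr == 1]).predict(X[test])
        mu0_hat[test] = RandomForestRegressor().fit(X_tr[A_tr == 0], Y_tr[A_tr == 0]).predict(X[test])
    # Safeguard (an assumption of this sketch): keep propensities away from 0 and 1
    pi_hat = np.clip(pi_hat, clip, 1 - clip)

    # Stage 2a: compute pseudo-outcomes
    phi_hat = (mu1_hat - mu0_hat) + \
              (A - pi_hat) / (pi_hat * (1 - pi_hat)) * \
              (Y - A * mu1_hat - (1 - A) * mu0_hat)

    # Stage 2b: regress the pseudo-outcome on X; tau_model.predict(x) gives tau_hat(x)
    tau_model = RandomForestRegressor().fit(X, phi_hat)

    return tau_model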
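
For a quick sanity check of the whole pipeline, the following self-contained sketch runs a single sample split (nuisances on one half, pseudo-outcome regression on the other) on simulated data where the true CATE is known. The data-generating process and the simple parametric models are assumptions chosen so the answer is easy to verify; they are not a recommendation for real data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(1)
n = 20_000
X = rng.uniform(-1, 1, size=(n, 1))
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = X[:, 0] + A * (1 + X[:, 0]) + rng.normal(size=n)   # true CATE: tau(x) = 1 + x

# Split once: first half for the nuisances, second half for the pseudo-outcome regression
idx = rng.permutation(n)
nuis, second = idx[: n // 2], idx[n // 2:]
Xn, An, Yn = X[nuis], A[nuis], Y[nuis]

# Stage 1: fit nuisance models on the first half
prop = LogisticRegression().fit(Xn, An)
out1 = LinearRegression().fit(Xn[An == 1], Yn[An == 1])
out0 = LinearRegression().fit(Xn[An == 0], Yn[An == 0])

# Stage 2: pseudo-outcomes on the second half, then regress them on X
Xs, As, Ys = X[second], A[second], Y[second]
pi_hat = prop.predict_proba(Xs)[:, 1]
mu1_hat, mu0_hat = out1.predict(Xs), out0.predict(Xs)
phi_hat = (mu1_hat - mu0_hat) + (As - pi_hat) / (pi_hat * (1 - pi_hat)) * (
    Ys - As * mu1_hat - (1 - As) * mu0_hat
)
final = LinearRegression().fit(Xs, phi_hat)
print("intercept:", final.intercept_, "slope:", final.coef_[0])
```

Because the simulated CATE is $1 + x$, both the fitted intercept and slope should come out close to 1. Swapping the roles of the two halves and averaging the resulting estimates would give a cross-fitted version.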

Comparison with Other Learners

Method	Key Idea	Pros	Cons
T-Learner	Separate models per treatment	Simple	No sharing across groups
S-Learner	Single model with A as feature	Shares info	May miss heterogeneity
X-Learner	Two-stage imputation	Good for imbalance	Complex
R-Learner	Residualize then regress	Orthogonality	Requires product rate
DR-Learner	DR pseudo-outcome regression	Double robustness, rate adaptation	Stability condition needed

Theoretical Guarantee

Main Error Bound (Theorem 2): $\hat{\tau}_{DR}(x) - \tilde{\tau}(x) = \hat{E}_n[\hat{b}(X) | X = x] + o_P(\sqrt{R_n^*(x)})$

where:

$\tilde{\tau}(x)$ : the oracle estimator (using the true pseudo-outcome)
$\hat{b}(x)$ : bias from nuisance estimation
$R_n^*(x)$ : oracle risk (mean squared error of the oracle estimator $\tilde{\tau}(x)$)

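Stability condition (required): The second-stage regression estimator must be stable with respect to perturbations of its input, here the replacement of the true pseudo-outcome $\varphi$ by its estimate $\hat{\varphi}$.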

Related Concepts

Pseudo-outcome - the core component of the DR-Learner
Doubly Robust Estimator - theoretical foundation
CATE - the estimation target
Oracle Efficiency - the theoretical goal
Cross-fitting - prevents overfitting bias by never evaluating a nuisance model on the data used to fit it
R-Learner - related methodology

Comparison: DR-Learner vs R-Learner

Aspect	DR-Learner	R-Learner
Pseudo-outcome	$\hat{\mu}_1 - \hat{\mu}_0 + \text{correction}$	$(Y - \hat{\mu})(A - \hat{\pi})/\text{var}$
Rate condition	Product rate	Product rate
Oracle condition	$\sqrt{\alpha\beta} \geq \ldots$	Weaker for lp-R-Learner
Implementation	Simpler	More complex (lp version)
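
To make the pseudo-outcome row concrete, the sketch below fits a linear CATE model both ways on simulated data. To isolate the difference between the two regressions it plugs in the true nuisance functions, which is a simplifying assumption that is never available in practice; the data-generating process is also made up. Note that the R-Learner residualizes against the marginal regression $E[Y | X]$, not the arm-specific $\mu_a$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
x = rng.uniform(-1, 1, n)
pi = 1 / (1 + np.exp(-x))                 # true propensity (oracle nuisance)
a = rng.binomial(1, pi)
mu0, mu1 = x, 1 + 2 * x                   # true outcome regressions; tau(x) = 1 + x
y = np.where(a == 1, mu1, mu0) + rng.normal(size=n)
design = np.column_stack([np.ones(n), x])  # linear model for tau(x)

# DR-Learner: unweighted least squares of the DR pseudo-outcome on the design
phi = (mu1 - mu0) + (a - pi) / (pi * (1 - pi)) * (y - np.where(a == 1, mu1, mu0))
beta_dr, *_ = np.linalg.lstsq(design, phi, rcond=None)

# R-Learner: least squares of the outcome residual on (treatment residual x design).
# This equals regressing (Y - m)/(A - pi) on the design with weights (A - pi)^2.
m = mu0 + pi * (mu1 - mu0)                # marginal regression E[Y | X]
beta_r, *_ = np.linalg.lstsq((a - pi)[:, None] * design, y - m, rcond=None)

print("DR-Learner coefficients:", beta_dr)
print("R-Learner coefficients: ", beta_r)
```

Both coefficient vectors should be close to (1, 1) here. The practical difference is in the weighting: the DR-Learner divides by $\hat{\pi}(1-\hat{\pi})$, while the R-Learner downweights observations whose treatment residual is small.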

Applications

Medicine: Heterogeneous treatment effects in clinical trials
Policy: Subgroup-specific policy effects
Marketing: Personalized treatment response
Social Science: Causal effect heterogeneity

Implementation

Python (econml):

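from econml.dr import DRLearner
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression

# Y: outcome, T: treatment (A above), X: effect modifiers, W: additional controls
dr = DRLearner(model_propensity=LogisticRegression(),
               model_regression=RandomForestRegressor(),
               model_final=RandomForestRegressor())
dr.fit(Y, T, X=X, W=W)
cate = dr.effect(X_test)  # estimated CATE at each row of X_test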

R (grf):

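library(grf)
# causal_forest is not a DR-Learner: it residualizes Y and W (R-learner style),
# and doubly robust (AIPW) summaries are available via average_treatment_effect(cf)
# Note: in grf, W is the treatment (A above)
cf <- causal_forest(X, Y, W)
tau_hat <- predict(cf)$predictions  # out-of-bag CATE estimates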

References

kennedyOptimalDoublyRobust2023 - DR-Learner theory and oracle efficiency
chernozhukovDoubleDebiasedMachine2018 - DML framework
nieQuasiOracleEstimationHeterogeneous2020 - R-Learner
