Tae Hyun Kim (Lowell)

Meta-learners

Definition

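Meta-learners is the collective term for algorithms that estimate the CATE by building on existing supervised learning methods (base learners).

Key idea:

Decompose the CATE estimation problem into sub-regression problems that a base learner can solve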

$$\tau(x) = E[Y(1) - Y(0) \mid X = x]$$

Main meta-learners: S-Learner, T-Learner, X-Learner, R-Learner, DR-Learner.

Intuitive Understanding

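Why "Meta"?

  • Base learners (RF, BART, NN, etc.) are designed to estimate $E[Y \mid X]$
  • The CATE $\tau(x) = E[Y(1) - Y(0) \mid X = x]$ cannot be estimated directly, because $Y(1)$ and $Y(0)$ are never both observed for the same unit
  • A meta-learner reuses base learners to estimate the CATE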
Base Learner:    Designed for E[Y|X] (standard regression)

Meta-Learner:    Transforms CATE problem → sub-regression problems

                 Uses base learners to solve sub-problems

                 Combines results → τ̂(x)
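To make the decomposition concrete, here is a minimal sketch of the two simplest meta-learners. It assumes an ordinary least-squares fit as a stand-in base learner (any regressor could be swapped in) and simulated data. The helper names `fit_ols`, `predict_ols`, `s_learner`, and `t_learner` are hypothetical and not part of any library.

```python
import numpy as np

def fit_ols(X, y):
    """Stand-in base learner (assumption): least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def s_learner(X, W, Y, X_new):
    """One regression on (X, W); CATE = mu_hat(x, 1) - mu_hat(x, 0)."""
    coef = fit_ols(np.column_stack([X, W]), Y)
    ones, zeros = np.ones(len(X_new)), np.zeros(len(X_new))
    return (predict_ols(coef, np.column_stack([X_new, ones]))
            - predict_ols(coef, np.column_stack([X_new, zeros])))

def t_learner(X, W, Y, X_new):
    """Two regressions, one per arm; CATE = mu_hat_1(x) - mu_hat_0(x)."""
    coef0 = fit_ols(X[W == 0], Y[W == 0])
    coef1 = fit_ols(X[W == 1], Y[W == 1])
    return predict_ols(coef1, X_new) - predict_ols(coef0, X_new)

# Simulated data (assumption): the true CATE is tau(x) = 1 + 2 * x_0
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
W = rng.binomial(1, 0.5, size=n)
tau = 1.0 + 2.0 * X[:, 0]
Y = X[:, 1] + W * tau + rng.normal(scale=0.1, size=n)

tau_s = s_learner(X, W, Y, X)
tau_t = t_learner(X, W, Y, X)
```

With this particular linear base learner, the S-learner has no interaction between `X` and `W`, so it returns one constant effect for every unit, while the T-learner can recover the heterogeneity. A more flexible base learner would behave differently.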

Comparison of Meta-learners

| Method | Approach | Pros | Cons | Best When |
|---|---|---|---|---|
| S-Learner | $\hat{\mu}(x,1) - \hat{\mu}(x,0)$ | Simple, shares data | May ignore small effects | CATE ≈ 0 |
| T-Learner | $\hat{\mu}_1(x) - \hat{\mu}_0(x)$ | Captures different response functions | No data sharing | $\mu_0 \neq \mu_1$ structures |
| X-Learner | Two-stage imputation + weighting | Exploits CATE structure, handles imbalance | More complex | Unbalanced groups, smooth CATE |
| R-Learner | Residualized regression | Orthogonality | Requires product rate | Heterogeneous effects |
| DR-Learner | DR pseudo-outcome regression | Double robustness | Stability condition | Robustness desired |
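The "DR pseudo-outcome regression" entry can be illustrated with a rough sketch. It assumes known propensities (as in a randomized experiment), least squares as a stand-in base learner, and a simple two-fold split: nuisance functions are fit on one fold, and the pseudo-outcome $\hat{\mu}_1 - \hat{\mu}_0 + W(Y - \hat{\mu}_1)/\hat{e} - (1 - W)(Y - \hat{\mu}_0)/(1 - \hat{e})$ is regressed on $X$ in the other. All function names are hypothetical, and this is a simplification rather than a faithful implementation of any published procedure.

```python
import numpy as np

def fit_ols(X, y):
    """Stand-in base learner (assumption): least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def dr_learner(X, W, Y, e_hat, X_new, seed=0):
    """Two-fold sketch: fit nuisances on one fold, regress pseudo-outcomes on the other."""
    rng = np.random.default_rng(seed)
    fold = rng.permutation(len(Y)) % 2          # random, balanced fold labels
    preds = []
    for k in (0, 1):
        train, est = fold != k, fold == k
        mu0 = fit_ols(X[train & (W == 0)], Y[train & (W == 0)])
        mu1 = fit_ols(X[train & (W == 1)], Y[train & (W == 1)])
        m0, m1 = predict_ols(mu0, X[est]), predict_ols(mu1, X[est])
        w, y, e = W[est], Y[est], e_hat[est]
        # Doubly robust pseudo-outcome
        phi = m1 - m0 + w * (y - m1) / e - (1 - w) * (y - m0) / (1 - e)
        # Second-stage regression of the pseudo-outcome on X
        preds.append(predict_ols(fit_ols(X[est], phi), X_new))
    return np.mean(preds, axis=0)               # average the two cross-fit models

# Simulated randomized experiment (assumption): tau(x) = 1 + 2 * x_0, e(x) = 0.5
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 2))
W = rng.binomial(1, 0.5, size=n)
tau = 1.0 + 2.0 * X[:, 0]
Y = X[:, 1] + W * tau + rng.normal(scale=0.1, size=n)

tau_dr = dr_learner(X, W, Y, e_hat=np.full(n, 0.5), X_new=X)
```

In observational data `e_hat` would itself be estimated, typically on the same training fold as the response functions.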

Framework and Setup

Potential Outcomes Framework

$$X \sim \Lambda, \quad W \sim \text{Bern}(e(X))$$

$$Y(0) = \mu_0(X) + \epsilon(0), \quad Y(1) = \mu_1(X) + \epsilon(1)$$

where:

  • $X \in \mathbb{R}^d$: Covariates
  • $W \in \{0, 1\}$: Treatment indicator
  • $e(x) = P(W = 1 \mid X = x)$: Propensity score
  • $\mu_a(x) = E[Y(a) \mid X = x]$: Response functions

Identification Assumptions

  1. Unconfoundedness: $(Y(0), Y(1)) \perp W \mid X$
  2. Positivity: $0 < e_{min} < e(x) < e_{max} < 1$
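A small simulation can make the setup and both assumptions concrete. The sketch below assumes a standard normal covariate distribution for $\Lambda$, a clipped logistic propensity score, and linear response functions. These specific choices are illustrative and not taken from the note.

```python
import numpy as np

rng = np.random.default_rng(42)
n, d = 5000, 3

def e(X):
    """Propensity score, clipped to [0.1, 0.9] so positivity holds (assumed form)."""
    return np.clip(1.0 / (1.0 + np.exp(-X[:, 0])), 0.1, 0.9)

def mu0(X):
    """Control response function (assumed form)."""
    return X[:, 1]

def mu1(X):
    """Treated response function (assumed form)."""
    return X[:, 1] + 1.0 + 0.5 * X[:, 0]

X = rng.normal(size=(n, d))              # X ~ Lambda (standard normal here)
W = rng.binomial(1, e(X))                # W ~ Bern(e(X))
Y0 = mu0(X) + rng.normal(size=n)         # Y(0) = mu_0(X) + eps(0)
Y1 = mu1(X) + rng.normal(size=n)         # Y(1) = mu_1(X) + eps(1)

# Only one potential outcome is observed per unit
Y = np.where(W == 1, Y1, Y0)

# True CATE, available only because the data are simulated
tau = mu1(X) - mu0(X)
```

Unconfoundedness holds by construction here: `W` is drawn using only `X`, and the noise terms are independent of `W`.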

Estimation Target

$$\text{EMSE}(P, \hat{\tau}) = E[(\tau(X) - \hat{\tau}(X))^2]$$
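In a simulation where the true CATE is known, the EMSE can be approximated by averaging squared errors over sampled covariate points. A minimal sketch follows. The function name `emse` and the toy numbers are illustrative assumptions.

```python
import numpy as np

def emse(tau_true, tau_hat):
    """Monte Carlo approximation of E[(tau(X) - tau_hat(X))^2] over sampled X."""
    tau_true = np.asarray(tau_true, dtype=float)
    tau_hat = np.asarray(tau_hat, dtype=float)
    return float(np.mean((tau_true - tau_hat) ** 2))

# Toy values: squared errors are 0, 0.25 and 1, so the mean is 1.25 / 3
tau_true = np.array([1.0, 2.0, 3.0])
tau_hat = np.array([1.0, 2.5, 2.0])
value = emse(tau_true, tau_hat)
```

On real data $\tau(X)$ is never observed, so this quantity cannot be computed directly and is mainly useful for simulation studies.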

Convergence Rates

Notation: $S(a)$ = function class with minimax rate $N^{-a}$

T-Learner Rate

$$\text{Rate} = O(m^{-a_\mu} + n^{-a_\mu})$$

  • $m$: control group size, $n$: treatment group size
  • $a_\mu$: rate exponent (smoothness) of the response functions, i.e. $\mu_0, \mu_1 \in S(a_\mu)$
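A quick numerical illustration of this rate shows why the smaller group dominates. The group sizes and the exponent $a_\mu = 0.5$ below are arbitrary illustrative values, and constants hidden by the $O(\cdot)$ notation are ignored.

```python
def t_learner_rate(m, n, a_mu):
    """Order of the T-learner rate m^(-a_mu) + n^(-a_mu), constants ignored."""
    return m ** (-a_mu) + n ** (-a_mu)

# Same total sample size, different splits between control (m) and treated (n)
balanced = t_learner_rate(m=5_000, n=5_000, a_mu=0.5)
unbalanced = t_learner_rate(m=9_900, n=100, a_mu=0.5)

print(f"balanced:   {balanced:.4f}")
print(f"unbalanced: {unbalanced:.4f}")
```

With the same total of 10,000 units, the unbalanced split gives a bound several times larger, because the $n^{-a_\mu}$ term for the small treated group dominates the sum.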

X-Learner Rate (under conditions)

  • $\hat{\tau}_0$: $O(m^{-a_\tau} + n^{-a_\mu})$
  • $\hat{\tau}_1$: $O(m^{-a_\mu} + n^{-a_\tau})$

Here $a_\tau$ is the corresponding rate exponent for the CATE, i.e. $\tau \in S(a_\tau)$.

Linear CATE + Lipschitz response functions → the parametric rate is achievable
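The estimators $\hat{\tau}_0$ and $\hat{\tau}_1$ in these rates are the two second-stage regressions of the X-learner. The sketch below shows the three stages, assuming least squares as a stand-in base learner and a constant weight `g` set to the treated share (a propensity-score estimate is a common choice for the weight). The function names and the simulated unbalanced design are illustrative assumptions.

```python
import numpy as np

def fit_ols(X, y):
    """Stand-in base learner (assumption): least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def x_learner(X, W, Y, X_new, g):
    """X-learner sketch; g is the weight placed on tau_hat_0."""
    X0, Y0 = X[W == 0], Y[W == 0]
    X1, Y1 = X[W == 1], Y[W == 1]
    # Stage 1: response functions, one per arm
    mu0 = fit_ols(X0, Y0)
    mu1 = fit_ols(X1, Y1)
    # Stage 2: imputed individual effects, then a regression on each arm
    D1 = Y1 - predict_ols(mu0, X1)       # treated: observed minus imputed control
    D0 = predict_ols(mu1, X0) - Y0       # control: imputed treated minus observed
    tau1 = fit_ols(X1, D1)
    tau0 = fit_ols(X0, D0)
    # Stage 3: weighted combination of the two CATE estimates
    return g * predict_ols(tau0, X_new) + (1 - g) * predict_ols(tau1, X_new)

# Simulated unbalanced design (assumption): about 10% treated, tau(x) = 1 + 2 * x_0
rng = np.random.default_rng(2)
n = 5000
X = rng.normal(size=(n, 2))
W = rng.binomial(1, 0.1, size=n)
tau = 1.0 + 2.0 * X[:, 0]
Y = X[:, 1] + W * tau + rng.normal(scale=0.1, size=n)

tau_x = x_learner(X, W, Y, X, g=W.mean())
```

With few treated units, a small `g` puts most weight on $\hat{\tau}_1$, whose imputation step relies on $\hat{\mu}_0$ fitted on the large control group.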

Choosing a Meta-learner

Decision Guide

> flowchart TD
>     A[Start] --> B{CATE mostly zero?}
>     B -->|Yes| C[S-Learner]
>     B -->|No| D{Groups balanced?}
>     D -->|No| E[X-Learner]
>     D -->|Yes| F{Response functions similar?}
>     F -->|Yes| G[X-Learner or R-Learner]
>     F -->|No| H[T-Learner]
>

Practical Recommendations (Künzel et al.)

  1. Default choice: X-Learner (unless strong prior that CATE ≈ 0)
  2. Small datasets: BART as base learner
  3. Large datasets: Random Forest as base learner
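The decision guide above can be written as a small helper. The function name and its three boolean arguments are hypothetical. They simply mirror the three questions in the flowchart and are not a substitute for judgment about the data.

```python
def choose_meta_learner(cate_mostly_zero, groups_balanced, responses_similar):
    """Walk the decision guide: returns the suggested meta-learner as a string."""
    if cate_mostly_zero:
        return "S-Learner"
    if not groups_balanced:
        return "X-Learner"
    if responses_similar:
        return "X-Learner or R-Learner"
    return "T-Learner"

# Example: non-zero effects, far fewer treated units than controls
suggestion = choose_meta_learner(
    cate_mostly_zero=False,
    groups_balanced=False,
    responses_similar=True,
)
print(suggestion)
```

The third question is only reached when effects are not mostly zero and the groups are balanced, matching the order of the branches in the flowchart.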

Base Learners

Meta-learners are compatible with a variety of base learners:

| Base Learner | Strengths | Typical Use |
|---|---|---|
| Random Forest | Scalable, handles high-dim | Large datasets |
| BART | Uncertainty quantification, regularization | Small datasets |
| Neural Networks | Flexible, complex patterns | Very large datasets |
| Lasso/Ridge | Sparse/regularized linear | High-dim, interpretable |
| Boosting | Adaptive, accurate | General purpose |

Implementation

Python (econml):

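from econml.metalearners import SLearner, TLearner, XLearner
from sklearn.ensemble import RandomForestRegressor

# Assumed inputs: Y (outcomes), T (binary treatment indicator, W in the
# notation above), X (covariates), X_test (points at which to evaluate the CATE)

# S-Learner: a single model fit on covariates and treatment together
s_learner = SLearner(overall_model=RandomForestRegressor())
s_learner.fit(Y, T, X=X)

# T-Learner: a separate model for each treatment arm
t_learner = TLearner(models=RandomForestRegressor())
t_learner.fit(Y, T, X=X)

# X-Learner: two-stage imputation plus weighting
x_learner = XLearner(models=RandomForestRegressor())
x_learner.fit(Y, T, X=X)

# Estimated CATE for each row of X_test
cate = x_learner.effect(X_test)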

R (causalToolbox):

library(causalToolbox)

# S-Learner with RF
s_rf <- S_RF(feat = X, tr = W, yobs = Y)
cate_s <- EstimateCate(s_rf, X_test)

# X-Learner with RF
x_rf <- X_RF(feat = X, tr = W, yobs = Y)
cate_x <- EstimateCate(x_rf, X_test)

References

  • kunzelMetalearnersEstimatingHeterogeneous2019 - S, T, X-learner
  • nieQuasiOracleEstimationHeterogeneous2020 - R-learner
  • kennedyOptimalDoublyRobust2023 - DR-learner
