Meta-learners
Definition
Meta-learners are a family of algorithms that estimate the CATE by reusing existing supervised learning methods (base learners).
Key idea:
Decompose the CATE estimation problem into sub-regression problems that a base learner can solve.
주요 Meta-learners:
- S-Learner: Single model with treatment as feature
- T-Learner: Two separate models
- X-Learner: Two-stage imputation approach
- R-Learner: Residualized regression
- DR-Learner: Doubly robust pseudo-outcome regression
Intuitive Understanding
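Why "Meta"?
- Base learners (RF, BART, NN, etc.) are designed to estimate $E[Y \mid X]$
- The CATE $\tau(x)$ cannot be estimated directly, because both potential outcomes are never observed for the same unit
- A meta-learner reuses base learners to estimate the CATE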
Base Learner: Designed for E[Y|X] (standard regression)
↓
Meta-Learner: Transforms CATE problem → sub-regression problems
↓
Uses base learners to solve sub-problems
↓
Combines results → τ̂(x)
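The flow above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `fit_ols`, `s_features`, `s_learner` and `t_learner` are hypothetical names, the base learner is plain least squares, and the simulated data with a known CATE are assumptions made for the example. The S-learner here adds treatment interactions so that a linear base learner can represent a non-constant effect.

```python
import numpy as np

def fit_ols(X, y):
    """Hypothetical base learner: least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def s_features(X, w):
    # Covariates, treatment, and treatment-by-covariate interactions.
    w = np.broadcast_to(np.asarray(w, dtype=float), (len(X),))
    return np.column_stack([X, w, X * w[:, None]])

def s_learner(X, W, Y, base=fit_ols):
    # One model on (X, W); CATE = prediction at W=1 minus prediction at W=0.
    mu = base(s_features(X, W), Y)
    return lambda Xn: mu(s_features(Xn, 1.0)) - mu(s_features(Xn, 0.0))

def t_learner(X, W, Y, base=fit_ols):
    # Two models, one per treatment arm; CATE = difference of predictions.
    mu0 = base(X[W == 0], Y[W == 0])
    mu1 = base(X[W == 1], Y[W == 1])
    return lambda Xn: mu1(Xn) - mu0(Xn)

# Simulated randomized experiment with a known, linear CATE (assumption).
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
W = rng.integers(0, 2, size=n)
tau = 1.0 + 0.5 * X[:, 0]
Y = X[:, 1] + W * tau + rng.normal(scale=0.1, size=n)

tau_s = s_learner(X, W, Y)(X)
tau_t = t_learner(X, W, Y)(X)
print(np.mean(np.abs(tau_s - tau)), np.mean(np.abs(tau_t - tau)))
```

Swapping `fit_ols` for any other function that returns a predictor leaves the meta-learner code unchanged, which is the sense in which the base learner is "reused".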
Comparison of Meta-learners
| Method | Approach | Pros | Cons | Best When |
|---|---|---|---|---|
| S-Learner | Single model with treatment as feature | Simple, shares data | May ignore small effects | CATE ≈ 0 |
| T-Learner | Two separate models | Captures different response functions | No data sharing | Response functions with different structures |
| X-Learner | Two-stage imputation + weighting | Exploits CATE structure, handles imbalance | More complex | Unbalanced groups, smooth CATE |
| R-Learner | Residualized regression | Orthogonality | Requires a product-rate condition on the nuisance estimators | Heterogeneous effects |
| DR-Learner | DR pseudo-outcome regression | Double robustness | Stability condition | Robustness desired |
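To make the X-Learner row concrete, the two-stage imputation and weighting can be sketched as follows. This is an illustrative sketch under stated assumptions: `fit_ols` and `x_learner` are hypothetical names, the base learner is least squares, and the weight `g` is the treated fraction, which stands in for an estimated propensity score and is only reasonable for a randomized design. The simulated data use an unbalanced design (about 10% treated).

```python
import numpy as np

def fit_ols(X, y):
    """Hypothetical base learner: least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def x_learner(X, W, Y, base=fit_ols):
    X0, Y0 = X[W == 0], Y[W == 0]
    X1, Y1 = X[W == 1], Y[W == 1]
    # Stage 1: response functions per arm.
    mu0, mu1 = base(X0, Y0), base(X1, Y1)
    # Stage 2: imputed individual effects, then regress them on X.
    d1 = Y1 - mu0(X1)          # treated: observed minus imputed control outcome
    d0 = mu1(X0) - Y0          # control: imputed treated outcome minus observed
    tau1, tau0 = base(X1, d1), base(X0, d0)
    # Stage 3: weighted combination; constant weight is an assumption.
    g = W.mean()
    return lambda Xn: g * tau0(Xn) + (1 - g) * tau1(Xn)

# Simulated unbalanced experiment with a known CATE (assumption).
rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 2))
W = (rng.random(n) < 0.1).astype(int)
tau = 2.0 - X[:, 0]
Y = X.sum(axis=1) + W * tau + rng.normal(scale=0.1, size=n)

tau_x = x_learner(X, W, Y)(X)
print(np.mean(np.abs(tau_x - tau)))
```

With few treated units, the weight `1 - g` puts most of the mass on `tau1`, which relies on the control-arm model fitted on the larger group.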
Framework and Setup
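Potential Outcomes Framework
Each unit $i$ has potential outcomes $Y_i(0)$ and $Y_i(1)$; we observe $(X_i, W_i, Y_i)$ with $Y_i = Y_i(W_i)$,
where:
- $X \in \mathbb{R}^d$: Covariates
- $W \in \{0, 1\}$: Treatment indicator
- $e(x) = P(W = 1 \mid X = x)$: Propensity Score
- $\mu_w(x) = E[Y(w) \mid X = x]$, $w \in \{0, 1\}$: Response functions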
Identification Assumptions
- Unconfoundedness: $(Y(0), Y(1)) \perp W \mid X$
- Positivity: $0 < P(W = 1 \mid X = x) < 1$ for all $x$
Estimation Target
$$\tau(x) = E[Y(1) - Y(0) \mid X = x] = \mu_1(x) - \mu_0(x)$$
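Why sub-regressions on observed data suffice can be seen from a short derivation. Writing $\mu_w(x) = E[Y(w) \mid X = x]$ for the response functions and $W$ for the treatment indicator (notation assumed here), unconfoundedness and consistency give, for $w \in \{0, 1\}$:

$$
\begin{aligned}
\mu_w(x) &= E[Y(w) \mid X = x] \\
&= E[Y(w) \mid X = x, W = w] && \text{(unconfoundedness)} \\
&= E[Y \mid X = x, W = w] && \text{(consistency, } Y = Y(W)\text{)}
\end{aligned}
$$

Positivity ensures that both conditional expectations are well defined at every $x$, so each $\mu_w$ is an ordinary regression of the observed outcome within one treatment arm.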
Convergence Rates
Notation: $S(a)$ = function class with minimax rate $N^{-a}$
T-Learner Rate
$$O(m^{-a_\mu} + n^{-a_\mu})$$
- $m$: control group size, $n$: treatment group size
- $a_\mu$: smoothness of response functions
X-Learner Rate (under conditions)
- :
- :
Linear CATE + Lipschitz response → the parametric rate is achievable
Choosing a Meta-learner
Decision Guide
Mermaid source:
> flowchart TD
> A[Start] --> B{CATE mostly zero?}
> B -->|Yes| C[S-Learner]
> B -->|No| D{Groups balanced?}
> D -->|No| E[X-Learner]
> D -->|Yes| F{Response functions similar?}
> F -->|Yes| G[X-Learner or R-Learner]
> F -->|No| H[T-Learner]
>
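The same decision guide can be written as a small function. This is only a transcription of the flowchart for illustration; the function name `choose_meta_learner` and its boolean arguments are hypothetical, and in practice the three questions are judgment calls rather than known flags.

```python
def choose_meta_learner(cate_mostly_zero: bool,
                        groups_balanced: bool,
                        responses_similar: bool) -> str:
    """Walk the decision guide top to bottom and return the suggested learner."""
    if cate_mostly_zero:
        return "S-Learner"
    if not groups_balanced:
        return "X-Learner"
    if responses_similar:
        return "X-Learner or R-Learner"
    return "T-Learner"

# Example: unbalanced groups with a non-trivial effect.
suggestion = choose_meta_learner(cate_mostly_zero=False,
                                 groups_balanced=False,
                                 responses_similar=True)
print(suggestion)
```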
Practical Recommendations (Künzel et al.)
- Default choice: X-Learner (unless strong prior that CATE ≈ 0)
- Small datasets: BART as base learner
- Large datasets: Random Forest as base learner
Base Learners
Meta-learners are compatible with a variety of base learners:
| Base Learner | Strengths | Typical Use |
|---|---|---|
| Random Forest | Scalable, handles high-dim | Large datasets |
| BART | Uncertainty quantification, regularization | Small datasets |
| Neural Networks | Flexible, complex patterns | Very large datasets |
| Lasso/Ridge | Sparse/regularized linear | High-dim, interpretable |
| Boosting | Adaptive, accurate | General purpose |
Related Concepts
- CATE - Estimation target
- S-Learner - Single model approach
- T-Learner - Two model approach
- X-Learner - Imputation-based approach
- R-Learner - Residualized approach
- DR-Learner - Doubly robust approach
- Propensity Score - Treatment probability
Implementation
Python (econml):
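# Assumes Y (outcomes), T (binary treatment) and X, X_test (covariate arrays) are already defined
from econml.metalearners import SLearner, TLearner, XLearner
from sklearn.ensemble import RandomForestRegressor
# S-Learner: one model with the treatment as an extra feature
s_learner = SLearner(overall_model=RandomForestRegressor())
s_learner.fit(Y, T, X=X)
# T-Learner: separate outcome models per treatment arm
t_learner = TLearner(models=RandomForestRegressor())
t_learner.fit(Y, T, X=X)
# X-Learner: imputed treatment effects combined with propensity weights
x_learner = XLearner(models=RandomForestRegressor())
x_learner.fit(Y, T, X=X)
# Estimated CATE for each row of X_test
cate = x_learner.effect(X_test)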
R (causalToolbox):
# Assumes X (features), W (0/1 treatment), Y (outcomes) and X_test are already defined
library(causalToolbox)
# S-Learner with RF
s_rf <- S_RF(feat = X, tr = W, yobs = Y)
cate_s <- EstimateCate(s_rf, X_test)
# X-Learner with RF
x_rf <- X_RF(feat = X, tr = W, yobs = Y)
cate_x <- EstimateCate(x_rf, X_test)
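The library snippets above cover the S-, T- and X-learners. For the DR-learner, the pseudo-outcome regression can be sketched from scratch. This is a minimal illustration under stated assumptions: `fit_ols` and `dr_learner` are hypothetical names, the base learner is least squares, a single sample split replaces full cross-fitting, and the propensity is estimated by the treated fraction, which is only appropriate for a randomized design.

```python
import numpy as np

def fit_ols(X, y):
    """Hypothetical base learner: least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def dr_learner(X, W, Y, base=fit_ols, seed=0):
    # Split the sample: nuisances on half a, pseudo-outcome regression on half b.
    idx = np.random.default_rng(seed).permutation(len(X))
    a, b = idx[: len(X) // 2], idx[len(X) // 2:]
    Xa, Wa, Ya = X[a], W[a], Y[a]
    mu0 = base(Xa[Wa == 0], Ya[Wa == 0])
    mu1 = base(Xa[Wa == 1], Ya[Wa == 1])
    e = Wa.mean()  # constant propensity estimate (assumption: randomized design)
    Xb, Wb, Yb = X[b], W[b], Y[b]
    m0, m1 = mu0(Xb), mu1(Xb)
    # Doubly robust pseudo-outcome: plug-in difference plus weighted residuals.
    phi = m1 - m0 + Wb * (Yb - m1) / e - (1 - Wb) * (Yb - m0) / (1 - e)
    # Final stage: regress the pseudo-outcome on the covariates.
    return base(Xb, phi)

# Simulated randomized experiment with a known CATE (assumption).
rng = np.random.default_rng(2)
n = 4000
X = rng.normal(size=(n, 2))
W = rng.integers(0, 2, size=n)
tau = 1.0 + X[:, 1]
Y = X[:, 0] + W * tau + rng.normal(scale=0.1, size=n)

tau_dr = dr_learner(X, W, Y)(X)
print(np.mean(np.abs(tau_dr - tau)))
```

In an observational setting the constant `e` would be replaced by a fitted propensity model evaluated at each row of `Xb`.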
References
- kunzelMetalearnersEstimatingHeterogeneous2019 - S, T, X-learner
- nieQuasiOracleEstimationHeterogeneous2020 - R-learner
- kennedyOptimalDoublyRobust2023 - DR-learner