T-Learner
Definition
The T-Learner (Two Learner) is a Meta-learner that estimates the CATE by training separate models for the treatment group and the control group.
Algorithm:
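- Estimate the control response $\mu_0(x) = \mathbb{E}[Y(0) \mid X = x]$ from the control group: $\hat{\mu}_0(x)$
- Estimate the treatment response $\mu_1(x) = \mathbb{E}[Y(1) \mid X = x]$ from the treatment group: $\hat{\mu}_1(x)$
- Estimate the CATE: $\hat{\tau}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)$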
Intuitive Understanding
Key idea:
Fully separate the two groups and learn each response function independently.
Control data: (X₀, Y₀) → μ̂₀(x)
Treatment data: (X₁, Y₁) → μ̂₁(x)
↓
CATE: τ̂(x) = μ̂₁(x) - μ̂₀(x)
Advantages:
- Captures the distinct response structure of each group
- Well suited when $\mu_0$ and $\mu_1$ differ substantially
- Conceptually clear
Disadvantages:
- No data sharing (each model uses only its own group's data, about half the sample when the groups are balanced)
- Even when the CATE is simple, the convergence rate depends on the complexity of the response functions
- Inefficient when the group sizes are imbalanced
Key Properties
No Data Sharing
- Each model uses only the data of its own group
- Control: $n_0$ samples, Treatment: $n_1$ samples
- Cannot learn shared patterns
Rate Depends on Response Functions
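- The convergence rate is governed by the smoothness of the response functions $\mu_0$ and $\mu_1$
- Even when the CATE is simple, the rate depends on the smoothness of the response functions, not on that of the CATE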
Minimax Optimal (Theorem 7)
Under certain conditions, the T-learner is minimax rate optimal.
Algorithm Detail
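from sklearn.base import clone

def t_learner(X, W, Y, base_learner):
    # Assumes NumPy arrays, a 0/1 indicator W, and a scikit-learn style estimator
    # Split data by treatment
    X_ctrl, Y_ctrl = X[W == 0], Y[W == 0]
    X_treat, Y_treat = X[W == 1], Y[W == 1]
    # Step 1: Fit control model on its own copy (fit() returns the same object)
    model_0 = clone(base_learner).fit(X_ctrl, Y_ctrl)
    # Step 2: Fit treatment model on a second, independent copy
    model_1 = clone(base_learner).fit(X_treat, Y_treat)
    # Step 3: Predict CATE as the difference of the two response predictions
    def predict_cate(X_new):
        return model_1.predict(X_new) - model_0.predict(X_new)
    return predict_cate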
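The three steps can be exercised end to end on synthetic data where the true CATE is known. The sketch below is self-contained and rests on assumptions that are not part of the original note: `LinearModel` is a hypothetical least-squares base learner, `fit_t_learner` is an illustrative helper that builds one fresh model per group, and the data-generating process (with true CATE 1 + 2x) is made up for the check.

```python
import numpy as np

class LinearModel:
    """Hypothetical base learner: ordinary least squares with an intercept."""
    def fit(self, X, y):
        A = np.column_stack([np.ones(len(X)), X])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        A = np.column_stack([np.ones(len(X)), X])
        return A @ self.coef_

def fit_t_learner(X, W, Y, make_model):
    """Fit one fresh model per group and return a CATE predictor."""
    model_0 = make_model().fit(X[W == 0], Y[W == 0])  # control response
    model_1 = make_model().fit(X[W == 1], Y[W == 1])  # treatment response
    return lambda X_new: model_1.predict(X_new) - model_0.predict(X_new)

# Assumed data-generating process (illustrative only):
# mu_0(x) = x and mu_1(x) = 1 + 3x, so the true CATE is tau(x) = 1 + 2x.
rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-1, 1, size=(n, 1))
W = rng.integers(0, 2, size=n)
Y = np.where(W == 1, 1 + 3 * X[:, 0], X[:, 0]) + rng.normal(0, 0.1, size=n)

predict_cate = fit_t_learner(X, W, Y, LinearModel)
X_test = np.array([[-0.5], [0.0], [0.5]])
tau_hat = predict_cate(X_test)
tau_true = 1 + 2 * X_test[:, 0]
max_abs_error = np.max(np.abs(tau_hat - tau_true))
print(tau_hat, tau_true)
```

Passing a factory (`make_model`) rather than a fitted object is one way to guarantee that the two response models never share state.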
When to Use
Good Scenarios
- When the response functions differ substantially: $\mu_0$ and $\mu_1$ have different structures
- When the group sizes are balanced: each model has enough data
- When the treatment effect is complex: model each group’s complexity separately
Bad Scenarios
- When the CATE is simple but the response is complex: the rate is unnecessarily slow
- When the group sizes are imbalanced: estimation for the smaller group is inaccurate
- When there are many shared patterns: loses the benefit of data sharing
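The sensitivity to group imbalance can be illustrated with a small simulation. This is only a sketch under assumed conditions: linear response functions, unit-variance noise, and the hypothetical helpers `fit_line` and `t_learner_mse`. It holds the total sample size fixed at 1000 and compares a 500/500 split with a 980/20 split.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line; returns (intercept, slope)."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def t_learner_mse(n_ctrl, n_treat, n_reps=300, seed=0):
    """Average squared CATE error of a linear T-learner over repeated draws."""
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(-1, 1, 21)
    tau_true = 1 + 2 * x_grid  # assumed CATE: mu_1(x) - mu_0(x)
    errors = []
    for _ in range(n_reps):
        x0 = rng.uniform(-1, 1, n_ctrl)
        x1 = rng.uniform(-1, 1, n_treat)
        y0 = x0 + rng.normal(0, 1, n_ctrl)            # mu_0(x) = x
        y1 = 1 + 3 * x1 + rng.normal(0, 1, n_treat)   # mu_1(x) = 1 + 3x
        a0, b0 = fit_line(x0, y0)
        a1, b1 = fit_line(x1, y1)
        tau_hat = (a1 + b1 * x_grid) - (a0 + b0 * x_grid)
        errors.append(np.mean((tau_hat - tau_true) ** 2))
    return float(np.mean(errors))

# Same total sample size (1000), different splits between the groups.
mse_balanced = t_learner_mse(n_ctrl=500, n_treat=500)
mse_imbalanced = t_learner_mse(n_ctrl=980, n_treat=20)
print(mse_balanced, mse_imbalanced)
```

In this setup the error of the imbalanced split is dominated by the model fit on the 20 treated units, which is the failure mode the list above describes.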
Comparison with S-Learner
| Aspect | T-Learner | S-Learner |
|---|---|---|
| Models | 2 (separate) | 1 (combined) |
| Data per model | $n_0$ (control) or $n_1$ (treatment) | $n_0 + n_1$ (all samples) |
| Structure | Captures different responses | Assumes similar responses |
| Risk | No data sharing | May ignore treatment effect |
Example
Simulation setup:
- $\mu_0(x)$: complex
- $\mu_1(x)$: complex, with a different pattern
T-Learner:
- Captures each response function well ✓
- Good CATE estimation
S-Learner:
- Learns the average of the two patterns
- Misses the distinct pattern of each group
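A comparison of this kind can be sketched in a few lines. The response functions below (a sine and a cosine) are hypothetical stand-ins, not the note's original simulation, and the S-learner is deliberately restricted to an additive model with no interaction between x and the treatment indicator. A flexible S-learner base model (for example a forest) could still pick up such interactions, so this shows the risk rather than a general result.

```python
import numpy as np

def poly_features(x, degree=5):
    """Columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def least_squares(A, y):
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Assumed response functions (hypothetical, chosen only for illustration).
mu_0 = lambda x: np.sin(3 * x)
mu_1 = lambda x: np.cos(3 * x)

rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(-1, 1, n)
w = rng.integers(0, 2, n)
y = np.where(w == 1, mu_1(x), mu_0(x)) + rng.normal(0, 0.1, n)

x_test = np.linspace(-1, 1, 101)
tau_true = mu_1(x_test) - mu_0(x_test)

# T-learner: one polynomial fit per group.
coef_0 = least_squares(poly_features(x[w == 0]), y[w == 0])
coef_1 = least_squares(poly_features(x[w == 1]), y[w == 1])
tau_t = poly_features(x_test) @ coef_1 - poly_features(x_test) @ coef_0

# Restricted S-learner: one fit on [polynomial in x, w] with no x-by-w
# interaction, so mu_hat(x, 1) - mu_hat(x, 0) is just the coefficient on w.
A_s = np.column_stack([poly_features(x), w])
coef_s = least_squares(A_s, y)
tau_s = np.full_like(x_test, coef_s[-1])

mse_t = np.mean((tau_t - tau_true) ** 2)
mse_s = np.mean((tau_s - tau_true) ** 2)
print(mse_t, mse_s)
```

Here the T-learner tracks the varying CATE, while the restricted S-learner can only return a constant effect.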
Variance Analysis
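Variance of the CATE estimator:
$$\operatorname{Var}(\hat{\tau}(x)) = \operatorname{Var}(\hat{\mu}_1(x)) + \operatorname{Var}(\hat{\mu}_0(x))$$
The two models are fit on disjoint samples, so their variances add with no covariance term, and each variance is inflated because its model sees only one group's data.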
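The claim that the two model variances contribute independently can be checked with a Monte Carlo sketch. It uses the simplest possible base learner, a constant (the group mean), so that the variance of each response estimate is known in closed form as sigma^2 divided by the group size. The group sizes, noise level, and number of repetitions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ctrl, n_treat, sigma, n_reps = 200, 50, 1.0, 10000

# Constant base learner: each response estimate is just the group mean,
# so Var(mu_hat_w) = sigma^2 / n_w in this simplified setting.
mu0_hat = rng.normal(0.0, sigma, size=(n_reps, n_ctrl)).mean(axis=1)
mu1_hat = rng.normal(1.0, sigma, size=(n_reps, n_treat)).mean(axis=1)
tau_hat = mu1_hat - mu0_hat

var_tau = tau_hat.var()                       # empirical Var(tau_hat)
var_sum = mu0_hat.var() + mu1_hat.var()       # sum of the two model variances
var_theory = sigma**2 / n_ctrl + sigma**2 / n_treat
print(var_tau, var_sum, var_theory)
```

The three quantities should agree up to simulation noise, and the smaller group (50 treated units here) accounts for most of the total.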
Related Concepts
- Meta-learners - the overall framework
- S-Learner - alternative: a single model
- X-Learner - an improvement over the T-learner
- CATE - the estimation target
Implementation
Python (econml):
from econml.metalearners import TLearner
from sklearn.ensemble import RandomForestRegressor
t_learner = TLearner(models=RandomForestRegressor())
t_learner.fit(Y, T, X=X)
cate = t_learner.effect(X_test)
R:
library(causalToolbox)
t_rf <- T_RF(feat = X, tr = W, yobs = Y)
cate <- EstimateCate(t_rf, X_test)
References
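- kunzelMetalearnersEstimatingHeterogeneous2019 (Künzel, Sekhon, Bickel, and Yu, "Metalearners for estimating heterogeneous treatment effects using machine learning", PNAS, 2019) - T-learner analysis and minimax optimality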