T-Learner · Tae Hyun Kim (Lowell)

Definition

The T-Learner (Two Learner) is a Meta-learner that estimates the CATE by training separate models for the treatment group and the control group.

Algorithm:

Estimate $\mu_0(x)$ from the control group: $\hat{\mu}_0(x) = \hat{E}[Y | X = x, W = 0]$
Estimate $\mu_1(x)$ from the treatment group: $\hat{\mu}_1(x) = \hat{E}[Y | X = x, W = 1]$
Estimate the CATE: $\hat{\tau}_T(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)$

Intuitive Understanding

Key idea:

Fully separate the two groups and learn each response function independently.

Control data:  (X₀, Y₀) → μ̂₀(x)
Treatment data: (X₁, Y₁) → μ̂₁(x)
                    ↓
CATE:          τ̂(x) = μ̂₁(x) - μ̂₀(x)

Advantages:

Captures the distinct response structure of each group
Well suited when $\mu_0$ and $\mu_1$ differ substantially
Conceptually clear

Disadvantages:

No data sharing (each model uses only its own group's data, roughly half the sample when the groups are balanced)
Even when the CATE is simple, the convergence rate is still governed by the complexity of the response functions
Inefficient when the group sizes are imbalanced

Key Properties

Each model uses only the data of its own group
Control: $m$ samples, Treatment: $n$ samples
Cannot learn shared patterns

Rate Depends on Response Functions

$\text{Rate} = O(m^{-a_\mu} + n^{-a_\mu})$

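$a_\mu$ : rate exponent at which the response functions can be estimated (larger for smoother response functions)
Even when the CATE is simple ( $a_\tau > a_\mu$, where $a_\tau$ is the analogous exponent for the CATE function), the rate depends on $a_\mu$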
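To make the rate expression concrete, the two terms can be evaluated for a balanced and an imbalanced split. This is only an arithmetic illustration: the sample sizes and the value $a_\mu = 0.5$ are hypothetical, and the helper name `t_learner_rate_terms` is not from the source.

```python
def t_learner_rate_terms(m, n, a_mu):
    """Return the two terms m^(-a_mu) and n^(-a_mu) of the T-learner rate bound."""
    return m ** (-a_mu), n ** (-a_mu)

# Hypothetical exponent and sample sizes, chosen only for illustration.
a_mu = 0.5
balanced = t_learner_rate_terms(5_000, 5_000, a_mu)    # m = n
imbalanced = t_learner_rate_terms(9_900, 100, a_mu)    # same total, few treated units

print("balanced terms:  ", balanced, "sum:", sum(balanced))
print("imbalanced terms:", imbalanced, "sum:", sum(imbalanced))
```

With the same total sample size, the bound for the imbalanced split is dominated by the term of the smaller group, which is why the T-learner is sensitive to group imbalance.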

Minimax Optimal (Theorem 7)

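The T-learner is minimax rate optimal when the CATE is no easier to estimate than the response functions, for example when the response functions are Lipschitz continuous and no additional regularity is assumed for the CATE.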

Algorithm Detail

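from sklearn.base import clone

def t_learner(X, W, Y, base_learner):
    """Fit a T-learner. X, W, Y are NumPy arrays; base_learner is a scikit-learn style regressor."""
    # Split data by treatment
    X_ctrl, Y_ctrl = X[W == 0], Y[W == 0]
    X_treat, Y_treat = X[W == 1], Y[W == 1]

    # Step 1: Fit control model on its own copy of the base learner
    model_0 = clone(base_learner).fit(X_ctrl, Y_ctrl)

    # Step 2: Fit treatment model on a separate copy, so it does not overwrite model_0
    model_1 = clone(base_learner).fit(X_treat, Y_treat)

    # Step 3: Predict CATE as the difference of the two fitted response functions
    def predict_cate(X_new):
        return model_1.predict(X_new) - model_0.predict(X_new)

    return predict_cate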
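One detail worth checking when `base_learner` is a scikit-learn estimator: `fit` returns the estimator itself, so the two group models need two separate estimator instances. A minimal illustration with `sklearn.base.clone`, using hypothetical toy data (the arrays and variable names below are assumptions for the sketch):

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 1))
y0 = X[:, 0]          # hypothetical control outcomes, slope 1
y1 = 2.0 * X[:, 0]    # hypothetical treatment outcomes, slope 2

base = LinearRegression()

# Reusing one instance: fit returns the estimator itself, so both names
# refer to the same object, which now holds only the second fit.
same_0 = base.fit(X, y0)
same_1 = base.fit(X, y1)
shared = same_0 is same_1

# Cloning gives each group its own unfitted copy of the estimator.
model_0 = clone(base).fit(X, y0)
model_1 = clone(base).fit(X, y1)
slopes = (model_0.coef_[0], model_1.coef_[0])

print("same object when reused:", shared)
print("slopes with separate clones:", slopes)
```

If a single instance were reused, the difference `model_1.predict(x) - model_0.predict(x)` would be identically zero, so the estimated CATE would vanish regardless of the data.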

When to Use

Good Scenarios

When the response functions differ substantially: $\mu_0(x)$ and $\mu_1(x)$ have different structures
When the group sizes are balanced: each model has enough data
When the treatment effect is complex: model each group’s complexity separately

Bad Scenarios

When the CATE is simple but the response is complex: the rate is unnecessarily slow
When the group sizes are imbalanced: estimation for the smaller group is inaccurate
When there are many shared patterns: loses the benefit of data sharing

Comparison with S-Learner

Aspect	T-Learner	S-Learner
Models	2 (separate)	1 (combined)
Data per model	$m$ or $n$	$m + n$
Structure	Captures different responses	Assumes similar responses
Risk	No data sharing	May ignore treatment effect

Example

Simulation setup:

$\mu_0(x) = \sin(x)$ (complex)
$\mu_1(x) = \cos(x)$ (complex, different pattern)
$\tau(x) = \cos(x) - \sin(x)$

T-Learner:

Captures each response function well ✓
Good CATE estimation

S-Learner:

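Fits one model to the pooled data, so it may blend the two patterns if the base learner makes little use of the treatment indicator
In that case it misses the distinct pattern of each group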
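The setup above can be simulated in a few lines. The following is a sketch under stated assumptions, none of which come from the source: $x$ uniform on $[0, 2\pi]$, randomized treatment with probability 0.5, Gaussian noise with standard deviation 0.1, and a random forest as base learner for both meta-learners.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
N = 2000
X = rng.uniform(0.0, 2.0 * np.pi, size=(N, 1))
W = rng.integers(0, 2, size=N)                      # randomized treatment, P(W=1) = 0.5
mu0, mu1 = np.sin(X[:, 0]), np.cos(X[:, 0])         # response functions from the setup
Y = np.where(W == 1, mu1, mu0) + rng.normal(scale=0.1, size=N)

# Assumed base learner; any regressor with fit/predict would do.
base = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0)

# T-learner: one model per group.
model_0 = clone(base).fit(X[W == 0], Y[W == 0])
model_1 = clone(base).fit(X[W == 1], Y[W == 1])

# S-learner: one model on pooled data with W as an extra feature.
model_s = clone(base).fit(np.column_stack([X, W]), Y)

# Evaluate the CATE estimates on an interior grid.
x_grid = np.linspace(0.2, 6.0, 200).reshape(-1, 1)
tau_true = np.cos(x_grid[:, 0]) - np.sin(x_grid[:, 0])
tau_t = model_1.predict(x_grid) - model_0.predict(x_grid)
ones, zeros = np.ones(len(x_grid)), np.zeros(len(x_grid))
tau_s = (model_s.predict(np.column_stack([x_grid, ones]))
         - model_s.predict(np.column_stack([x_grid, zeros])))

mse_t = float(np.mean((tau_t - tau_true) ** 2))
mse_s = float(np.mean((tau_s - tau_true) ** 2))
print(f"T-learner CATE MSE: {mse_t:.4f}")
print(f"S-learner CATE MSE: {mse_s:.4f}")
```

The numbers depend on the base learner and the seed. With a flexible base learner the S-learner can also separate the groups through the treatment indicator, so the gap between the two may be small; the contrast is sharper with base learners that cannot model interactions with $W$.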

Variance Analysis

Variance of the CATE estimator: $\text{Var}(\hat{\tau}_T(x)) = \text{Var}(\hat{\mu}_1(x)) + \text{Var}(\hat{\mu}_0(x))$

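Because the two models are fit on disjoint samples, their estimates are independent and the variances add with no covariance term. Each variance is also larger than it would be on the pooled sample, since each model sees only its own group's data.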
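The additivity is the usual variance identity for a difference. A short derivation, assuming (this is not stated in the source) that units are sampled independently, so that models fit on disjoint groups are independent:

$$
\begin{aligned}
\text{Var}(\hat{\tau}_T(x)) &= \text{Var}(\hat{\mu}_1(x)) + \text{Var}(\hat{\mu}_0(x)) - 2\,\text{Cov}(\hat{\mu}_1(x), \hat{\mu}_0(x)) \\
&= \text{Var}(\hat{\mu}_1(x)) + \text{Var}(\hat{\mu}_0(x))
\end{aligned}
$$

As a rough illustration only, if each variance scaled like $\sigma^2$ divided by the group size, $\text{Var}(\hat{\mu}_0(x)) \approx \sigma^2 / m$ and $\text{Var}(\hat{\mu}_1(x)) \approx \sigma^2 / n$, then

$$
\text{Var}(\hat{\tau}_T(x)) \approx \sigma^2 \left( \frac{1}{m} + \frac{1}{n} \right),
$$

which for a fixed total $m + n = N$ is smallest at $m = n = N/2$, where it equals $4\sigma^2 / N$. Nonparametric base learners have slower variance decay, but the qualitative point about imbalance carries over.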

Related Concepts

Meta-learners - the overall framework
S-Learner - alternative: a single model
X-Learner - an improvement over the T-learner
CATE - the estimation target

Implementation

Python (econml):

from econml.metalearners import TLearner
from sklearn.ensemble import RandomForestRegressor

t_learner = TLearner(models=RandomForestRegressor())
t_learner.fit(Y, W, X=X)  # W is the treatment indicator (econml names this argument T)
cate = t_learner.effect(X_test)

R (causalToolbox):

library(causalToolbox)
t_rf <- T_RF(feat = X, tr = W, yobs = Y)
cate <- EstimateCate(t_rf, X_test)

References

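kunzelMetalearnersEstimatingHeterogeneous2019 (Künzel, Sekhon, Bickel, and Yu, "Metalearners for estimating heterogeneous treatment effects using machine learning", PNAS, 2019) - T-learner analysis and minimax optimality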
