Tae Hyun Kim (Lowell)

T-Learner


Definition

The T-Learner (Two Learner) is a meta-learner that estimates the conditional average treatment effect (CATE) by training separate models for the treatment group and the control group.

Algorithm:

  1. Estimate $\mu_0(x)$ from the control group: $\hat{\mu}_0(x) = \hat{E}[Y \mid X = x, W = 0]$

  2. Estimate $\mu_1(x)$ from the treatment group: $\hat{\mu}_1(x) = \hat{E}[Y \mid X = x, W = 1]$

  3. Estimate the CATE: $\hat{\tau}_T(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)$

Intuitive Understanding

Key idea:

Fully separate the two groups and learn each response function independently.

Control data:  (X₀, Y₀) → μ̂₀(x)
Treatment data: (X₁, Y₁) → μ̂₁(x)

CATE:          τ̂(x) = μ̂₁(x) - μ̂₀(x)

Advantages:

  • Captures the distinct response structure of each group
  • Well suited when $\mu_0$ and $\mu_1$ differ substantially
  • Conceptually clear

Disadvantages:

  • No data sharing (each model sees only its own group's data, roughly half of the sample under balanced assignment)
  • Even when the CATE is simple, the convergence rate is governed by the complexity of the response functions
  • Inefficient when the group sizes are imbalanced

Key Properties

No Data Sharing

  • Each model uses only the data of its own group
  • Control: $m$ samples, Treatment: $n$ samples
  • Cannot learn shared patterns

Rate Depends on Response Functions

$\text{Rate} = O(m^{-a_\mu} + n^{-a_\mu})$

  • $a_\mu$: rate exponent at which the response functions can be estimated (larger for smoother response functions)
  • Even when the CATE is simpler than the responses ($a_\tau > a_\mu$), the rate is still governed by $a_\mu$
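To make the two-term rate concrete, here is a small arithmetic sketch. The helper name `t_learner_rate_bound` and the sample sizes are illustrative assumptions, and the big-O constant is ignored, so only the relative comparison is meaningful.

```python
def t_learner_rate_bound(m, n, a_mu):
    """Evaluate m^(-a_mu) + n^(-a_mu), ignoring the constant hidden in O(.)."""
    return m ** (-a_mu) + n ** (-a_mu)


# Same total sample size (10,000), same exponent, different splits.
balanced = t_learner_rate_bound(5000, 5000, a_mu=0.5)
imbalanced = t_learner_rate_bound(9900, 100, a_mu=0.5)

print(f"balanced split:   {balanced:.4f}")
print(f"imbalanced split: {imbalanced:.4f}")
```

With the imbalanced split, the term for the smaller group dominates the sum, which is why the bound is worse even though the total sample size is unchanged.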

Minimax Optimal (Theorem 7)

Under certain conditions, the T-learner is minimax rate optimal.

Algorithm Detail

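from sklearn.base import clone


def t_learner(X, W, Y, base_learner):
    """Fit a T-learner; base_learner is an unfitted scikit-learn-style regressor."""
    # Split data by treatment
    X_ctrl, Y_ctrl = X[W == 0], Y[W == 0]
    X_treat, Y_treat = X[W == 1], Y[W == 1]

    # Step 1: Fit control model on a fresh copy of the base learner
    model_0 = clone(base_learner).fit(X_ctrl, Y_ctrl)

    # Step 2: Fit treatment model on a separate copy, so that it does not
    # overwrite the control model (fit() returns the same object it was called on)
    model_1 = clone(base_learner).fit(X_treat, Y_treat)

    # Step 3: Predict CATE
    def predict_cate(X_new):
        return model_1.predict(X_new) - model_0.predict(X_new)

    return predict_cate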
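A usage sketch on synthetic data may help here. Everything in it is an assumption made for illustration: `PolyRegressor` is a hypothetical numpy-only base learner, the data-generating process is invented, and `copy.deepcopy` stands in for any library-specific cloning helper. The point it shows is that each group needs its own model object. When `fit` returns the estimator itself, as in the scikit-learn convention, fitting one shared instance twice leaves both handles pointing at the same model.

```python
import copy

import numpy as np


class PolyRegressor:
    """Hypothetical base learner: least-squares polynomial in a single feature."""

    def __init__(self, degree=1):
        self.degree = degree

    def fit(self, x, y):
        self.coef_ = np.polyfit(x, y, self.degree)
        return self  # scikit-learn-style: fit returns the estimator itself

    def predict(self, x):
        return np.polyval(self.coef_, x)


def t_learner(x, w, y, base_learner):
    # One independent copy of the base learner per group.
    model_0 = copy.deepcopy(base_learner).fit(x[w == 0], y[w == 0])
    model_1 = copy.deepcopy(base_learner).fit(x[w == 1], y[w == 1])
    return lambda x_new: model_1.predict(x_new) - model_0.predict(x_new)


# Synthetic data (assumed): mu_0(x) = x, mu_1(x) = 1 + 2x, so tau(x) = 1 + x.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=400)
w = rng.integers(0, 2, size=400)
y = np.where(w == 1, 1.0 + 2.0 * x, x) + rng.normal(0, 0.1, size=400)

cate = t_learner(x, w, y, PolyRegressor(degree=1))
x_new = np.array([-0.5, 0.0, 0.5])
estimate = cate(x_new)
print(np.round(estimate, 2))  # should land close to the true values 0.5, 1.0, 1.5
```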

When to Use

Good Scenarios

  • When the response functions differ substantially: $\mu_0(x)$ and $\mu_1(x)$ have different structures
  • When the group sizes are balanced: each model has enough data
  • When the treatment effect is complex: model each group’s complexity separately

Bad Scenarios

  • When the CATE is simple but the response is complex: the rate is unnecessarily slow
  • When the group sizes are imbalanced: estimation for the smaller group is inaccurate
  • When there are many shared patterns: loses the benefit of data sharing

Comparison with S-Learner

| Aspect | T-Learner | S-Learner |
|---|---|---|
| Models | 2 (separate) | 1 (combined) |
| Data per model | $m$ or $n$ | $m + n$ |
| Structure | Captures different responses | Assumes similar responses |
| Risk | No data sharing | May ignore treatment effect |

Example

Simulation setup:

  • $\mu_0(x) = \sin(x)$ (complex)
  • $\mu_1(x) = \cos(x)$ (complex, different pattern)
  • $\tau(x) = \cos(x) - \sin(x)$

T-Learner:

  • Captures each response function well ✓
  • Good CATE estimation

S-Learner:

  • Can blend the two patterns into one when the single model makes little use of the treatment indicator
  • Misses the distinct pattern of each group
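The setup above can be simulated in a few lines. This is a sketch under stated assumptions, not a reproduction of any published experiment. The sample size, noise level, seed, and degree-5 polynomial base learner are all invented here. The S-learner is deliberately restricted to a polynomial in $x$ plus an additive treatment term, with no interaction. A more flexible S-learner base model could behave quite differently.

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs, degree = 2000, 5

# Simulated data: mu_0(x) = sin(x), mu_1(x) = cos(x), random 50/50 assignment.
x = rng.uniform(0, 2 * np.pi, n_obs)
w = rng.integers(0, 2, n_obs)
y = np.where(w == 1, np.cos(x), np.sin(x)) + rng.normal(0, 0.1, n_obs)

grid = np.linspace(0, 2 * np.pi, 200)
tau_true = np.cos(grid) - np.sin(grid)

# T-learner: one polynomial per group, CATE = difference of predictions.
coef_0 = np.polyfit(x[w == 0], y[w == 0], degree)
coef_1 = np.polyfit(x[w == 1], y[w == 1], degree)
tau_t = np.polyval(coef_1, grid) - np.polyval(coef_0, grid)

# Restricted S-learner: a single least-squares fit on [poly(x), w].
# Without an x-by-w interaction, its CATE is just the coefficient on w.
design = np.column_stack([np.vander(x, degree + 1), w])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
tau_s = np.full_like(grid, beta[-1])

mse_t = np.mean((tau_t - tau_true) ** 2)
mse_s = np.mean((tau_s - tau_true) ** 2)
print(f"T-learner CATE MSE:            {mse_t:.4f}")
print(f"Restricted S-learner CATE MSE: {mse_s:.4f}")
```

In this restricted setup the S-learner can only report a constant effect, so its error stays large while the T-learner tracks $\cos(x) - \sin(x)$ closely.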

Variance Analysis

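Variance of the CATE estimator: $\text{Var}(\hat{\tau}_T(x)) = \text{Var}(\hat{\mu}_1(x)) + \text{Var}(\hat{\mu}_0(x))$

The two models are fit on disjoint samples, so with independent observations their covariance is zero and the two variances simply add. Because each model sees only its own group's data, each variance term is larger than it would be for a model trained on all $m + n$ samples.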
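A quick Monte Carlo check of the additivity, as a sketch. To keep the algebra transparent, it assumes the simplest possible "models", namely group sample means, with made-up values for $m$, $n$ and the noise level. In that case each variance is $\sigma^2$ divided by the group size.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, sigma, reps = 50, 200, 1.0, 10_000

# Each row is one simulated study; the "model" for each group is its sample mean.
mu0_hat = rng.normal(0.0, sigma, size=(reps, m)).mean(axis=1)
mu1_hat = rng.normal(1.0, sigma, size=(reps, n)).mean(axis=1)
tau_hat = mu1_hat - mu0_hat

var_tau = tau_hat.var()                    # empirical Var(tau_hat)
var_sum = mu1_hat.var() + mu0_hat.var()    # empirical Var(mu1_hat) + Var(mu0_hat)
analytic = sigma**2 / n + sigma**2 / m     # closed form for sample means

print(f"Var(tau_hat)         = {var_tau:.5f}")
print(f"Var(mu1) + Var(mu0)  = {var_sum:.5f}")
print(f"sigma^2/n + sigma^2/m = {analytic:.5f}")
```

The three numbers should agree up to simulation noise, and the smaller group ($m = 50$) contributes most of the total.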

Implementation

Python (econml):

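from econml.metalearners import TLearner
from sklearn.ensemble import RandomForestRegressor

# Y: outcomes, T: treatment indicator (called W above), X: covariates
t_learner = TLearner(models=RandomForestRegressor())
t_learner.fit(Y, T, X=X)
# Estimated CATE for each row of X_test
cate = t_learner.effect(X_test)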

R:

library(causalToolbox)
t_rf <- T_RF(feat = X, tr = W, yobs = Y)
cate <- EstimateCate(t_rf, X_test)

References

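  • Künzel, S. R., Sekhon, J. S., Bickel, P. J., and Yu, B. (2019). Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the National Academy of Sciences, 116(10), 4156-4165. T-learner analysis and minimax optimality.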
