S-Learner
Definition
The S-Learner (Single Learner) is a meta-learner that fits a single model of the response with the treatment indicator included as a feature, then computes the CATE as the difference between that model's predictions with the treatment set to 1 and to 0.
Algorithm:
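- Estimate the combined response function $\mu(x, w) = \mathbb{E}[Y \mid X = x, W = w]$ with a single model $\hat{\mu}(x, w)$
- Estimate the CATE: $\hat{\tau}(x) = \hat{\mu}(x, 1) - \hat{\mu}(x, 0)$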
Intuitive Understanding
Core idea:
Treat the treatment simply as one more feature, and train a single model on the entire dataset.
Data: (X, W, Y) for all observations
↓
Model: μ̂(x, w) = f(x, w) (single model)
↓
CATE: τ̂(x) = μ̂(x, 1) - μ̂(x, 0)
Advantages:
- The simplest approach
- Uses all data jointly (data sharing)
- Exploits common patterns shared between treatment/control
Disadvantages:
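- The treatment may be ignored when the treatment effect is small (regularization drops $W$)
- Unsuitable when the structures of the control and treated response functions, $\mu_0(x) = \mu(x, 0)$ and $\mu_1(x) = \mu(x, 1)$, are very different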
Key Properties
Data Sharing
- Trains a single model using all observations
- Can learn patterns common to control/treatment
Regularization Bias
- The stronger the regularization, the more the model tends to ignore the influence of $W$
- Suitable when CATE ≈ 0; otherwise the estimate is biased toward zero
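The shrinkage described above can be illustrated with a small simulation. This is only a sketch under assumed conditions: the data-generating process, the true effect of 0.5, and the choice of scikit-learn's `Lasso` as the regularized base learner are all illustrative assumptions, not part of the original note.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 1))
W = rng.integers(0, 2, size=n)            # randomized binary treatment
true_tau = 0.5                            # assumed small, constant effect
Y = 2.0 * X[:, 0] + true_tau * W + rng.normal(size=n)

XW = np.column_stack([X, W])              # treatment is just one more feature
x_eval = np.zeros((1, 1))                 # any x works: the model is linear in W

estimates = {}
for alpha in [0.001, 0.05, 0.2]:          # increasing L1 penalty
    model = Lasso(alpha=alpha).fit(XW, Y)
    treated = model.predict(np.column_stack([x_eval, np.ones(1)]))
    control = model.predict(np.column_stack([x_eval, np.zeros(1)]))
    estimates[alpha] = float(treated[0] - control[0])
    print(f"alpha={alpha}: estimated CATE = {estimates[alpha]:.3f} (true {true_tau})")
```

In this setup the estimated effect should shrink as `alpha` grows, and at the largest penalty the coefficient on `W` is dropped entirely, so the S-Learner reports no effect even though the simulated effect is nonzero.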
Convergence Rate
Depends on the smoothness of the combined response function $\mu(x, w)$.
Algorithm Detail
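import numpy as np

def s_learner(X, W, Y, base_learner):
    """Fit an S-learner and return a function that predicts the CATE."""
    # Step 1: Append the treatment indicator W as one more feature column
    X_combined = np.column_stack([X, W])
    # Step 2: Fit a single model on all observations (treated and control).
    # The learner is fitted in place, so this does not rely on fit() returning self.
    base_learner.fit(X_combined, Y)

    # Step 3: Predict the CATE as mu_hat(x, 1) - mu_hat(x, 0)
    def predict_cate(X_new):
        n = len(X_new)
        X_treat = np.column_stack([X_new, np.ones(n)])   # everyone treated
        X_ctrl = np.column_stack([X_new, np.zeros(n)])   # everyone untreated
        return base_learner.predict(X_treat) - base_learner.predict(X_ctrl)

    return predict_cate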
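A possible way to call this function is sketched below. The simulated data, the true effect of 2.0, and the use of scikit-learn's `LinearRegression` as the base learner are assumptions made for illustration; the function is restated in compact form only so that the block runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def s_learner(X, W, Y, base_learner):
    # Compact restatement of the S-learner above
    base_learner.fit(np.column_stack([X, W]), Y)
    def predict_cate(X_new):
        ones, zeros = np.ones(len(X_new)), np.zeros(len(X_new))
        return (base_learner.predict(np.column_stack([X_new, ones]))
                - base_learner.predict(np.column_stack([X_new, zeros])))
    return predict_cate

rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 2))
W = rng.integers(0, 2, size=n)            # randomized binary treatment
# Assumed data-generating process with a constant true effect of 2.0
Y = X[:, 0] - 0.5 * X[:, 1] + 2.0 * W + rng.normal(size=n)

predict_cate = s_learner(X, W, Y, LinearRegression())
cate_hat = predict_cate(rng.normal(size=(5, 2)))
print(cate_hat)
```

Note that a linear base learner without interaction terms between `X` and `W` can only return a constant CATE (the coefficient on `W`). Capturing heterogeneity requires a base learner that can model interactions, such as a tree ensemble.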
When to Use
Good Scenarios
- When the CATE is mostly close to 0: regularization that shrinks the effect of $W$ pushes the estimate in the right direction
- When the response functions are similar: $\mu_0(x) \approx \mu_1(x)$
- When data is limited: Benefits from data sharing
Bad Scenarios
- When the treatment effect is clearly nonzero: regularization may shrink it toward zero or drop it entirely
- When the response functions are very different: Hard to capture structural differences
- When heterogeneous effects matter: Subtle differences are missed
Comparison with T-Learner
| Aspect | S-Learner | T-Learner |
|---|---|---|
| Models | 1 | 2 |
| Data usage | All together | Split by treatment |
| Sharing | Yes | No |
| Best when | CATE ≈ 0 | Different response structures |
| Risk | Ignore small effects | No data sharing |
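The contrast in the table can be made concrete with a toy comparison. The setup is an assumption chosen to favor the T-Learner: the two arms have opposite slopes, so $\tau(x) = -2x$, and both learners use a deliberately restrictive `LinearRegression` base model with no interaction terms.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 4000
X = rng.uniform(-1, 1, size=(n, 1))
W = rng.integers(0, 2, size=n)
# Assumed setup: opposite slopes in the two arms, so tau(x) = -2x
Y = np.where(W == 1, -X[:, 0], X[:, 0]) + rng.normal(scale=0.5, size=n)

X_test = np.linspace(-1, 1, 101).reshape(-1, 1)
tau_true = -2.0 * X_test[:, 0]

# S-learner: one model fitted on [X, W] using all observations
s_model = LinearRegression().fit(np.column_stack([X, W]), Y)
tau_s = (s_model.predict(np.column_stack([X_test, np.ones(len(X_test))]))
         - s_model.predict(np.column_stack([X_test, np.zeros(len(X_test))])))

# T-learner: one model per treatment arm, each fitted on its own subset
m1 = LinearRegression().fit(X[W == 1], Y[W == 1])
m0 = LinearRegression().fit(X[W == 0], Y[W == 0])
tau_t = m1.predict(X_test) - m0.predict(X_test)

mse_s = float(np.mean((tau_s - tau_true) ** 2))
mse_t = float(np.mean((tau_t - tau_true) ** 2))
print(f"S-learner MSE: {mse_s:.3f}, T-learner MSE: {mse_t:.3f}")
```

Here the S-Learner's error comes from the base learner being unable to represent different slopes per arm. With a more flexible base learner the gap would likely narrow, so this should be read as an illustration of the "different response structures" row rather than a general ranking.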
Example
Simulation setup:
- $\mu_0(x) = \mu_1(x)$ (i.e., $\tau(x) = 0$)
S-Learner result:
- The regularized model ignores $W$ → $\hat{\tau}(x) \approx 0$ ✓
- Correct estimation
Opposite scenario:
- $\mu_0(x) \neq \mu_1(x)$ (i.e., $\tau(x) \neq 0$)
- The S-Learner may shrink the influence of $W$ via regularization → bias toward zero
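Both scenarios can be reproduced in a short simulation. The sketch below assumes a specific data-generating process (a strong covariate effect of `5 * x`, a treatment effect of either 0 or 0.5) and uses tree depth as the form of regularization; `s_learner_tree_cate` is a hypothetical helper written for this illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def s_learner_tree_cate(tau, seed):
    """Fit a depth-2 tree S-learner on simulated data; return the mean estimated CATE."""
    rng = np.random.default_rng(seed)
    n = 4000
    X = rng.uniform(-1, 1, size=(n, 1))
    W = rng.integers(0, 2, size=n)
    # Assumed outcome: strong covariate effect plus a constant treatment effect tau
    Y = 5.0 * X[:, 0] + tau * W + rng.normal(scale=0.5, size=n)
    # max_depth=2 allows only three splits, which acts as strong regularization
    model = DecisionTreeRegressor(max_depth=2, random_state=0)
    model.fit(np.column_stack([X, W]), Y)
    treated = model.predict(np.column_stack([X, np.ones(n)]))
    control = model.predict(np.column_stack([X, np.zeros(n)]))
    return float(np.mean(treated - control))

cate_null = s_learner_tree_cate(tau=0.0, seed=0)    # true effect is 0
cate_effect = s_learner_tree_cate(tau=0.5, seed=0)  # true effect is 0.5
print(f"tau=0.0 -> estimate {cate_null:.3f}; tau=0.5 -> estimate {cate_effect:.3f}")
```

Because the covariate explains far more variance than the treatment, the shallow tree spends all of its splits on `X` and never splits on `W`. The estimate is therefore zero in both runs: correct in the first scenario, biased in the second.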
Related Concepts
- Meta-learners - The overall framework
- T-Learner - Alternative: two separate models
- X-Learner - Improvement over S/T-learner
- CATE - The estimation target
Implementation
Python (econml):
from econml.metalearners import SLearner
from sklearn.ensemble import RandomForestRegressor
s_learner = SLearner(overall_model=RandomForestRegressor())
s_learner.fit(Y, T, X=X)
cate = s_learner.effect(X_test)
R:
library(causalToolbox)
s_rf <- S_RF(feat = X, tr = W, yobs = Y)
cate <- EstimateCate(s_rf, X_test)
References
- kunzelMetalearnersEstimatingHeterogeneous2019 - S-learner analysis