S-Learner · Tae Hyun Kim (Lowell)

Definition

S-Learner (Single Learner) is a meta-learner that estimates the response function with a single model that includes the treatment indicator as a feature, and then computes the CATE from that model.

Algorithm:

Estimate the combined response function with a single model: $\hat{\mu}(x, w) = \hat{E}[Y \mid X = x, W = w]$
Estimate the CATE: $\hat{\tau}_S(x) = \hat{\mu}(x, 1) - \hat{\mu}(x, 0)$

Intuitive Understanding

Key idea:

Treat the treatment $W$ as just another feature and fit one model on the entire dataset

Data:  (X, W, Y) for all observations
           ↓
Model: μ̂(x, w) = f(x, w)  (single model)
           ↓
CATE:  τ̂(x) = μ̂(x, 1) - μ̂(x, 0)

Advantages:

The simplest approach
Uses all the data together (data sharing)
Exploits patterns shared by the treatment and control groups

Disadvantages:

May ignore a small treatment effect (regularization can effectively drop $W$)
Poorly suited when $\mu_0$ and $\mu_1$ have very different structures

Key Properties

Fits one model on all $n + m$ observations (treatment and control groups combined)
Can learn patterns shared by the control and treatment groups

Regularization Bias

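$\hat{\mu}(x, w) \approx \hat{\mu}(x) \quad \text{if the treatment effect is small}$

The stronger the regularization, the more the model tends to ignore the effect of $W$
Well suited when CATE ≈ 0; otherwise the shrinkage biases $\hat{\tau}$ toward 0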
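A worked special case shows where this shrinkage comes from. It rests on assumptions that are not in the source and are chosen only for illustration: a linear base learner fitted by Lasso with objective $\frac{1}{2N}\lVert y - X\beta \rVert_2^2 + \alpha \lVert \beta \rVert_1$ over $N = n + m$ rows, a constant effect $\tau$, and a centered $W$ column orthogonal to the centered covariates (approximately true under randomization). Under orthogonality the Lasso solution decouples per coordinate, so the $W$ coefficient is a soft-thresholded covariance:

```latex
\hat{\mu}(x, w) = \hat{\beta}_0 + \hat{\beta}^\top x + \hat{\gamma}\, w
  \;\Rightarrow\; \hat{\tau}_S(x) = \hat{\mu}(x, 1) - \hat{\mu}(x, 0) = \hat{\gamma}

\hat{\gamma} = \frac{\operatorname{sign}(c)\,\max(|c| - \alpha,\ 0)}{v},
  \qquad c = \widehat{\operatorname{Cov}}(W, Y),\quad v = \widehat{\operatorname{Var}}(W) = \hat{p}\,(1 - \hat{p})

Y = \mu_0(X) + \tau W + \varepsilon,\ W \perp (X, \varepsilon)
  \;\Rightarrow\; c \approx \tau v
  \;\Rightarrow\; \hat{\tau}_S \approx \operatorname{sign}(\tau)\,\max\!\left(|\tau| - \tfrac{\alpha}{v},\ 0\right)
```

Here $\hat{p}$ is the sample share of treated units. In this special case the estimate is pulled toward 0 by $\alpha / v$ and is exactly 0 whenever $|\tau| \le \alpha / v$: harmless when the true CATE is 0, biased otherwise, and more biased as $\alpha$ grows.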

Convergence Rate

Depends on the smoothness $a_\mu$ of the response function: $\text{Rate} = O((n+m)^{-a_\mu})$

Algorithm Detail

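import numpy as np
from sklearn.base import clone


def s_learner(X, W, Y, base_learner):
    """Fit an S-learner and return a function that predicts the CATE for new covariates."""
    # Step 1: Append the treatment indicator W as an extra feature column
    X_combined = np.column_stack([X, W])

    # Step 2: Fit a single model mu_hat(x, w) on all observations
    # (clone() leaves the caller's estimator unfitted and reusable)
    model = clone(base_learner).fit(X_combined, Y)

    # Step 3: CATE(x) = mu_hat(x, 1) - mu_hat(x, 0)
    def predict_cate(X_new):
        n_new = len(X_new)
        X_treat = np.column_stack([X_new, np.ones(n_new)])   # set w = 1 for every row
        X_ctrl = np.column_stack([X_new, np.zeros(n_new)])   # set w = 0 for every row
        return model.predict(X_treat) - model.predict(X_ctrl)

    return predict_cate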

When to Use

Good Scenarios

When the CATE is close to 0 for most units: regularization's pull toward zero points in the right direction
When the response functions are similar: $\mu_0(x) \approx \mu_1(x) + c$
When data are limited: benefits from data sharing

Bad Scenarios

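When the treatment effect is pronounced: the model may still shrink or ignore it
When the response functions differ greatly: structural differences are hard to capture
When effect heterogeneity matters: subtle differences can be missed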
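A small simulation can make the heterogeneity point concrete. This is a sketch under assumptions not in the source: a scikit-learn `LinearRegression` base learner and a hypothetical data-generating process with $\mu_0(x) = x$ and $\mu_1(x) = 2x$, so $\tau(x) = x$. When $W$ enters the model only additively, $\hat{\mu}(x, 1) - \hat{\mu}(x, 0)$ is the single $W$ coefficient, so the S-Learner reports the same CATE at every $x$. Adding a hand-made $x \cdot w$ column (a hypothetical feature choice, as are the helper names below) lets the same single model recover the slope.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=n)
W = rng.integers(0, 2, size=n)                     # randomized binary treatment
Y = X + X * W + rng.normal(scale=0.5, size=n)      # mu0(x) = x, mu1(x) = 2x, tau(x) = x


def additive_features(x, w):
    # mu_hat(x, w) = b0 + b1*x + g*w  ->  CATE is the constant g
    return np.column_stack([x, w])


def interacted_features(x, w):
    # mu_hat(x, w) = b0 + b1*x + g*w + d*x*w  ->  CATE is g + d*x
    return np.column_stack([x, w, x * w])


def s_learner_cate(features, x_fit, w_fit, y_fit, x_new):
    """Fit one linear model on features(x, w); return mu_hat(x, 1) - mu_hat(x, 0)."""
    model = LinearRegression().fit(features(x_fit, w_fit), y_fit)
    ones, zeros = np.ones(len(x_new)), np.zeros(len(x_new))
    return model.predict(features(x_new, ones)) - model.predict(features(x_new, zeros))


X_test = np.linspace(-2, 2, 5)
tau_add = s_learner_cate(additive_features, X, W, Y, X_test)
tau_int = s_learner_cate(interacted_features, X, W, Y, X_test)
print(np.round(tau_add, 2))   # one value repeated at every x (near the average effect, 0)
print(np.round(tau_int, 2))   # approximately follows X_test
```

The design choice this highlights: the S-Learner can only report heterogeneous effects if the base learner is able to express interactions between $W$ and $X$, whether through trees, boosting, or explicit interaction terms.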

Comparison with T-Learner

Aspect	S-Learner	T-Learner
Models	1	2
Data usage	All together	Split by treatment
Sharing	Yes	No
Best when	CATE ≈ 0	Different response structures
Risk	Ignore small effects	No data sharing

Example

Simulation setup:

$\mu_0(x) = x$
$\mu_1(x) = x$ (i.e., $\tau(x) = 0$)

S-Learner result:

The regularized model ignores $W$ → $\hat{\tau}(x) \approx 0$ ✓
Correct estimate

Opposite scenario:

$\mu_0(x) = x$, $\mu_1(x) = x + 2$ (i.e., $\tau(x) = 2$)
Regularization can shrink the S-Learner's estimated effect of $W$ → bias
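Both example scenarios can be reproduced with a regularized linear base learner. This sketch assumes a scikit-learn `Lasso` with a fixed, arbitrarily chosen penalty `alpha=0.3`, standard normal $X$, a randomized treatment with $P(W = 1) = 0.5$, and Gaussian noise; none of these choices come from the source, and the helper name `lasso_s_learner_tau` is hypothetical. Because the model is linear and has no interaction term, the S-Learner CATE is the same at every $x$, so it is evaluated at $x = 0$ only.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(42)
n = 5000
X = rng.normal(size=n)
W = rng.integers(0, 2, size=n)          # randomized treatment, P(W = 1) = 0.5
noise = rng.normal(scale=0.5, size=n)


def lasso_s_learner_tau(Y, alpha=0.3):
    """S-learner with a Lasso base model: returns mu_hat(0, 1) - mu_hat(0, 0)."""
    model = Lasso(alpha=alpha).fit(np.column_stack([X, W]), Y)
    x_new = np.array([0.0])
    treated = np.column_stack([x_new, [1.0]])   # (x = 0, w = 1)
    control = np.column_stack([x_new, [0.0]])   # (x = 0, w = 0)
    return float(model.predict(treated)[0] - model.predict(control)[0])


tau_hat_null = lasso_s_learner_tau(X + noise)            # mu1(x) = x      -> tau = 0
tau_hat_shift = lasso_s_learner_tau(X + 2 * W + noise)   # mu1(x) = x + 2  -> tau = 2
print(tau_hat_null, tau_hat_shift)
```

With the same penalty in both fits, the null scenario should give an estimate of exactly 0 (Lasso zeroes the $W$ coefficient), while the shifted scenario should land well below the true value of 2 (roughly 0.8 in this setup). That gap is the regularization bias described in the opposite scenario.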

Related Notes

Meta-learners - the overall framework
T-Learner - alternative: two separate models
X-Learner - an improvement on the S/T-learners
CATE - the estimation target

Implementation

Python (econml):

from econml.metalearners import SLearner
from sklearn.ensemble import RandomForestRegressor

s_learner = SLearner(overall_model=RandomForestRegressor())
s_learner.fit(Y, T, X=X)
cate = s_learner.effect(X_test)

R (causalToolbox):

library(causalToolbox)
s_rf <- S_RF(feat = X, tr = W, yobs = Y)
cate <- EstimateCate(s_rf, X_test)

References

kunzelMetalearnersEstimatingHeterogeneous2019 - analysis of the S-learner
