Tae Hyun Kim (Lowell)

CATE (Conditional Average Treatment Effect)

#causal-inference #cate #hte

Definition

**Conditional Average Treatment Effect (CATE)** is the average treatment effect given covariates $X = x$:

$$\tau(x) = E[Y(1) - Y(0) \mid X = x]$$

where:

  • $Y(1)$: the potential outcome under treatment
  • $Y(0)$: the potential outcome without treatment
  • $X$: pre-treatment covariates (features)

Related terms:

  • HTE (Heterogeneous Treatment Effect): used as a synonym for CATE
  • ITE (Individual Treatment Effect): $\tau_i = Y_i(1) - Y_i(0)$ (not observable)

Intuitive Understanding

Key question:

“How effective is the treatment for people with particular characteristics?”

ATE vs CATE:

| Quantity | Definition | Question |
|---|---|---|
| ATE | $E[Y(1) - Y(0)]$ | “Is there an effect on average?” |
| CATE | $E[Y(1) - Y(0) \mid X = x]$ | “Is there an effect for people with these characteristics?” |

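Example:

  • The new drug's average effect is positive ($\text{ATE} > 0$),
  • but for patients aged 65 or older it is zero or harmful ($\tau(x) \leq 0$ whenever age $\geq 65$).

By the law of iterated expectations, the ATE is the CATE averaged over the covariate distribution:

$$\text{ATE} = E[\tau(X)] = \int \tau(x)\, dP(x)$$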
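
A small numerical check of the identity above, as a minimal Python sketch with one discrete covariate. The subgroup effects and population shares are hypothetical values chosen to mirror the age example; they show how a positive ATE can coexist with a negative CATE in one subgroup.

```python
import math

def ate_from_cate(cate, share):
    """ATE = sum over x of tau(x) * P(X = x), for a discrete covariate X."""
    if not math.isclose(sum(share.values()), 1.0):
        raise ValueError("subgroup shares must sum to 1")
    return sum(cate[x] * share[x] for x in cate)

# Hypothetical subgroup effects tau(x) and population shares P(X = x)
cate = {"age < 65": 5.0, "age >= 65": -2.0}
share = {"age < 65": 0.7, "age >= 65": 0.3}

ate = ate_from_cate(cate, share)  # ≈ 2.9 > 0, although tau(age >= 65) < 0
```

Reporting only the ATE here would hide the fact that the older subgroup is harmed.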

Key Properties

Fundamental Problem of Causal Inference

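At the individual level, $Y(1)$ and $Y(0)$ can never be observed at the same time:

  • What is observed: $Y = AY(1) + (1 - A)Y(0)$, where $A \in \{0, 1\}$ is the treatment indicator
  • The counterfactual outcome is always missing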
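
The missing-data view can be made concrete in a simulation, the only setting where both potential outcomes are known. A minimal sketch; `observed_table` and the unit values are hypothetical:

```python
def observed_table(units):
    """Turn simulated potential outcomes into what would actually be observed.

    units: list of (y1, y0, a) with y1 = Y(1), y0 = Y(0), a = A in {0, 1}.
    Returns rows (a, y, y1_obs, y0_obs); the unobserved potential outcome is None.
    """
    rows = []
    for y1, y0, a in units:
        y = a * y1 + (1 - a) * y0  # Y = A*Y(1) + (1 - A)*Y(0)
        rows.append((a, y, y1 if a == 1 else None, y0 if a == 0 else None))
    return rows

# Hypothetical units: the true ITEs are 7 - 4 = 3 and 5 - 5 = 0,
# but neither can be computed from the observed rows.
table = observed_table([(7, 4, 1), (5, 5, 0)])
# table == [(1, 7, 7, None), (0, 5, None, 5)]
```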

Identification Assumptions

Standard assumptions for identifying the CATE:

  1. SUTVA (Stable Unit Treatment Value Assumption)

    • No interference: another unit's treatment does not affect my outcome
    • Consistency: $Y = Y(A)$
  2. Unconfoundedness (Ignorability): $(Y(0), Y(1)) \perp A \mid X$

    • Given $X$, treatment assignment is independent of the potential outcomes
  3. Positivity (Overlap): $0 < P(A = 1 \mid X = x) < 1, \quad \forall x \in \mathcal{X}$

    • At every covariate value, the probability of receiving treatment is strictly between 0 and 1
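
In practice, positivity is checked on estimated propensities rather than the true $P(A = 1 \mid X = x)$. A minimal sketch for a discrete covariate, where the propensity is just the treated share within each stratum; the helper names and the cutoff `eps = 0.05` are illustrative assumptions, not a standard threshold:

```python
from collections import defaultdict

def propensity_by_stratum(strata, treated):
    """Empirical P(A = 1 | X = x) for each value x of a discrete covariate."""
    counts = defaultdict(lambda: [0, 0])  # x -> [number treated, total]
    for x, a in zip(strata, treated):
        counts[x][0] += a
        counts[x][1] += 1
    return {x: n_treated / n for x, (n_treated, n) in counts.items()}

def overlap_violations(strata, treated, eps=0.05):
    """Strata whose estimated propensity falls outside [eps, 1 - eps]."""
    scores = propensity_by_stratum(strata, treated)
    return sorted(x for x, p in scores.items() if not eps <= p <= 1 - eps)

# Hypothetical data: stratum "b" is always treated and "c" never is,
# so tau(x) cannot be estimated there without extrapolation.
violations = overlap_violations(["a", "a", "b", "b", "c", "c"], [1, 0, 1, 1, 0, 0])
# violations == ["b", "c"]
```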

Structure of the CATE

Under the identification assumptions above, the CATE decomposes as $\tau(x) = \mu_1(x) - \mu_0(x)$

where $\mu_a(x) = E[Y \mid X = x, A = a]$ is the outcome regression in treatment arm $a$.
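
The decomposition needs the assumptions above, because $\tau(x)$ is defined through potential outcomes while $\mu_a(x)$ is a regression on observed data. One way to write the step:

$$
\begin{aligned}
\tau(x) &= E[Y(1) \mid X = x] - E[Y(0) \mid X = x] && \text{(linearity of expectation)} \\
&= E[Y(1) \mid X = x, A = 1] - E[Y(0) \mid X = x, A = 0] && \text{(unconfoundedness)} \\
&= E[Y \mid X = x, A = 1] - E[Y \mid X = x, A = 0] && \text{(consistency)} \\
&= \mu_1(x) - \mu_0(x)
\end{aligned}
$$

Positivity ensures that both conditional expectations in the last line are defined for every $x$.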

Estimation Methods

Meta-Learners

| Method | Description | Best When |
|---|---|---|
| S-Learner | Single model $\hat{\mu}(x, a)$ with the treatment as a feature, then $\hat{\tau}(x) = \hat{\mu}(x, 1) - \hat{\mu}(x, 0)$ | Homogeneous effects |
| T-Learner | Two models $\hat{\mu}_1(x)$, $\hat{\mu}_0(x)$ fit separately per arm, then $\hat{\tau}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)$ | Different response functions |
| X-Learner | Two-stage imputation with propensity weighting | Unbalanced treatment groups |
| R-Learner | Residualize, then regress: minimize $\sum_i \big(Y_i - \hat{m}(X_i) - (A_i - \hat{\pi}(X_i))\,\tau(X_i)\big)^2$, with $\hat{m}(x) \approx E[Y \mid X = x]$ and $\hat{\pi}(x) \approx P(A = 1 \mid X = x)$ | Heterogeneous effects |
| DR-Learner | Regress the doubly robust (AIPW) pseudo-outcome on $X$ | Double robustness desired |
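
To make two of the rows concrete, here is a minimal NumPy sketch of the T-Learner and the DR-Learner with linear outcome models. Assumptions: randomized data with a known propensity $\pi = 0.5$, a simulated true CATE $\tau(x) = 1 + 2x$, and no cross-fitting (the DR-Learner analysed in kennedyOptimalDoublyRobust2023 fits the nuisance models on a separate sample split). All function names are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """OLS of y on [1, X]; returns a prediction function."""
    Z = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def t_learner(X, A, Y):
    """Separate outcome models per arm; tau_hat(x) = mu1_hat(x) - mu0_hat(x)."""
    mu1 = fit_linear(X[A == 1], Y[A == 1])
    mu0 = fit_linear(X[A == 0], Y[A == 0])
    return lambda X_new: mu1(X_new) - mu0(X_new)

def dr_learner(X, A, Y, pi):
    """Regress the AIPW pseudo-outcome on X (no cross-fitting, known propensity pi)."""
    mu1 = fit_linear(X[A == 1], Y[A == 1])
    mu0 = fit_linear(X[A == 0], Y[A == 0])
    m1, m0 = mu1(X), mu0(X)
    pseudo = (m1 - m0
              + A * (Y - m1) / pi
              - (1 - A) * (Y - m0) / (1 - pi))
    return fit_linear(X, pseudo)

# Hypothetical randomized data with true CATE tau(x) = 1 + 2x and pi = 0.5
rng = np.random.default_rng(0)
n = 5000
X = rng.uniform(-1, 1, size=(n, 1))
A = rng.binomial(1, 0.5, size=n)
Y = X[:, 0] + A * (1 + 2 * X[:, 0]) + rng.normal(scale=0.5, size=n)

x_new = np.array([[-0.5], [0.0], [0.5]])
tau_t = t_learner(X, A, Y)(x_new)
tau_dr = dr_learner(X, A, Y, pi=0.5)(x_new)
```

Both estimates should land close to $1 + 2x$ at the three query points; with flexible learners in place of `fit_linear`, the structure of each function stays the same.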

Tree-Based Methods

  • Causal Forest (Wager & Athey): Random forest adapted for CATE
  • BART (Bayesian Additive Regression Trees)
  • Causal MARS
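
The building block of a causal forest is a tree whose leaves report a treatment effect rather than a mean outcome. A minimal sketch of a depth-1 causal tree ("stump") on one covariate, assuming randomized data so that each leaf's effect is a difference in means. The split criterion $\sum_\ell n_\ell \hat{\tau}_\ell^2$ is a simplified heterogeneity criterion in the spirit of causal trees; it omits honest splitting, subsampling, and everything else that makes Wager & Athey's causal forest valid for inference. Names and data are hypothetical.

```python
import numpy as np

def leaf_effect(A, Y):
    """Difference in means between treated and control units in one leaf (RCT)."""
    return Y[A == 1].mean() - Y[A == 0].mean()

def causal_stump(x, A, Y, min_arm=20):
    """Depth-1 causal tree on a single covariate x.

    Tries quantile thresholds and keeps the split maximizing
    n_left * tau_left**2 + n_right * tau_right**2 (simplified, not honest).
    """
    best = None
    for t in np.quantile(x, np.linspace(0.1, 0.9, 17)):
        left = x < t
        leaves = (left, ~left)
        # each leaf needs enough treated and control units
        if any(min(A[m].sum(), (1 - A[m]).sum()) < min_arm for m in leaves):
            continue
        taus = [leaf_effect(A[m], Y[m]) for m in leaves]
        score = sum(m.sum() * tau ** 2 for m, tau in zip(leaves, taus))
        if best is None or score > best[0]:
            best = (score, t, *taus)
    if best is None:
        raise ValueError("no admissible split")
    return best[1:]  # (threshold, tau_left, tau_right)

# Hypothetical randomized data: tau(x) = 2 for x >= 0.5, else 0
rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(0, 1, size=n)
A = rng.binomial(1, 0.5, size=n)
Y = x + A * np.where(x >= 0.5, 2.0, 0.0) + rng.normal(scale=0.5, size=n)
threshold, tau_left, tau_right = causal_stump(x, A, Y)
```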

Deep Learning

  • CEVAE (Causal Effect VAE)
  • TARNet (Treatment-Agnostic Representation Network)
  • DragonNet

Example

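Medical scenario:

  • $Y$: reduction in blood pressure
  • $A$: whether the new drug is given (0/1)
  • $X$: (age, sex, baseline blood pressure, BMI, …)

$$\tau(x) = E[\text{BP reduction} \mid \text{new drug}, X = x] - E[\text{BP reduction} \mid \text{placebo}, X = x]$$

This contrast of observed means equals the CATE under the identification assumptions above, e.g. in a placebo-controlled randomized trial.

Interpretation:

  • $\tau(x) > 0$: the new drug is effective for patients with these characteristics
  • $\tau(x) < 0$: the new drug is harmful for patients with these characteristics
  • $\tau(x) \approx 0$: the new drug has no effect for patients with these characteristics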
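
With a single discrete covariate and randomized assignment, $\tau(x)$ can be estimated by the difference in mean outcomes within each stratum, and the interpretation rules above applied to the result. A minimal sketch; the trial data, the age strata, and the `tol` band for "no clear effect" are hypothetical:

```python
from collections import defaultdict
from statistics import mean

def cate_by_stratum(strata, A, Y):
    """Difference in mean outcomes (treated - control) within each stratum.

    Valid as a CATE estimate for a discrete X in a randomized trial; a stratum
    without both arms raises StatisticsError (a positivity failure).
    """
    arms = defaultdict(lambda: {0: [], 1: []})
    for x, a, y in zip(strata, A, Y):
        arms[x][a].append(y)
    return {x: mean(g[1]) - mean(g[0]) for x, g in arms.items()}

def interpret(tau, tol=0.5):
    """Read the sign of tau(x); tol is an arbitrary 'practically zero' band."""
    if tau > tol:
        return "effective"
    if tau < -tol:
        return "harmful"
    return "no clear effect"

# Hypothetical trial: Y = blood-pressure reduction (mmHg), A = new drug (1) or placebo (0)
strata = ["age < 65"] * 4 + ["age >= 65"] * 4
A = [1, 1, 0, 0, 1, 1, 0, 0]
Y = [12, 10, 4, 6, 3, 5, 4, 6]
tau = cate_by_stratum(strata, A, Y)
labels = {x: interpret(t) for x, t in tau.items()}
# tau == {"age < 65": 6, "age >= 65": -1}
```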

Applications

Treatment Targeting (Policy Learning)

Learning an optimal treatment rule: $d^*(x) = \mathbf{1}[\tau(x) > 0]$

  • Treat if $\tau(x) > 0$
  • Do not treat if $\tau(x) < 0$
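
Given CATE estimates, the rule is a threshold, and its value can be estimated from randomized data with inverse propensity weighting (IPW). A minimal sketch assuming a known propensity $\pi = 0.5$; the data and function names are hypothetical:

```python
import numpy as np

def treatment_rule(tau_hat):
    """d(x) = 1[tau_hat(x) > 0]: treat only where the estimated effect is positive."""
    return (np.asarray(tau_hat) > 0).astype(int)

def ipw_policy_value(d, A, Y, pi):
    """Inverse-propensity estimate of E[Y(d(X))] with known P(A = 1 | X) = pi."""
    d, A, Y = np.asarray(d), np.asarray(A), np.asarray(Y)
    p_follow = np.where(d == 1, pi, 1 - pi)  # P(observed arm equals the rule's arm)
    return float(np.mean((A == d) * Y / p_follow))

# Hypothetical RCT data (pi = 0.5) and CATE estimates tau_hat
A = np.array([1, 0, 1, 0])
Y = np.array([3.0, 1.0, 0.0, 2.0])
tau_hat = np.array([1.0, 1.0, -1.0, -1.0])

d = treatment_rule(tau_hat)
value_rule = ipw_policy_value(d, A, Y, pi=0.5)
value_all = ipw_policy_value(np.ones(4, dtype=int), A, Y, pi=0.5)
value_none = ipw_policy_value(np.zeros(4, dtype=int), A, Y, pi=0.5)
```

In this toy data the CATE-based rule has a higher estimated value (2.5) than treating everyone or no one (1.5 each).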

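Personalized Medicine

  • Tailoring treatment to each patient's characteristics
  • Minimizing side effects while maximizing efficacy

Precision Marketing

  • Estimating the marketing effect for each customer
  • Targeting personalized promotions

Policy Evaluation

  • Analyzing policy effects by subgroup
  • Exploring heterogeneity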

Evaluation Metrics

Evaluating CATE estimates is hard because the true CATE is never observed.

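When ground truth is available (synthetic or semi-synthetic benchmarks)

  • PEHE (Precision in Estimation of HTE): $\sqrt{E[(\hat{\tau}(X) - \tau(X))^2]}$, which requires the true $\tau(x)$
  • ATE Error: $|\hat{\tau}_{ATE} - \tau_{ATE}|$ (a randomized trial can supply an unbiased benchmark for $\tau_{ATE}$)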
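
Both metrics are straightforward once the true effects are available, for example in a simulation. A minimal sketch; the arrays are hypothetical, and some papers report PEHE without the square root:

```python
import numpy as np

def pehe(tau_hat, tau_true):
    """Root mean squared error between estimated and true CATE values."""
    tau_hat, tau_true = np.asarray(tau_hat, float), np.asarray(tau_true, float)
    return float(np.sqrt(np.mean((tau_hat - tau_true) ** 2)))

def ate_error(tau_hat, tau_true):
    """Absolute difference between the averages of estimated and true CATE values."""
    return float(abs(np.mean(tau_hat) - np.mean(tau_true)))

# Hypothetical simulation output: true and estimated tau(x_i) for three units
tau_true = [1.0, 2.0, 3.0]
tau_hat = [1.0, 2.0, 5.0]
score_pehe = pehe(tau_hat, tau_true)      # sqrt(4 / 3) ≈ 1.155
score_ate = ate_error(tau_hat, tau_true)  # |8/3 - 2| ≈ 0.667
```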

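Real Data (true CATE unknown)

  • AUUC (Area Under the Uplift Curve): measures how well the estimates rank units for treatment targeting
  • Qini Coefficient: evaluates uplift models by the incremental gain from targeting

Both compare treated and control outcomes among the units ranked highest by $\hat{\tau}$, so they are most reliable on randomized (e.g. A/B test) data.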
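
A minimal sketch of a Qini curve and a Qini coefficient, assuming randomized data with a binary treatment. Normalizations differ between papers and libraries; this version uses the unnormalized area between the model's curve and the random-targeting line on a [0, 1] population-fraction axis. The function names and data are hypothetical.

```python
import numpy as np

def qini_curve(score, A, Y):
    """Cumulative incremental gain when targeting units in decreasing score order.

    Q(k) = sum of treated outcomes in the top k
           - sum of control outcomes in the top k * (n_treated(k) / n_control(k)).
    """
    order = np.argsort(-np.asarray(score, dtype=float), kind="stable")
    A = np.asarray(A)[order]
    Y = np.asarray(Y, dtype=float)[order]
    n_t, n_c = np.cumsum(A), np.cumsum(1 - A)
    y_t, y_c = np.cumsum(Y * A), np.cumsum(Y * (1 - A))
    ratio = np.divide(n_t, n_c, out=np.zeros(len(n_t)), where=n_c > 0)
    return np.concatenate([[0.0], y_t - y_c * ratio])

def qini_coefficient(score, A, Y):
    """Area under the Qini curve minus the area under the random-targeting line."""
    q = qini_curve(score, A, Y)
    area_model = np.sum((q[1:] + q[:-1]) / 2) / (len(q) - 1)  # trapezoid rule on [0, 1]
    area_random = q[-1] / 2  # straight line from 0 to Q(n)
    return float(area_model - area_random)

# Hypothetical randomized data: only the first four units respond to treatment
A = [1, 0, 1, 0, 1, 0, 1, 0]
Y = [1, 0, 1, 0, 0, 0, 0, 0]
good_rank = [1, 1, 1, 1, 0, 0, 0, 0]  # scores the responders highest
bad_rank = [0, 0, 0, 0, 1, 1, 1, 1]   # scores them lowest
q_good = qini_coefficient(good_rank, A, Y)  # 0.625
q_bad = qini_coefficient(bad_rank, A, Y)    # -0.375
```

A positive coefficient means the ranking beats random targeting; a negative one means it is worse than random.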

Related Concepts

  • ATE - Average Treatment Effect (the average of the CATE)
  • ATT - Average Treatment Effect on the Treated
  • Propensity Score - treatment assignment probability
  • DR-Learner - a doubly robust method for CATE estimation
  • Double/Debiased ML - high-dimensional CATE estimation
  • Causal Forest - tree-based CATE estimation

Key Papers

  • kunzelMetalearnersEstimatingHeterogeneous2019 - Meta-learners (S, T, X-learner)
  • nieQuasiOracleEstimationHeterogeneous2020 - R-learner
  • kennedyOptimalDoublyRobust2023 - DR-learner, optimal rates
  • Wager & Athey (2018) - Causal Forests
  • chernozhukovDoubleDebiasedMachine2018 - DML for treatment effects

Implementation

Python (econml):

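```python
import numpy as np
from econml.dml import CausalForestDML
from econml.dr import DRLearner

# Hypothetical synthetic data: X = effect modifiers, W = extra controls,
# T = binary treatment, Y = outcome with true CATE tau(x) = x[0]
rng = np.random.default_rng(0)
n = 2000
X, W = rng.normal(size=(n, 3)), rng.normal(size=(n, 2))
T = rng.binomial(1, 0.5, size=n)
Y = X[:, 0] * T + W[:, 0] + rng.normal(size=n)
X_test = X[:5]

# Causal Forest: DML residualization, then a causal forest on X
cf = CausalForestDML(discrete_treatment=True)
cf.fit(Y, T, X=X, W=W)
cate_cf = cf.effect(X_test)

# DR-Learner: regresses a doubly robust pseudo-outcome on X
dr = DRLearner()
dr.fit(Y, T, X=X, W=W)
cate_dr = dr.effect(X_test)
```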

R (grf):

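```r
library(grf)
# grf's W is the treatment vector (unlike econml, where W holds extra controls)
cf <- causal_forest(X, Y, W)
tau_hat <- predict(cf)$predictions           # out-of-bag CATE estimates for the training rows
tau_test <- predict(cf, X_test)$predictions  # CATE estimates for new covariates X_test (hypothetical)
```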

References

  • kunzelMetalearnersEstimatingHeterogeneous2019
  • nieQuasiOracleEstimationHeterogeneous2020
  • kennedyOptimalDoublyRobust2023
  • chernozhukovDoubleDebiasedMachine2018
  • Wager & Athey (2018) - “Estimation and Inference of Heterogeneous Treatment Effects using Random Forests”
