CATE (Conditional Average Treatment Effect)
Definition
The Conditional Average Treatment Effect (CATE) is the average treatment effect conditional on covariates $X = x$:
$$\tau(x) = E[Y(1) - Y(0) \mid X = x]$$
where:
- $Y(1)$: potential outcome under treatment
- $Y(0)$: potential outcome under no treatment
- $X$: pre-treatment covariates (feature variables)
Related terms:
- HTE (Heterogeneous Treatment Effect): used synonymously with CATE
- ITE (Individual Treatment Effect): $Y_i(1) - Y_i(0)$ (unobservable)
Intuitive Understanding
Key question:
“How effective is the treatment for a person with specific characteristics?”
ATE vs CATE:
| Quantity | Definition | Question |
|---|---|---|
| ATE | $E[Y(1) - Y(0)]$ | “Is there an effect on average?” |
| CATE | $E[Y(1) - Y(0) \mid X = x]$ | “Is there an effect for a person with these characteristics?” |
Example:
- The average effect of a new drug is positive (ATE > 0)
- But for patients aged 65 and over it has no effect or even a negative one ($\tau(x) \le 0$)
ATE = E[τ(X)] = ∫ τ(x) dP(x) (average of CATE)
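The relation between ATE and CATE can be checked numerically. The sketch below assumes a made-up CATE function for the drug example (an effect of +5 below age 65 and -1 at 65 and over) and a uniform age distribution; both are illustrative assumptions, not values from any study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical covariate: age, uniform between 20 and 90.
age = rng.uniform(20, 90, size=n)

def tau(age):
    """Assumed CATE: +5 below age 65, -1 at 65 and over."""
    return np.where(age < 65, 5.0, -1.0)

# ATE = E[tau(X)]: average the CATE over the covariate distribution.
ate_mc = tau(age).mean()

# Closed form: P(age < 65) = 45/70 and P(age >= 65) = 25/70.
ate_exact = 5.0 * 45 / 70 + (-1.0) * 25 / 70

print(f"Monte Carlo ATE: {ate_mc:.3f}, exact ATE: {ate_exact:.3f}")
```

The ATE is positive even though the CATE is negative for the older subgroup, which is exactly the situation the example describes.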
Key Properties
Fundamental Problem of Causal Inference
At the individual level, $Y_i(1)$ and $Y_i(0)$ cannot be observed simultaneously:
- Actual observation: $Y_i = T_i Y_i(1) + (1 - T_i) Y_i(0)$
- The counterfactual is always missing
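A small simulation makes the missing-data structure concrete. Here both potential outcomes are generated, which is only possible in a simulation; the names `y0`, `y1`, `t` and the constant effect of 2 are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8

# Simulated potential outcomes (only knowable in a simulation).
y0 = rng.normal(0.0, 1.0, size=n)
y1 = y0 + 2.0                      # assumed constant individual effect of 2
t = rng.integers(0, 2, size=n)     # assumed randomized binary treatment

# What the analyst actually observes: Y = T*Y(1) + (1-T)*Y(0).
y_obs = t * y1 + (1 - t) * y0

# The counterfactual outcome is missing (NaN) for every unit.
y1_seen = np.where(t == 1, y1, np.nan)
y0_seen = np.where(t == 0, y0, np.nan)

for i in range(n):
    print(f"unit {i}: T={t[i]}  Y(1)={y1_seen[i]:6.2f}  "
          f"Y(0)={y0_seen[i]:6.2f}  Y={y_obs[i]:6.2f}")
```

Each printed row has exactly one potential outcome filled in, so the individual effect can never be computed from observed data alone.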
Identification Assumptions
The standard assumptions for identifying CATE:
- SUTVA (Stable Unit Treatment Value Assumption)
  - No interference: another unit’s treatment does not affect my outcome
  - Consistency: $Y_i = Y_i(T_i)$
- Unconfoundedness (Ignorability)
  - Given $X$, treatment assignment is independent of the potential outcomes: $(Y(0), Y(1)) \perp T \mid X$
- Positivity (Overlap)
  - At every covariate value the probability of receiving treatment lies strictly between 0 and 1: $0 < P(T = 1 \mid X = x) < 1$
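Of these assumptions, positivity is the one that can be partly checked from data. The following is a minimal sketch for a discrete covariate: it estimates the treatment probability in each stratum and flags strata where it is close to 0 or 1. The simulated propensities and the threshold `eps` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Hypothetical discrete covariate with four strata.
x = rng.integers(0, 4, size=n)

# Assumed true propensities; stratum 3 is never treated (overlap violated).
true_e = np.array([0.5, 0.3, 0.8, 0.0])
t = rng.binomial(1, true_e[x])

def stratum_propensity(x, t, n_strata):
    """Empirical P(T=1 | X=k) for each stratum k of a discrete covariate."""
    return np.array([t[x == k].mean() for k in range(n_strata)])

e_hat = stratum_propensity(x, t, 4)

# Flag strata whose estimated propensity is too close to 0 or 1.
eps = 0.05   # illustrative threshold, not a standard value
violations = np.where((e_hat < eps) | (e_hat > 1 - eps))[0]

print("estimated propensities:", np.round(e_hat, 3))
print("strata violating overlap:", violations)
```

In a flagged stratum there are no comparable treated and control units, so the CATE there cannot be identified without extrapolation. With continuous covariates the same idea is usually applied to a fitted propensity model instead of stratum means.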
Structure of CATE
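CATE can be decomposed as:
$$\tau(x) = \mu_1(x) - \mu_0(x)$$
where $\mu_t(x) = E[Y(t) \mid X = x]$, which equals $E[Y \mid X = x, T = t]$ under the identification assumptions.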
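For reference, the step from potential outcomes to observable quantities can be written out as follows, with $\mu_t(x)$ denoting the conditional mean of the potential outcome $Y(t)$ given $X = x$ and $T$ the binary treatment indicator (notation assumed here for the derivation):

$$
\begin{aligned}
\mu_t(x) &= E[Y(t) \mid X = x] \\
         &= E[Y(t) \mid X = x, T = t] && \text{(unconfoundedness; positivity makes the conditioning well defined)} \\
         &= E[Y \mid X = x, T = t] && \text{(consistency)}
\end{aligned}
$$

so that $\tau(x) = E[Y \mid X = x, T = 1] - E[Y \mid X = x, T = 0]$, a difference of two regression functions that can be estimated from observed data.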
Estimation Methods
Meta-Learners
| Method | Description | Best When |
|---|---|---|
| S-Learner | Single model: $\hat\mu(x, t)$, then $\hat\tau(x) = \hat\mu(x, 1) - \hat\mu(x, 0)$ | Homogeneous effects |
| T-Learner | Two models: $\hat\mu_1(x)$, $\hat\mu_0(x)$ separately | Different response functions |
| X-Learner | Two-stage imputation with propensity weighting | Unbalanced treatment groups |
| R-Learner | Residualize then regress: minimize $\sum_i \big[(Y_i - \hat m(X_i)) - (T_i - \hat e(X_i))\,\tau(X_i)\big]^2$ | Heterogeneous effects |
| DR-Learner | Regress doubly robust pseudo-outcome on $X$ | Double robustness desired |
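To illustrate the first two rows of the table, here is a minimal S-learner and T-learner built on ordinary least squares with NumPy. The simulated data, the linear base learner, and the helper names `fit_ols` and `predict_ols` are assumptions made for this sketch; in practice any regression method can serve as the base learner.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4_000

# Simulated data (all values are assumptions for illustration).
x = rng.uniform(-1, 1, size=n)
t = rng.integers(0, 2, size=n)            # randomized binary treatment
tau_true = 1.0 + 2.0 * x                  # heterogeneous effect
y = 0.5 * x + t * tau_true + rng.normal(0, 0.5, size=n)

def fit_ols(features, target):
    """Least-squares coefficients with an intercept column prepended."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

def predict_ols(coef, features):
    """Predictions from coefficients returned by fit_ols."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef

# S-learner: one model mu(x, t); CATE = mu(x, 1) - mu(x, 0).
# With a purely additive linear model the estimate is constant in x.
coef_s = fit_ols(np.column_stack([x, t]), y)
tau_s = (predict_ols(coef_s, np.column_stack([x, np.ones(n)]))
         - predict_ols(coef_s, np.column_stack([x, np.zeros(n)])))

# T-learner: separate models mu_1(x) and mu_0(x); CATE = mu_1(x) - mu_0(x).
coef_1 = fit_ols(x[t == 1], y[t == 1])
coef_0 = fit_ols(x[t == 0], y[t == 0])
tau_t = predict_ols(coef_1, x) - predict_ols(coef_0, x)

rmse_s = np.sqrt(np.mean((tau_s - tau_true) ** 2))
rmse_t = np.sqrt(np.mean((tau_t - tau_true) ** 2))
print(f"S-learner RMSE: {rmse_s:.3f}, T-learner RMSE: {rmse_t:.3f}")
```

Because the additive S-learner here has no treatment-covariate interaction, it can only return a constant effect, which matches the "homogeneous effects" entry in the table. A more flexible base learner would let the S-learner capture heterogeneity too.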
Tree-Based Methods
- Causal Forest (Wager & Athey): Random forest adapted for CATE
- BART (Bayesian Additive Regression Trees)
- Causal MARS
Deep Learning
- CEVAE (Causal Effect VAE)
- TARNet (Treatment-Agnostic Representation Network)
- DragonNet
Example
Medical scenario:
- $Y$: reduction in blood pressure
- $T$: whether the new drug is administered (0/1)
- $X$: (age, sex, baseline blood pressure, BMI, …)
Interpretation:
- $\tau(x) > 0$: the new drug is effective for a patient with these characteristics
- $\tau(x) < 0$: the new drug is harmful for a patient with these characteristics
- $\tau(x) = 0$: no effect for a patient with these characteristics
Applications
Treatment Targeting (Policy Learning)
Learning the optimal treatment rule:
- treat if $\tau(x) > 0$
- don’t treat if $\tau(x) \le 0$
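The rule above amounts to thresholding the estimated CATE. The sketch below uses a hypothetical helper `treatment_rule`; its `cost` parameter is an added assumption that generalises the zero threshold, and the five estimates are invented for illustration.

```python
import numpy as np

def treatment_rule(tau_hat, cost=0.0):
    """Treat (1) when the estimated effect exceeds the cost, else do not (0)."""
    return (np.asarray(tau_hat) > cost).astype(int)

# Hypothetical CATE estimates for five units.
tau_hat = np.array([2.1, -0.4, 0.0, 0.7, -1.3])

policy = treatment_rule(tau_hat)
print(policy)  # [1 0 0 1 0]

# Total gain over treating nobody, if the estimates were exactly right.
gain_targeted = np.sum(tau_hat * policy)   # 2.1 + 0.7
gain_treat_all = np.sum(tau_hat)           # sum over all five units

print(f"targeted: {gain_targeted:.1f}, treat everyone: {gain_treat_all:.1f}")
```

Targeting beats treating everyone here because the rule skips the units with non-positive estimated effects. The comparison is only as good as the CATE estimates feeding it.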
Personalized Medicine
- Tailored treatment based on patient characteristics
- Minimize side effects & maximize efficacy
Precision Marketing
- Estimating per-customer marketing effects
- Personalized promotion targeting
Policy Evaluation
- Analyzing policy effects by subgroup
- Exploring heterogeneity
Evaluation Metrics
Evaluating CATE estimates is difficult because the true CATE is never observed.
When an RCT is Available
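- PEHE (Precision in Estimation of HTE): $E[(\hat\tau(X) - \tau(X))^2]$, often reported as its square root. It requires the true $\tau(x)$, so it can only be computed on simulated or semi-synthetic data.
- ATE Error: $|\widehat{\text{ATE}} - \text{ATE}|$, where an RCT supplies a benchmark ATE through the difference in mean outcomes between arms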
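Both metrics are short to compute once a ground-truth CATE is available, as in a simulation. The function names `pehe` and `ate_error` and the toy arrays below are assumptions for illustration; `pehe` returns the squared-error form, and the square root is taken separately.

```python
import numpy as np

def pehe(tau_true, tau_hat):
    """Mean squared error between true and estimated CATE."""
    tau_true, tau_hat = np.asarray(tau_true), np.asarray(tau_hat)
    return np.mean((tau_hat - tau_true) ** 2)

def ate_error(tau_true, tau_hat):
    """Absolute difference between the ATEs implied by the two CATE vectors."""
    return abs(np.mean(tau_hat) - np.mean(tau_true))

tau_true = np.array([1.0, 2.0, 3.0, 4.0])
tau_hat = np.array([2.0, 1.0, 4.0, 3.0])

print(pehe(tau_true, tau_hat))            # 1.0
print(np.sqrt(pehe(tau_true, tau_hat)))   # 1.0
print(ate_error(tau_true, tau_hat))       # 0.0
```

The toy example is chosen so that the unit-level errors cancel: the ATE error is zero while PEHE is not, which shows why a good ATE estimate says little about the quality of the CATE estimates.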
Observational Data
- AUUC (Area Under Uplift Curve): treatment targeting performance
- Qini Coefficient: uplift modeling evaluation
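As a rough illustration of how an uplift curve is built, the sketch below ranks units by a score, and at each cutoff multiplies the treated-minus-control mean outcome among the top-ranked units by the number of units selected. Definitions of AUUC and Qini differ across libraries, so this is one possible convention rather than a reference implementation. The simulated data assume randomized treatment, and the function names are hypothetical.

```python
import numpy as np

def uplift_curve(y, t, score):
    """Cumulative uplift when targeting units in decreasing order of score.

    At each cutoff k: (mean treated outcome - mean control outcome) among
    the top-k units, scaled by k. Cutoffs that lack a treated or a control
    unit are set to 0.
    """
    order = np.argsort(-score)
    y, t = y[order], t[order]
    n_t = np.cumsum(t)
    n_c = np.cumsum(1 - t)
    sum_t = np.cumsum(y * t)
    sum_c = np.cumsum(y * (1 - t))
    k = np.arange(1, len(y) + 1)
    valid = (n_t > 0) & (n_c > 0)
    curve = np.zeros(len(y))
    curve[valid] = (sum_t[valid] / n_t[valid]
                    - sum_c[valid] / n_c[valid]) * k[valid]
    return curve

def auuc(y, t, score):
    """Area under the uplift curve, normalised by the number of units."""
    return uplift_curve(y, t, score).mean()

rng = np.random.default_rng(4)
n = 4_000
x = rng.uniform(0, 1, size=n)
t = rng.integers(0, 2, size=n)                 # randomized treatment
y = t * 2.0 * x + rng.normal(0, 1, size=n)     # assumed true uplift: 2x

print("AUUC, good ranking:    ", auuc(y, t, x))
print("AUUC, reversed ranking:", auuc(y, t, -x))
```

A score that ranks high-effect units first yields a larger area than the reversed ranking, even though both curves end at the same point once every unit is included.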
Related Concepts
- ATE - Average Treatment Effect (the average of CATE)
- ATT - Average Treatment Effect on the Treated
- Propensity Score - Treatment assignment probability
- DR-Learner - A doubly robust method for CATE estimation
- Double-Debiased ML - High-dimensional CATE estimation
- Causal Forest - Tree-based CATE estimation
Key Papers
- kunzelMetalearnersEstimatingHeterogeneous2019 - Meta-learners (S, T, X-learner)
- nieQuasiOracleEstimationHeterogeneous2020 - R-learner
- kennedyOptimalDoublyRobust2023 - DR-learner, optimal rates
- Wager & Athey (2018) - Causal Forests
- chernozhukovDoubleDebiasedMachine2018 - DML for treatment effects
Implementation
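Python (econml):
```python
from econml.dml import CausalForestDML
from econml.dr import DRLearner

# Y: outcome, T: binary treatment, X: effect modifiers, W: additional controls

# Causal Forest (discrete_treatment=True because T is binary)
cf = CausalForestDML(discrete_treatment=True)
cf.fit(Y, T, X=X, W=W)
cate_cf = cf.effect(X_test)

# DR-Learner
dr = DRLearner()
dr.fit(Y, T, X=X, W=W)
cate_dr = dr.effect(X_test)
```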
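R (grf):
```r
library(grf)
# Note: in grf, W is the treatment vector (unlike econml, where W holds controls)
cf <- causal_forest(X, Y, W)
# Out-of-bag CATE estimates for the training data
tau_hat <- predict(cf)$predictions
```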
References
- kunzelMetalearnersEstimatingHeterogeneous2019
- nieQuasiOracleEstimationHeterogeneous2020
- kennedyOptimalDoublyRobust2023
- chernozhukovDoubleDebiasedMachine2018
- Wager & Athey (2018) - “Estimation and Inference of Heterogeneous Treatment Effects using Random Forests”