CATE (Conditional Average Treatment Effect)

Definition

The Conditional Average Treatment Effect (CATE) is the average treatment effect among units with covariate values $X=x$:

$\tau(x) = E[Y(1) - Y(0) | X = x]$

where:

$Y(1)$ : potential outcome under treatment
$Y(0)$ : potential outcome under no treatment
$X$ : pre-treatment covariates (feature variables)

Related terms:

HTE (Heterogeneous Treatment Effect): used synonymously with CATE
ITE (Individual Treatment Effect): $\tau_i = Y_i(1) - Y_i(0)$ (unobservable)

Intuitive Understanding

Key question:

“How effective is the treatment for a person with specific characteristics?”

ATE vs CATE:

Quantity	Definition	Question
ATE	$E[Y(1) - Y(0)]$	“Is there an effect on average?”
CATE	$E[Y(1) - Y(0) | X=x]$	“Is there an effect for a person with these characteristics?”

Example:

The average effect of a new drug is positive (ATE > 0)
But for patients aged 65 and over, it has no effect or even a negative one ( $\tau(x_{age \geq 65}) \leq 0$ )

Relationship: $ATE = E[\tau(X)] = \int \tau(x) \, dP(x)$ (the ATE is the average of the CATE over the covariate distribution)
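The averaging relationship can be checked with a small worked example. The sketch below uses made-up group shares and CATE values (both are assumptions for illustration, not figures from any study) that mirror the drug example: a positive ATE alongside a negative effect in the older group.

```python
# Hypothetical population: share of each age group and its CATE (made-up numbers).
groups = {
    "under_65": {"share": 0.75, "cate": 4.0},
    "65_plus": {"share": 0.25, "cate": -2.0},
}

def ate_from_cate(groups):
    """ATE as the share-weighted average of group CATEs: sum of P(X=x) * tau(x)."""
    return sum(g["share"] * g["cate"] for g in groups.values())

ate = ate_from_cate(groups)
print(f"ATE = {ate}")  # positive overall, even though one group is harmed
```

Here the ATE is positive only because the larger group benefits, which is the situation CATE is meant to reveal.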

Key Properties

Fundamental Problem of Causal Inference

At the individual level, $Y(1)$ and $Y(0)$ cannot be observed simultaneously:

Actual observation: $Y = AY(1) + (1-A)Y(0)$
The counterfactual is always missing
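A minimal simulation makes the missing-data structure concrete. It is only a sketch: the potential outcomes are generated from an assumed distribution with a fixed individual effect of 2, something that is never known with real data.

```python
import random

random.seed(0)

def observe(y0, y1, a):
    """Observed outcome Y = A*Y(1) + (1-A)*Y(0)."""
    return a * y1 + (1 - a) * y0

units = []
for _ in range(5):
    y0 = random.gauss(10.0, 1.0)  # potential outcome without treatment
    y1 = y0 + 2.0                 # potential outcome with treatment (assumed ITE = 2)
    a = random.randint(0, 1)      # treatment actually received
    units.append({
        "a": a,
        "y": observe(y0, y1, a),
        # Only the potential outcome matching the received treatment is seen.
        "y1_seen": y1 if a == 1 else None,
        "y0_seen": y0 if a == 0 else None,
    })

for u in units:
    print(u)
```

Every row has exactly one `None`, so $\tau_i = Y_i(1) - Y_i(0)$ cannot be computed for any single unit.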

Identification Assumptions

The standard assumptions for identifying CATE:

SUTVA (Stable Unit Treatment Value Assumption)
- No interference: one unit’s treatment does not affect another unit’s outcome
- Consistency: the observed outcome equals the potential outcome under the treatment received, $Y = Y(A)$
Unconfoundedness (Ignorability): $Y(0), Y(1) \perp A | X$
- Given $X$, treatment assignment is independent of the potential outcomes
Positivity (Overlap): $0 < P(A=1|X=x) < 1, \quad \forall x \in \mathcal{X}$
- At every covariate value, the probability of receiving treatment lies strictly between 0 and 1
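Positivity is the one assumption here that can be partly checked from data. The following is a rough sketch for a single discrete covariate: it computes the empirical treatment share in each stratum and flags strata that fall outside a tolerance band. The function name `overlap_report` and the threshold `eps` are hypothetical choices, not a standard API.

```python
from collections import Counter

def overlap_report(x_values, a_values, eps=0.05):
    """Map each stratum x to (empirical P(A=1|X=x), whether it lies in [eps, 1-eps])."""
    n_total = Counter(x_values)
    n_treated = Counter(x for x, a in zip(x_values, a_values) if a == 1)
    report = {}
    for x, n in n_total.items():
        p = n_treated[x] / n
        report[x] = (p, eps <= p <= 1 - eps)
    return report

# Toy data: nobody in the "old" stratum is treated, so overlap fails there.
x = ["young"] * 4 + ["old"] * 4
a = [1, 0, 1, 0, 0, 0, 0, 0]
report = overlap_report(x, a)
print(report)
```

In a stratum with no treated (or no control) units, $\tau(x)$ can only be obtained by extrapolating from other strata.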

Structure of CATE

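Under the identification assumptions above, CATE can be written in terms of observable quantities: $\tau(x) = \mu_1(x) - \mu_0(x)$

where $\mu_a(x) = E[Y|X=x, A=a]$ is the conditional mean of the observed outcome in treatment arm $a$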
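With a discrete covariate, $\mu_a(x)$ can be estimated by a simple cell mean, so the decomposition becomes a within-stratum difference in means. The sketch below simulates confounded data (the propensities and true effects are assumptions chosen for illustration) and contrasts the stratified estimate with the naive treated-minus-control comparison.

```python
import random
from collections import defaultdict

random.seed(0)

TRUE_CATE = {0: 2.0, 1: 0.5}   # assumed true tau(x), known only because we simulate
PROPENSITY = {0: 0.2, 1: 0.8}  # treatment depends on x, so x is a confounder

def simulate(n):
    rows = []
    for _ in range(n):
        x = random.randint(0, 1)
        a = 1 if random.random() < PROPENSITY[x] else 0
        y0 = 1.0 + 3.0 * x + random.gauss(0.0, 1.0)
        y1 = y0 + TRUE_CATE[x]
        rows.append((x, a, a * y1 + (1 - a) * y0))
    return rows

def mean(values):
    return sum(values) / len(values)

def stratified_cate(rows):
    """tau_hat(x) = mean(Y | X=x, A=1) - mean(Y | X=x, A=0)."""
    cells = defaultdict(list)
    for x, a, y in rows:
        cells[(x, a)].append(y)
    strata = sorted({x for x, _, _ in rows})
    return {x: mean(cells[(x, 1)]) - mean(cells[(x, 0)]) for x in strata}

rows = simulate(20000)
cate_hat = stratified_cate(rows)
naive = mean([y for _, a, y in rows if a == 1]) - mean([y for _, a, y in rows if a == 0])
print("stratified CATE:", cate_hat)
print("naive difference:", naive)
```

The naive difference mixes the treatment effect with the effect of $X$ on the outcome, while the stratified estimates should land close to the simulated truth.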

Estimation Methods

Meta-Learners

Method	Description	Best When
S-Learner	Single model: $\hat{\mu}(x,a)$, then $\hat{\tau}(x) = \hat{\mu}(x,1) - \hat{\mu}(x,0)$	Homogeneous effects
T-Learner	Two models: $\hat{\mu}_1(x)$ and $\hat{\mu}_0(x)$ fit separately, then $\hat{\tau}(x) = \hat{\mu}_1(x) - \hat{\mu}_0(x)$	Different response functions
X-Learner	Two-stage imputation with propensity weighting	Unbalanced treatment groups
R-Learner	Residualize then regress: minimize $\sum_i (Y_i - \hat{m}(X_i) - (A_i - \hat{\pi}(X_i))\tau(X_i))^2$, where $\hat{m}(x)$ estimates $E[Y|X=x]$ and $\hat{\pi}(x)$ estimates the propensity score	Heterogeneous effects
DR-Learner	Regress doubly robust pseudo-outcome on $X$	Double robustness desired
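To make two rows of the table concrete, here is a minimal sketch of a T-learner and of the doubly robust pseudo-outcome regression, using a single covariate and hand-written least squares. It is an illustration under strong assumptions, not the reference implementation of either method: the data are simulated from a randomized design with known propensity 0.5, the true CATE is assumed to be $\tau(x) = 1 - 2x$, and cross-fitting is omitted for brevity.

```python
import random

random.seed(1)

def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predict(line, x):
    intercept, slope = line
    return intercept + slope * x

def fit_arms(xs, a_s, ys):
    """T-learner step: fit mu_0 and mu_1 separately on control and treated units."""
    lines = {}
    for arm in (0, 1):
        arm_x = [x for x, a in zip(xs, a_s) if a == arm]
        arm_y = [y for y, a in zip(ys, a_s) if a == arm]
        lines[arm] = fit_line(arm_x, arm_y)
    return lines

def dr_pseudo_outcomes(lines, xs, a_s, ys, propensity):
    """phi = mu1 - mu0 + A(Y - mu1)/pi - (1 - A)(Y - mu0)/(1 - pi)."""
    phis = []
    for x, a, y in zip(xs, a_s, ys):
        mu1, mu0 = predict(lines[1], x), predict(lines[0], x)
        phis.append(mu1 - mu0
                    + a * (y - mu1) / propensity
                    - (1 - a) * (y - mu0) / (1 - propensity))
    return phis

# Simulated randomized experiment (assumed data-generating process).
n = 5000
xs = [random.uniform(0.0, 1.0) for _ in range(n)]
a_s = [random.randint(0, 1) for _ in range(n)]
ys = [2.0 * x + a * (1.0 - 2.0 * x) + random.gauss(0.0, 0.5) for x, a in zip(xs, a_s)]

arm_lines = fit_arms(xs, a_s, ys)

def t_cate(x):
    return predict(arm_lines[1], x) - predict(arm_lines[0], x)

# DR-learner step: regress the pseudo-outcome on x.
dr_line = fit_line(xs, dr_pseudo_outcomes(arm_lines, xs, a_s, ys, propensity=0.5))

def dr_cate(x):
    return predict(dr_line, x)

for x in (0.0, 0.5, 1.0):
    print(x, round(t_cate(x), 3), round(dr_cate(x), 3))
```

In practice the linear fits would be replaced by flexible learners, and the nuisance models would be fit on a different fold than the one used for the final regression.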

Tree-Based Methods

Causal Forest (Wager & Athey): Random forest adapted for CATE
BART (Bayesian Additive Regression Trees)
Causal MARS

Deep Learning

CEVAE (Causal Effect VAE)
TARNet (Treatment-Agnostic Representation Network)
DragonNet

Example

Medical scenario:

$Y$ : reduction in blood pressure
$A$ : whether the new drug is administered (0/1)
$X$ : (age, sex, baseline blood pressure, BMI, …)

$\tau(x) = E[\text{BP reduction}|\text{new drug}] - E[\text{BP reduction}|\text{placebo}] \quad \text{given } X=x$

Interpretation:

$\tau(x) > 0$ : the new drug is effective for a patient with these characteristics
$\tau(x) < 0$ : the new drug is harmful for a patient with these characteristics
$\tau(x) \approx 0$ : no effect for a patient with these characteristics

Applications

Treatment Targeting (Policy Learning)

Learning the optimal treatment rule: $d^*(x) = \mathbf{1}[\tau(x) > 0]$

treat if $\tau(x) > 0$
don’t treat if $\tau(x) \leq 0$
(this assumes larger outcomes are better and treatment has no cost)
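The thresholding rule is easy to sketch in code. The example below uses made-up CATE values and reports the average gain of a policy relative to treating no one, computed as the mean of $d(x_i)\tau(x_i)$. The `cost` argument is a hypothetical extension that is not part of the rule stated above; with `cost=0.0` the function reduces to $\mathbf{1}[\tau(x) > 0]$.

```python
def treatment_rule(tau, cost=0.0):
    """Return 1 (treat) if the estimated effect exceeds the cost, else 0."""
    return 1 if tau > cost else 0

def policy_gain(taus, decisions):
    """Average gain over treating no one: mean of d_i * tau_i."""
    return sum(d * t for d, t in zip(decisions, taus)) / len(taus)

taus = [3.0, -1.0, 0.5, -2.0]               # hypothetical CATE values
targeted = [treatment_rule(t) for t in taus]
treat_all = [1] * len(taus)

gain_targeted = policy_gain(taus, targeted)
gain_treat_all = policy_gain(taus, treat_all)
print(targeted, gain_targeted, gain_treat_all)
```

With true effects the targeted rule can never do worse than treating everyone; with estimated effects that guarantee is lost, which is why policy evaluation on held-out data matters.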

Personalized Medicine

Tailored treatment based on patient characteristics
Minimize side effects & maximize efficacy

Precision Marketing

Estimating per-customer marketing effects
Personalized promotion targeting

Policy Evaluation

Analyzing policy effects by subgroup
Exploring heterogeneity

Evaluation Metrics

Evaluating CATE estimates is difficult (the true CATE is unobservable)

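When the True Effect is Known (Simulated or Semi-Synthetic Data)

PEHE (Precision in Estimation of HTE): $\sqrt{E[(\hat{\tau}(X) - \tau(X))^2]}$ (requires the true $\tau(x)$, which even an RCT does not reveal)
ATE Error: $|\hat{\tau}_{ATE} - \tau_{ATE}|$ (an RCT estimate of the ATE can serve as the benchmark)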
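Both metrics are short to compute once a ground-truth effect vector exists, as in a simulation. The sketch below uses made-up true and estimated effects and replaces the expectation with a sample mean.

```python
import math

def pehe(tau_hat, tau_true):
    """Root mean squared error between estimated and true CATE."""
    squared = [(h - t) ** 2 for h, t in zip(tau_hat, tau_true)]
    return math.sqrt(sum(squared) / len(squared))

def ate_error(tau_hat, tau_true):
    """Absolute difference between the averages of estimated and true CATE."""
    return abs(sum(tau_hat) / len(tau_hat) - sum(tau_true) / len(tau_true))

tau_true = [1.0, 2.0, 3.0]  # hypothetical ground truth
tau_hat = [1.5, 2.0, 2.5]   # hypothetical estimates

pehe_value = pehe(tau_hat, tau_true)
ate_error_value = ate_error(tau_hat, tau_true)
print(pehe_value, ate_error_value)
```

In this example the individual errors cancel in the average, so the ATE error is zero while the PEHE is not. A small ATE error therefore says little about how well heterogeneity is captured.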

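When Only Observed Outcomes Are Available

AUUC (Area Under Uplift Curve): treatment targeting performance
Qini Coefficient: uplift modeling evaluation
(both are typically computed on held-out data with randomized treatment; with observational data they need propensity adjustment)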
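Several conventions exist for uplift and Qini curves, so the following is only one possible sketch. It assumes treatment was randomized in the evaluation data, ranks units by predicted uplift, and tracks the cumulative gain $Y_T(k) - Y_C(k) \cdot N_T(k)/N_C(k)$ over the top $k$ units. The unnormalized coefficient is taken as the average gap between that curve and a straight line representing random targeting. The toy data are made up.

```python
def qini_curve(scores, treatments, outcomes):
    """Cumulative incremental gain when targeting units in order of decreasing score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    y_t = y_c = 0.0
    n_t = n_c = 0
    curve = []
    for i in order:
        if treatments[i] == 1:
            y_t += outcomes[i]
            n_t += 1
        else:
            y_c += outcomes[i]
            n_c += 1
        # Scale control outcomes to the treated group size; no controls yet means no correction.
        control_term = y_c * n_t / n_c if n_c > 0 else 0.0
        curve.append(y_t - control_term)
    return curve

def qini_coefficient(curve):
    """Average gap between the curve and the random-targeting line (unnormalized)."""
    n = len(curve)
    total_gain = curve[-1]
    return sum(g - total_gain * (k + 1) / n for k, g in enumerate(curve)) / n

scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]  # hypothetical predicted uplift
treatments = [1, 0, 1, 0, 1, 0]
outcomes = [1, 0, 1, 0, 0, 1]

curve = qini_curve(scores, treatments, outcomes)
qini = qini_coefficient(curve)
print(curve, qini)
```

A positive value indicates that the ranking concentrates the responsive units near the top. Library implementations may normalize or integrate differently, so values are comparable only within one convention.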

Related Concepts

ATE - Average Treatment Effect (the average of CATE)
ATT - Average Treatment Effect on the Treated
Propensity Score - Treatment assignment probability
DR-Learner - A doubly robust method for CATE estimation
Double/Debiased ML - Treatment effect estimation with high-dimensional nuisance functions
Causal Forest - Tree-based CATE estimation

Key Papers

kunzelMetalearnersEstimatingHeterogeneous2019 - Meta-learners (S, T, X-learner)
nieQuasiOracleEstimationHeterogeneous2020 - R-learner
kennedyOptimalDoublyRobust2023 - DR-learner, optimal rates
Wager & Athey (2018) - Causal Forests
chernozhukovDoubleDebiasedMachine2018 - DML for treatment effects

Implementation

Python (econml):

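from econml.dml import CausalForestDML
from econml.dr import DRLearner

# Y: outcome, T: binary treatment, X: effect modifiers, W: additional controls

# Causal Forest (discrete_treatment=True because T is binary)
cf = CausalForestDML(discrete_treatment=True)
cf.fit(Y, T, X=X, W=W)
cate_cf = cf.effect(X_test)

# DR-Learner (expects a discrete treatment)
dr = DRLearner()
dr.fit(Y, T, X=X, W=W)
cate_dr = dr.effect(X_test)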

R (grf):

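library(grf)
# Note: in grf, W is the treatment vector (in econml, W denotes controls)
cf <- causal_forest(X, Y, W)
# Without newdata, predict() returns out-of-bag CATE estimates
tau_hat <- predict(cf)$predictions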

References

kunzelMetalearnersEstimatingHeterogeneous2019
nieQuasiOracleEstimationHeterogeneous2020
kennedyOptimalDoublyRobust2023
chernozhukovDoubleDebiasedMachine2018
Wager & Athey (2018) - “Estimation and Inference of Heterogeneous Treatment Effects using Random Forests”
