Uplift Modeling · Tae Hyun Kim (Lowell)

Definition

Uplift is the causal increment that a treatment (campaign exposure, coupon, recommendation) induces in an individual’s outcome (purchase, conversion). For binary treatment $W\in\{0,1\}$, outcome $Y$, and covariates $X$,

$\text{uplift}(x) = \mathbb{E}[Y\mid X=x, W=1] - \mathbb{E}[Y\mid X=x, W=0] = \tau(x)$

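That is, uplift is the CATE for a binary treatment. The difference of conditional means above identifies the causal effect only when treatment is randomized or unconfounded given $X$, and when both treated and control units exist at $x$ (positivity). It answers the question “how much more does a person with covariates $x$ buy when exposed?”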
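As a minimal sketch of this definition, assuming randomized treatment and a single discrete covariate, the uplift in each stratum can be estimated as the difference between the treated and control mean outcomes. The function name `uplift_by_stratum` and the toy rows are illustrative assumptions, not project data.

```python
from collections import defaultdict

def uplift_by_stratum(rows):
    """rows: iterable of (x, w, y) with w in {0, 1}. Returns {x: estimated uplift}."""
    # stats[x][w] = [sum of outcomes, number of units] for arm w in stratum x
    stats = defaultdict(lambda: {0: [0.0, 0], 1: [0.0, 0]})
    for x, w, y in rows:
        stats[x][w][0] += y
        stats[x][w][1] += 1
    uplift = {}
    for x, arms in stats.items():
        (sum0, n0), (sum1, n1) = arms[0], arms[1]
        if n0 == 0 or n1 == 0:
            continue  # positivity fails in this stratum, so no estimate is returned
        uplift[x] = sum1 / n1 - sum0 / n0  # E[Y | x, W=1] - E[Y | x, W=0]
    return uplift

# Made-up rows (segment, treated?, purchased?) for illustration only.
rows = [
    ("new", 1, 1), ("new", 1, 1), ("new", 1, 0), ("new", 1, 0),
    ("new", 0, 0), ("new", 0, 1), ("new", 0, 0), ("new", 0, 0),
    ("loyal", 1, 1), ("loyal", 1, 1), ("loyal", 0, 1), ("loyal", 0, 1),
]
tau = uplift_by_stratum(rows)
print(tau)
```

In this toy data the "loyal" stratum buys at the same rate in both arms, so its estimated uplift is zero even though its purchase rate is the highest.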

Intuitive Understanding

A response model $P(Y=1\mid X, W=1)$ finds people who will buy when exposed, whether or not the exposure caused the purchase. An uplift model finds people who buy because of the exposure (persuadables). The four quadrants:

	Buys if exposed	Does not buy if exposed
Does not buy if unexposed	Persuadable (target ✓)	Lost cause
Buys if unexposed	Sure thing (wasteful)	Sleeping dog (backfires — do not touch)
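The quadrants can be written down in potential-outcome terms. The sketch below assumes binary potential outcomes, written here as `y0` (purchase if unexposed) and `y1` (purchase if exposed); this notation and the helper names are assumptions added for illustration. In real data only one of the two is ever observed per person, so the mapping is conceptual rather than something that can be computed row by row.

```python
# (y0, y1) -> quadrant name, matching the table above
QUADRANTS = {
    (0, 1): "persuadable",
    (0, 0): "lost cause",
    (1, 1): "sure thing",
    (1, 0): "sleeping dog",
}

def quadrant(y0, y1):
    """Name the quadrant for a pair of binary potential outcomes."""
    return QUADRANTS[(y0, y1)]

def individual_uplift(y0, y1):
    """Individual-level effect of exposure: +1, 0, or -1 for binary outcomes."""
    return y1 - y0

for (y0, y1), name in QUADRANTS.items():
    print(f"{name:13s} y0={y0} y1={y1} uplift={individual_uplift(y0, y1):+d}")
```

Persuadables and sure things both have `y1 = 1`, so a response model that scores purchase under exposure cannot tell them apart; only the difference `y1 - y0` separates them.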

The goal of targeting is to concentrate treatment on persuadables to raise ROI.

Estimation Methods

Meta-learners: S/T/X-learner, DR-Learner — estimate $\tau(x)$ with arbitrary ML
Causal Forest: tree-based direct uplift estimation (Wager & Athey 2018)
R-learner / DML: robust estimation via residual orthogonalization
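To make the meta-learner idea concrete, here is a minimal T-learner sketch: fit one outcome model on treated units, one on control units, and take the difference of their predictions. The base learner `StratumMean` is a deliberately trivial stand-in (a per-category mean) so that the example runs with the standard library alone; in practice any regressor with `fit`/`predict` could be plugged in. All names and the toy data are assumptions for illustration.

```python
from collections import defaultdict

class StratumMean:
    """Toy base learner: predicts the mean outcome of each discrete covariate value."""
    def fit(self, X, y):
        total, count = defaultdict(float), defaultdict(int)
        for xi, yi in zip(X, y):
            total[xi] += yi
            count[xi] += 1
        self.means_ = {xi: total[xi] / count[xi] for xi in total}
        self.default_ = sum(y) / len(y)  # fallback for covariate values not seen in training
        return self

    def predict(self, X):
        return [self.means_.get(xi, self.default_) for xi in X]

def fit_t_learner(X, w, y, make_model=StratumMean):
    """Fit separate outcome models per arm; assumes both arms are non-empty."""
    X1 = [xi for xi, wi in zip(X, w) if wi == 1]
    y1 = [yi for yi, wi in zip(y, w) if wi == 1]
    X0 = [xi for xi, wi in zip(X, w) if wi == 0]
    y0 = [yi for yi, wi in zip(y, w) if wi == 0]
    mu1 = make_model().fit(X1, y1)  # model of E[Y | X, W=1]
    mu0 = make_model().fit(X0, y0)  # model of E[Y | X, W=0]

    def tau_hat(X_new):
        return [m1 - m0 for m1, m0 in zip(mu1.predict(X_new), mu0.predict(X_new))]
    return tau_hat

# Made-up data: covariate value, treatment flag, outcome.
X = ["a", "a", "a", "a", "b", "b", "b", "b"]
w = [1, 1, 0, 0, 1, 1, 0, 0]
y = [1, 1, 0, 1, 0, 0, 1, 0]
tau_hat = fit_t_learner(X, w, y)
print(tau_hat(["a", "b"]))
```

With this base learner the T-learner reduces to a stratified difference in means. The other methods in the list differ mainly in how they share information across arms and how they correct for confounding, which this sketch does not attempt.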

Advantages and Disadvantages

Advantages: more efficient resource allocation than response models (focus on persuadables), detection of negative uplift (backfire effects).
Limitations: counterfactuals are unobservable, so there are no individual-level ground-truth labels and evaluation must rely on aggregate methods such as OPE and Qini/uplift curves. With observational data, estimates are vulnerable to Selection Bias and positivity violations.
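Because individual labels are unavailable, a Qini curve evaluates a model by ranking units on predicted uplift and tracking the cumulative incremental outcome among the top-ranked units. The sketch below uses one common definition, $Q(k) = Y_t(k) - Y_c(k)\,N_t(k)/N_c(k)$, where the sums and counts run over the top $k$ units; other variants exist. The function name, the handling of the case with no control units yet, and the toy data are assumptions for illustration.

```python
def qini_curve(scores, w, y):
    """Cumulative incremental outcomes when targeting in descending score order.

    scores: predicted uplift per unit; w: treatment flags (0/1); y: outcomes.
    Returns a list of length n + 1 starting at 0.0.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    yt = yc = 0.0  # cumulative outcomes in the treated / control arm
    nt = nc = 0    # cumulative unit counts in the treated / control arm
    curve = [0.0]
    for i in order:
        if w[i] == 1:
            yt += y[i]
            nt += 1
        else:
            yc += y[i]
            nc += 1
        # Control outcomes are rescaled to the treated-group size.
        # Before any control unit is seen, the control term is taken as zero.
        curve.append(yt - yc * nt / nc if nc else yt)
    return curve

# Made-up scores and outcomes, already in descending score order.
scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1, -0.3, -0.4]
w      = [1,   0,   1,   0,   1,   0,   1,    0]
y      = [1,   0,   1,   0,   0,   0,   0,    1]
curve = qini_curve(scores, w, y)
print(curve)
```

In this toy example the curve peaks before the end and then falls, which is the pattern expected when the lowest-ranked units have negative uplift: treating only the top of the ranking yields more incremental outcomes than treating everyone.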

Project Application

Dunnhumby: segment-level uplift was estimated with CausalForestDML. Some segments showed negative CATE (sleeping dogs), such as VIP Heavy (−$38) and Bulk Shoppers (−$40), and these account for the −$4,657 loss when targeting everyone. (project canonical)

Related Concepts

Targeting Overview ← hub
CATE · HTE — uplift = binary-treatment CATE
Optimal Targeting Policy — converting uplift into a policy
Off-Policy Evaluation — evaluating the value of an uplift policy

References

MOC-Targeting