Tae Hyun Kim (Lowell)

Uplift Modeling

#targeting #uplift #meta-learner

Definition

Uplift is the causal increment that a treatment (campaign exposure, coupon, recommendation) induces in an individual’s outcome (purchase, conversion). For binary treatment $W \in \{0,1\}$, outcome $Y$, and covariates $X$,

$$\text{uplift}(x) = \mathbb{E}[Y \mid X=x, W=1] - \mathbb{E}[Y \mid X=x, W=0] = \tau(x)$$

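That is, under binary treatment, and assuming treatment assignment is unconfounded given $X$ (as in a randomized experiment), uplift coincides with the conditional average treatment effect (CATE). It answers the question “how much more does a person with covariates $x$ buy when exposed?”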
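As a minimal sketch of the definition, the difference in conditional means can be estimated from randomized data by comparing treated and control conversion rates within each covariate cell. The function name `uplift_by_segment`, the segment labels, and the toy rows are all hypothetical illustrations, not data from any project.

```python
from collections import defaultdict


def uplift_by_segment(rows):
    """Estimate uplift per segment as mean(Y | W=1) - mean(Y | W=0).

    rows: iterable of (segment, w, y) with w in {0, 1}.
    Assumes w was randomized within each segment.
    """
    # Per segment: [sum_y control, n control, sum_y treated, n treated]
    acc = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for segment, w, y in rows:
        acc[segment][2 * w] += y
        acc[segment][2 * w + 1] += 1

    estimates = {}
    for segment, (y0, n0, y1, n1) in acc.items():
        if n0 == 0 or n1 == 0:
            continue  # one arm is empty: uplift is not identified here
        estimates[segment] = y1 / n1 - y0 / n0
    return estimates


# Hypothetical toy data: (segment, treated?, purchased?)
rows = [
    ("new", 1, 1), ("new", 1, 1), ("new", 1, 0), ("new", 1, 0),
    ("new", 0, 1), ("new", 0, 0), ("new", 0, 0), ("new", 0, 0),
    ("loyal", 1, 1), ("loyal", 1, 0),
    ("loyal", 0, 1), ("loyal", 0, 1),
]
estimates = uplift_by_segment(rows)
print(estimates)
```

In this toy data the "new" segment has positive uplift and the "loyal" segment has negative uplift, even though "loyal" customers buy more often overall.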

Intuitive Understanding

A response model $P(Y=1 \mid X, W=1)$ finds people who will buy, but an uplift model finds people who buy because of the exposure (persuadables). The four quadrants:

| | Buys if exposed | Does not buy if exposed |
|---|---|---|
| Does not buy if unexposed | Persuadable (target ✓) | Lost cause |
| Buys if unexposed | Sure thing (wasteful) | Sleeping dog (backfires; do not touch) |

The goal of targeting is to concentrate treatment on persuadables to raise ROI.
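The four quadrants can be written as a lookup on the pair of potential outcomes. This is only a sketch for simulated data: for a real customer only one of the two outcomes is ever observed, so the quadrant cannot be read off directly. The function names are hypothetical.

```python
def quadrant(y_if_unexposed: int, y_if_exposed: int) -> str:
    """Map a pair of binary potential outcomes to its quadrant label."""
    labels = {
        (0, 1): "persuadable",   # buys only because of the exposure
        (0, 0): "lost cause",    # does not buy either way
        (1, 1): "sure thing",    # buys either way, treatment is wasted
        (1, 0): "sleeping dog",  # exposure stops a purchase
    }
    return labels[(y_if_unexposed, y_if_exposed)]


def individual_effect(y_if_unexposed: int, y_if_exposed: int) -> int:
    """Individual treatment effect: +1, 0, or -1 for binary outcomes."""
    return y_if_exposed - y_if_unexposed


for pair in [(0, 1), (0, 0), (1, 1), (1, 0)]:
    print(pair, quadrant(*pair), individual_effect(*pair))
```

Only persuadables have a positive individual effect, and only sleeping dogs have a negative one, which is why a targeting rule based on estimated uplift treats the first group and avoids the second.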

Estimation Methods

  • Meta-learners: S/T/X-learner, DR-Learner. These estimate $\tau(x)$ with arbitrary ML models
  • Causal Forest: tree-based direct uplift estimation (Wager & Athey 2018)
  • R-learner / DML: robust estimation via residual orthogonalization
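To make the meta-learner idea concrete, here is a minimal T-learner sketch: fit one outcome model on the treated group, one on the control group, and take the difference of their predictions. The base learner here is a one-dimensional least-squares line chosen only to keep the example dependency-free. In practice any regressor could take its place. The names `fit_line` and `t_learner` and the noise-free toy data are assumptions for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns a predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def t_learner(data):
    """data: list of (x, w, y) with w in {0, 1}. Returns tau_hat(x)."""
    treated = [(x, y) for x, w, y in data if w == 1]
    control = [(x, y) for x, w, y in data if w == 0]
    mu1 = fit_line(*zip(*treated))  # model of E[Y | X, W=1]
    mu0 = fit_line(*zip(*control))  # model of E[Y | X, W=0]
    return lambda x: mu1(x) - mu0(x)


# Hypothetical noise-free data: control y = 2 + x, treated y = 2 + 3x,
# so the true effect is tau(x) = 2x.
data = [(x, 0, 2 + x) for x in range(5)] + [(x, 1, 2 + 3 * x) for x in range(5)]
tau_hat = t_learner(data)
print(tau_hat(2.0))
```

The S-learner differs by fitting a single model with $W$ as an extra feature, and the X-, DR-, and R-learners add further correction steps that this sketch does not attempt.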

Advantages and Disadvantages

  • Advantages: more efficient resource allocation than response models (focus on persuadables), detection of negative uplift (backfire effects).
  • Limitations: counterfactuals are unobservable → there are no individual-level labels, so evaluation relies on off-policy evaluation (OPE) and Qini/uplift curves. With observational data, estimates are vulnerable to selection bias and positivity violations.

Project Application

Dunnhumby: segment-level uplift was estimated with CausalForestDML. Some segments showed negative CATE (sleeping dogs), such as VIP Heavy (−$38) and Bulk Shoppers (−$40); these segments are the cause of the −$4,657 loss when targeting everyone. (project canonical)

References

  • MOC-Targeting
