Tae Hyun Kim (Lowell)

Optimal Targeting Policy

Definition

An Optimal Targeting Policy is a rule that maps covariates $x$ to a treatment decision $\pi(x)\in\{0,1\}$ so as to maximize policy value:

$$\pi^\star = \arg\max_{\pi} \; \mathbb{E}\big[ Y(\pi(X)) \big] - c\cdot\mathbb{E}[\pi(X)]$$

Accounting for cost $c$ and margin, the optimal rule is a threshold policy: $\pi^\star(x) = \mathbf{1}\{\hat\tau(x) > \text{breakeven}\}$, where the break-even point is the cost divided by the margin rate. That is, treat only customers whose estimated uplift exceeds the break-even point.
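
The threshold rule can be sketched in a few lines of plain Python. This is a minimal illustration, assuming CATE estimates are already available and that uplift is measured in revenue, so that break-even equals cost divided by margin rate. The function names `threshold_policy` and `expected_profit`, the toy uplift values, and the cost and margin figures are hypothetical and are not project data.

```python
def threshold_policy(tau_hat, cost, margin):
    """Treat a unit only when its estimated uplift exceeds the break-even point."""
    breakeven = cost / margin  # revenue uplift needed to recover the treatment cost
    decisions = [1 if tau > breakeven else 0 for tau in tau_hat]
    return decisions, breakeven


def expected_profit(tau_hat, decisions, cost, margin):
    """Estimated incremental profit: (margin * uplift - cost) summed over treated units."""
    return sum(d * (margin * tau - cost) for tau, d in zip(tau_hat, decisions))


# Hypothetical uplift estimates (revenue per customer), cost, and margin rate.
tau_hat = [5.0, 30.0, 45.0, 80.0, -10.0]
cost, margin = 10.0, 0.25

decisions, breakeven = threshold_policy(tau_hat, cost, margin)
profit_targeted = expected_profit(tau_hat, decisions, cost, margin)
profit_everyone = expected_profit(tau_hat, [1] * len(tau_hat), cost, margin)

print("break-even:", breakeven)
print("decisions:", decisions)
print("profit (targeted):", profit_targeted)
print("profit (treat everyone):", profit_everyone)
```

In this toy example only the two customers above the break-even point are treated, and treating everyone turns the estimated profit negative because most customers do not recover the treatment cost.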

Intuitive Understanding

Once uplift estimation (CATE) tells us “how much more each person will buy,” the policy decides “so whom do we treat.” It is the step that turns a continuous CATE into a binary decision, which makes it the industrial instance of Policy Learning.

Methods

  • Threshold on CATE: $\hat\tau(x) > \text{BE}$, where BE is the break-even point. Simple and powerful (e.g., econml).
  • Policy Tree / DR Policy Tree (Athey & Wager 2021, Kitagawa & Tetenov 2018): directly learn an interpretable rule. However, quantizing a continuous CATE into rules can lose information.
  • Risk-adjusted policy: when CATE is uncertain due to positivity violations, tune conservativeness via $\text{CE-CATE}(\lambda) = (1-\lambda)\hat\tau + \lambda\,\text{LB}$.
  • Value validation: estimate policy value before deployment with off-policy evaluation (OPE), using IPW, AIPW, or DR estimators.

Project Application

Dunnhumby: the break-even uplift is $42.43 (cost $12.73 / margin 30%). Targeting the optimal 31.3% of customers yields $2,426 profit (125% ROI), whereas targeting everyone yields a $4,657 loss. The CATE threshold beats the PolicyTree by $742. With a propensity score (PS) AUC of 0.989 (a positivity violation), identification is restricted to the 17% overlap region, so a conservative policy with λ = 0.7–1.0 is recommended. (project canonical)
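
The break-even figure can be checked directly from the cost and margin quoted above. The second half of the sketch shows how the recommended λ range changes a decision. The single customer's uplift estimate and lower bound are hypothetical values chosen for illustration and are not taken from the Dunnhumby data, and LB is assumed to be a lower confidence bound on the CATE.

```python
# Project figures quoted in this note.
cost = 12.73    # treatment cost per customer, in dollars
margin = 0.30   # margin rate

breakeven = cost / margin
print(f"break-even uplift: ${breakeven:.2f}")

# Hypothetical customer: point estimate and lower bound of uplift, in dollars.
tau_hat, lower_bound = 80.0, 30.0

results = {}
for lam in (0.7, 1.0):
    score = (1 - lam) * tau_hat + lam * lower_bound  # CE-CATE(lambda)
    results[lam] = score > breakeven
    print(f"lambda={lam}: CE-CATE={score:.2f}, treat={results[lam]}")
```

For this hypothetical customer the decision flips inside the recommended range: at λ = 0.7 the blended score still clears break-even, while at λ = 1.0 only the lower bound counts and the customer is no longer targeted.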

References

  • MOC-Targeting
  • Study Roadmap — Track 3 (Athey-Wager 2021, Kitagawa-Tetenov 2018 originals)
