Tae Hyun Kim (Lowell)
Personalization · Causal Inference

Customer Segmentation & Causal Targeting

An end-to-end analysis on the public Dunnhumby retail dataset — NMF + K-Means segmentation feeding meta-learner / Causal-Forest HTE and an OPE-validated targeting policy.

2026 · Solo · end-to-end (data → model → policy)
Python · scikit-learn · EconML · Causal Forest · NMF · Optuna
Try it yourself: 1 demo that makes the result tangible

⏱️ TL;DR (30 seconds)


Figure (Track 1): seven behavioral segments in loyalty (F2) × deal-seeking (F1) space.

Figure (Track 2): ROI by targeting fraction. Profit peaks near 31% targeting; targeting everyone loses money.

🎯 Key Results at a Glance

| Key metric | Value | Notes |
|---|---|---|
| Data scale | 2,500 households · ~2.6M transactions · 102 weeks | Dunnhumby “The Complete Journey” |
| Number of segments | 7 (NMF k=5 → K-Means k=7) | 92.44% variance explained, bootstrap ARI 0.77±0.11 |
| Breakeven CATE | $42.43 | cost $12.73 / margin 0.30 |
| Optimal targeting | 31.3% (152 / 486) | profit +$2,426, ROI 125% |
| Target everyone (100%) | 486 customers | profit -$4,659, ROI -75% |
| Current practice (62.1%) | 302 customers | profit -$3,402, ROI -88% |
| Improvement (optimal vs. all) | +$7,085 | +200pp ROI (hypothesis-generating) |
| Positivity diagnostic | PS AUC 0.989, overlap 17% | severe violation → hypothesis-generating reading |
| Primary CATE model | CausalForestDML | chosen on low-variance/validity grounds (see Track 2 for details) |
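The headline numbers in the table can be cross-checked with a few lines of arithmetic. This is a minimal sketch that assumes ROI is defined as profit divided by coupon spend (targeted customers × cost per contact); the source does not state the formula, but this definition reproduces the quoted percentages. The function names are illustrative.

```python
COST = 12.73    # cost per targeted customer, from the table
MARGIN = 0.30   # margin rate, from the table


def breakeven_cate(cost_per_contact: float, margin: float) -> float:
    """Incremental revenue a customer must generate for the coupon to pay for itself."""
    return cost_per_contact / margin


def policy_roi(profit: float, n_targeted: int, cost_per_contact: float) -> float:
    """ROI under the assumed definition: profit / total coupon spend."""
    return profit / (n_targeted * cost_per_contact)


breakeven = breakeven_cate(COST, MARGIN)       # about 42.43
roi_optimal = policy_roi(2426, 152, COST)      # about 1.25
roi_everyone = policy_roi(-4659, 486, COST)    # about -0.75
improvement = 2426 - (-4659)                   # optimal vs. target-everyone profit gap

print(f"breakeven CATE: ${breakeven:.2f}")
print(f"ROI optimal: {roi_optimal:.0%}, ROI everyone: {roi_everyone:.0%}")
print(f"improvement: ${improvement:,}")
```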

CATE by segment — the negative effect of high-value segments is the counter-intuitive crux

Figure: CATE distribution by segment. The negative effects of VIP Heavy (-$38) and Bulk Shoppers (-$40) are clear, a signal to scale targeting down.

Motivation & Framework

This project tackles, within a single pipeline, both the traditional segmentation question “who are our customers?” and the causal-inference question “for whom, and by how much, is this campaign effective?”.


Why are both tracks needed?

| Aspect | Track 1 (Descriptive) | Track 2 (Causal) |
|---|---|---|
| Core question | “Who is this customer?” | “Will the campaign be effective for this customer?” |
| Main users | Marketing, CRM, Strategy | Data Science, Optimization |
| Explainability | “Premium Fresh Lover segment” | “This customer’s CATE = +$34” |
| Organizational requirement | Marketing capability grounded in customer understanding | Causal thinking + an execution framework for personalized targeting |

Key Insights: Counter-Intuitive Findings

Negative (-) Treatment Effect of High-Value Customers

The N values below are based on the Track 2 analysis cohort (486 customers) and differ from Track 1’s full segment sizes (509, 299, …). CATE is a hypothesis-generating estimate.

| Segment | Customer value (overall mean revenue) | Mean CATE | N (cohort) | Direction signal |
|---|---|---|---|---|
| VIP Heavy | $9,716 (highest) | -$38 | 59 | Scale down / exclude from TypeA |
| Bulk Shoppers | $3,206 | -$40 | 77 | Scale down / exclude from TypeA |

Why do high-value customers show a negative CATE?

| Segment | Root-cause analysis |
|---|---|
| VIP Heavy | Already a high purchaser → ceiling effect; the coupon merely substitutes for existing purchases (cannibalization) |
| Bulk Shoppers | Coupon-based TypeA mismatches their irregular, bulk-buying shopping rhythm |

Business impact (on the 486-customer analysis cohort; hypothesis-generating):

[Figure: business impact on the 486-customer analysis cohort]


Results Summary

Track 1 Results: Latent Factor Modeling + Clustering

Interpretation of the 5 latent factors (NMF k=5, 92.44% variance explained):

| Factor | Name | Top features (loading) | Interpretation |
|---|---|---|---|
| F1 | Grocery Deal Seeker | share_grocery (6.72), discount_usage_pct (5.13), private_label_ratio (3.41) | Discount-seeking, budget-conscious |
| F2 | Loyal Regular | purchase_regularity (4.63), n_departments (2.61), n_products (1.53), frequency (1.04) | One-stop, high-engagement (Value) |
| F3 | Big Basket | monetary_std (2.45), monetary_avg_basket (2.35), share_grocery (2.08) | Irregular bulk purchases (Value) |
| F4 | Fresh Focused | share_fresh (2.26), n_departments (1.21) | Fresh-food specialist (Need) |
| F5 | Health & Beauty | share_health_beauty (2.03), recency (0.41) | Drugstore type (Need) |

Figure: Feature loadings of the 5 latent factors. F2 (Loyal) and F3 (Big Basket) capture the Value dimension; F4 (Fresh) and F5 (H&B) capture the Need dimension.

Clustering evaluation metrics (summary):

| Metric | Value | Interpretation |
|---|---|---|
| Explained Variance | 92.44% | High factor coverage |
| Silhouette Score (k=7) | 0.219 | Reasonable for behavioral data (not the global maximum; see appendix) |
| Calinski-Harabasz (k=7) | 732.0 | |
| Davies-Bouldin Index (k=7) | 1.241 | Minimum among k candidates (best separation) |
| Bootstrap ARI | 0.77 ± 0.11 (n=100) | High segment stability |

Honest rationale for choosing k: Silhouette is in fact highest at low k (k=3 = 0.271). k=7 was chosen on the grounds of minimum DBI (1.241) + business interpretability/actionability + high bootstrap stability (ARI 0.77). We do not claim that “silhouette is highest at k=7.” (Full grid in the Track 1 report appendix.)
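The selection procedure described above (NMF factors, a K-Means grid scored on several metrics, then a bootstrap stability check) can be sketched as follows. This is an illustration on synthetic non-negative data, not the project's code: the feature matrix, the k range, the number of bootstrap rounds, and the exact bootstrap protocol (refit on a resample, compare labels on the full data via ARI) are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF
from sklearn.metrics import (
    adjusted_rand_score,
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

rng = np.random.default_rng(0)
# Synthetic stand-in for the non-negative household feature matrix (assumption).
X = rng.gamma(shape=2.0, scale=1.0, size=(400, 12))

# Step 1: compress the features into 5 latent factors, as in the text.
nmf = NMF(n_components=5, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)


def score_k(W: np.ndarray, k: int, seed: int = 0) -> dict:
    """Fit K-Means with k clusters and report the three internal metrics."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(W)
    return {
        "silhouette": silhouette_score(W, labels),          # higher is better
        "calinski_harabasz": calinski_harabasz_score(W, labels),  # higher is better
        "davies_bouldin": davies_bouldin_score(W, labels),  # lower is better
    }


# Step 2: score a grid of candidate k values side by side.
grid = {k: score_k(W, k) for k in range(3, 10)}


def bootstrap_ari(W: np.ndarray, k: int, n_boot: int = 20, seed: int = 0):
    """Refit on bootstrap resamples and compare labelings of the full data."""
    rng = np.random.default_rng(seed)
    base = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(W)
    base_labels = base.predict(W)
    scores = []
    for b in range(n_boot):
        idx = rng.choice(len(W), size=len(W), replace=True)
        boot = KMeans(n_clusters=k, n_init=10, random_state=seed + b + 1).fit(W[idx])
        scores.append(adjusted_rand_score(base_labels, boot.predict(W)))
    return float(np.mean(scores)), float(np.std(scores))


# Step 3: stability of the chosen k.
ari_mean, ari_std = bootstrap_ari(W, k=7)

for k, s in grid.items():
    print(k, {name: round(value, 3) for name, value in s.items()})
print(f"bootstrap ARI (k=7): {ari_mean:.2f} ± {ari_std:.2f}")
```

Reading the grid as a whole, rather than maximizing a single metric, mirrors the rationale in the text: the metrics need not agree on the same k.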

The 7 customer segments (based on all 2,500 customers):

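| Seg | Name | Size | Mean revenue | Frequency (visits) | Recency (days) | Regularity | Main factor |
|---|---|---|---|---|---|---|---|
| 0 | Active Loyalists | 509 (20.4%) | $3,878 | 171 | 6 | 0.78 | F2 (Loyal) |
| 1 | VIP Heavy | 299 (12.0%) | $9,716 | 256 | 4 | 0.88 | F2 (Loyal) |
| 2 | Lapsed H&B | 193 (7.7%) | $872 | 37 | 75 | 0.25 | F5 (H&B) |
| 3 | Fresh Lovers | 339 (13.6%) | $1,233 | 48 | 36 | 0.34 | F4 (Fresh) |
| 4 | Light Grocery | 524 (21.0%) | $942 | 43 | 42 | 0.30 | F1 (Grocery-Deal) |
| 5 | Bulk Shoppers | 318 (12.7%) | $3,206 | 56 | 24 | 0.41 | F3 (Basket) |
| 6 | Regular + H&B | 318 (12.7%) | $3,393 | 152 | 12 | 0.70 | F2 (Loyal) |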

Light Grocery (Seg 4) is assigned F1 (Grocery-Deal) as its main factor because its grocery share (0.56) and discount usage (0.51) both load onto F1.

Figure: The seven segments in loyalty (F2) × deal-seeking (F1) space. VIP Heavy (high-loyalty / low-deal = premium) and Active Loyalists (high-loyalty / high-deal = budget-conscious loyal) respond to discounts in opposite ways despite both being “loyal”. This positioning gap is the behavioral basis for the per-segment CATE split in Track 2.

Marketing strategy by segment (Track 1 based):

| Segment | Priority | Strategy | Key actions |
|---|---|---|---|
| VIP Heavy | High | Retention | Premium benefits, churn prediction, exclusive access |
| Active Loyalists | High | Strengthen | Private-label promotions, loyalty points, basket expansion |
| Regular + H&B | Medium | Upgrade | VIP-conversion program, cross-category incentives |
| Bulk Shoppers | Medium | Regularize | Subscription offers, scheduled delivery, bundle deals |
| Fresh Lovers | Medium | Engage | Fresh-food content, daily specials, recipes |
| Light Grocery | Low | Activate | Habit-forming campaigns, gradual rewards |
| Lapsed H&B | Low | Win-back | Re-engagement campaigns, H&B-focused offers |

💡 Track 1 vs. Track 2 strategy difference: Track 1 is a general strategy based on customer characteristics; Track 2 is a TypeA-campaign targeting strategy based on CATE. VIP Heavy is “Retention” in Track 1, but in Track 2 it diverges into a recommendation to “scale down” TypeA targeting.

Track 2 Results: CATE and Optimal Targeting

ATE estimation (by method, n=2,430):

| Method | ATE | 95% CI | Reliability |
|---|---|---|---|
| Naive | +$471 | [$442, $501] | ❌ Upward bias |
| IPW | +$151 | [-$10, $313] | ⚠️ Unstable |
| AIPW | +$24 | [-$56, $104] | ✅ Doubly-robust |
| OLS | +$65 | [$29, $102] | |
| DML | -$65 | [-$220, $90] | ⚠️ Direction reversal |
| ATO (Overlap) | +$60 | [-$14, $134] | ✅ Overlap-focused |

The estimates scatter widely across methods, from -$65 to +$471 — a direct symptom of the positivity violation. We place more trust in the overlap-focused estimate (ATO +$60) and the doubly-robust estimate (AIPW +$24).
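To make the estimator names in the table concrete, here is a minimal sketch of the naive difference, IPW, AIPW, and overlap-weighted (ATO) estimates on synthetic confounded data. Everything in it is an assumption for illustration: the data-generating process (true effect fixed at 50), the logistic propensity model, the linear outcome models, and the clipping bounds. It uses the textbook estimator forms and does not reproduce the project's pipeline or its numbers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 4000
x = rng.normal(size=(n, 2))

# Confounded assignment: high-x0 customers are more likely to be treated
# AND spend more, so the naive comparison is biased upward.
p_true = 1.0 / (1.0 + np.exp(-1.5 * x[:, 0]))
t = rng.binomial(1, p_true)
y = 50.0 * t + 100.0 * x[:, 0] + rng.normal(scale=20.0, size=n)  # true effect = 50

# Naive: raw difference in means.
naive = y[t == 1].mean() - y[t == 0].mean()

# Estimated propensity score, clipped to avoid exploding weights.
e = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]
e = np.clip(e, 0.01, 0.99)

# IPW (normalized / Hajek form).
ipw = np.average(y, weights=t / e) - np.average(y, weights=(1 - t) / (1 - e))

# AIPW: outcome-model prediction plus a propensity-weighted residual correction.
mu1 = LinearRegression().fit(x[t == 1], y[t == 1]).predict(x)
mu0 = LinearRegression().fit(x[t == 0], y[t == 0]).predict(x)
aipw = np.mean(mu1 - mu0 + t * (y - mu1) / e - (1 - t) * (y - mu0) / (1 - e))

# ATO: overlap weights (treated get 1 - e, controls get e), which emphasize
# customers whose treatment status was genuinely uncertain.
ato = (
    np.average(y[t == 1], weights=(1 - e)[t == 1])
    - np.average(y[t == 0], weights=e[t == 0])
)

print(f"naive={naive:.1f}  ipw={ipw:.1f}  aipw={aipw:.1f}  ato={ato:.1f}")
```

Note that ATO targets the effect in the overlap population rather than the whole sample, so it coincides with the ATE here only because the simulated effect is constant.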

CATE model performance (main run; test set):

| Model | Mean CATE | Test Std | AUUC | % positive | Selection |
|---|---|---|---|---|---|
| CausalForestDML | +$15 | $52 | 271.6 | 78% | Primary |
| LinearDML | -$139 | $452 | 357.0 | 42% | ❌ Highest AUUC but unstable |
| NonParamDML | +$1.1M (diverges) | very large | 304.4 | 64% | ❌ Diverges |
| S-Learner | -$21 | $46 | 289.5 | 21% | ❌ 79% negative effects (unrealistic) |
| X-Learner | -$96 | $208 | 218.5 | 38% | ❌ High variance |
| T-Learner | -$200 | $397 | 212.0 | 43% | ❌ High variance |

Model-selection rationale (honest reframe): CausalForestDML does not have the highest AUUC. In the main run the highest AUUC belongs to LinearDML (357.0), with CausalForestDML in 4th place (271.6). We nonetheless chose CausalForestDML as the primary model because it is the only model that simultaneously satisfies low variance (std $52 vs. LinearDML’s $452) and a valid CATE distribution (mean +$10–15, 78% positive — consistent with the campaign’s objective). The models with higher AUUC are unusable: LinearDML (mean -$139, std $452), NonParamDML (diverges); S-Learner has similar variance but implies an unrealistic distribution in which 79% of customers have a negative effect. Under a severe positivity violation, stability and validity take priority over raw AUUC. (Supporting evidence: BLP test p=0.094 borderline, X-Learner p=0.005 — the heterogeneity signal is weak. See the Track 2 report for details.) For the full cohort (486 customers), CausalForestDML’s mean CATE is consistently +$10.

Figure: AUUC comparison across CATE models, with the uplift curve of CausalForestDML, selected on stability/validity grounds. Targeting the top 30% is expected to yield $2,200+ in additional revenue.

N is based on the Track 2 analysis cohort (486 customers) and differs from Track 1’s full segment sizes. “Current/recommended targeting %” is not a value present in the source CSV, so it is presented only as a direction (expand/hold/scale down).

| Segment | N (cohort) | Mean CATE | Recommended action (direction) |
|---|---|---|---|
| Active Loyalists | 97 | +$33 | Test & Learn (slight expand) |
| Regular + H&B | 62 | +$34 | Test & Learn (slight expand) |
| Light Grocery | 91 | +$30 | Test & Learn (expand) |
| Fresh Lovers | 73 | +$27 | Test & Learn (expand) |
| Lapsed H&B | 27 | +$19 | Test & Learn |
| VIP Heavy | 59 | -$38 | Scale down / exclude from TypeA |
| Bulk Shoppers | 77 | -$40 | Scale down / exclude from TypeA |

Figure: CATE distribution by segment. The negative effects of VIP Heavy (-$38) and Bulk Shoppers (-$40) are clear.

Policy Comparison Analysis (486-customer cohort)

| Policy | Criterion | Target % (N) | Profit | ROI | Characteristic |
|---|---|---|---|---|---|
| CATE > Breakeven | Point est. > $42.43 | 31.3% (152) | +$2,426 | 125% | ✅ Optimal |
| Top 20% CATE | Percentile | 20.0% (97) | +$2,259 | 183% | Highest ROI under budget constraint |
| Conservative | Lower CI > $42.43 | 0.6% (3) | +$188 | 493% | Ultra-conservative (pre-A/B) |
| Risk-Adjusted (30%) | Risk-adjusted | 11.1% (54) | +$1,603 | 233% | Variance penalty |
| PolicyTree (Tuned) | Learned rule | 26.7% (130) | +$1,684 | 102% | Interpretable |
| CATE > 0 | Point est. > 0 | 64.6% (314) | +$1,447 | 36% | Loose rule |
| Current practice | | 62.1% (302) | -$3,402 | -88% | ❌ Loss |
| Target everyone | | 100% (486) | -$4,659 | -75% | Loss |

Figure: ROI by targeting fraction. Targeting customers from the highest CATE downward, profit peaks near 31% (+$2,426, ROI 125%), then falls as customers with CATE below breakeven accumulate, turning into a loss (-$4,659) at 100% targeting. One chart shows that whom you exclude matters as much as whom you include.
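The shape of that curve follows from a simple per-customer profit model, sketched below. The assumptions are that each targeted customer contributes `margin × CATE − cost` (which is what the quoted breakeven of cost / margin implies) and that ROI is profit divided by coupon spend. The CATE values are synthetic draws, not the project's estimates, so the printed numbers will not match the table.

```python
import numpy as np

COST, MARGIN = 12.73, 0.30  # values quoted in the results table


def targeting_curve(cate, cost=COST, margin=MARGIN):
    """Cumulative profit and ROI when targeting in descending CATE order."""
    ordered = np.sort(np.asarray(cate, dtype=float))[::-1]
    n_targeted = np.arange(1, len(ordered) + 1)
    profit = np.cumsum(margin * ordered - cost)  # assumed profit model
    roi = profit / (n_targeted * cost)           # assumed ROI definition
    return n_targeted / len(ordered), profit, roi


rng = np.random.default_rng(0)
# Synthetic CATE estimates for a 486-customer cohort (illustrative only).
cate_hat = rng.normal(loc=10.0, scale=52.0, size=486)

fraction, profit, roi = targeting_curve(cate_hat)
best = int(np.argmax(profit))
n_above_breakeven = int(np.sum(cate_hat > COST / MARGIN))

print(f"profit-maximizing fraction: {fraction[best]:.1%} ({best + 1} customers)")
print(f"customers above breakeven:  {n_above_breakeven}")
print(f"profit at 100% targeting:   {profit[-1]:,.0f}")
```

Under this model the profit-maximizing cutoff is exactly the set of customers whose CATE exceeds breakeven, while ROI on its own is highest for the very first customer and declines from there. That is why the narrower policies in the table show higher ROI but lower total profit.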


Limitations & Lessons Learned

| Limitation | Evidence | Mitigation |
|---|---|---|
| Positivity Violation | PS AUC = 0.989, overlap 17% | PS trimming, ATO weighting, Manski bounds, partial identification |
| Refutation Test failures | Placebo (Amount) 0.747 (>0.5 → FAIL), Subset Stability 0.561 (<0.7 → FAIL) | A/B test validation design (n=5,748) |
| Model disagreement | CausalForest +$10 vs. LinearDML -$139 | Selection on stability/validity grounds; acknowledge directional disagreement |
| Single campaign type | Only TypeA analyzed | TypeB/C require separate analysis |

Refutation failed — and that is the expected result. Placebo Treatment (Amount) = 0.747 (threshold <0.5) and Subset Stability = 0.561 (threshold >0.7) both failed to pass their thresholds (though Placebo-Visits = 0.052 passed). This is the expected signal under a positivity violation, and it supports treating the results as hypothesis-generating rather than confirmatory. The results are neither hidden nor softened.

Lessons

“PS AUC 0.989 reveals the fundamental limit of an observational study. The results should be interpreted as hypothesis-generating, and deployed only after validation via an A/B test.”
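A positivity diagnostic of this kind can be sketched in a few lines: fit a propensity model, measure how well it separates treated from untreated customers (AUC), and count how many customers fall in a common-support band. The logistic model, the 0.1 to 0.9 band, and the synthetic data are assumptions; the source does not say how its 17% overlap figure was defined.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


def positivity_report(X, t, lo=0.1, hi=0.9):
    """Return (propensity AUC, share of units with propensity inside (lo, hi))."""
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    auc = roc_auc_score(t, ps)
    overlap = float(np.mean((ps > lo) & (ps < hi)))
    return auc, overlap


rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 3))
# Treatment is almost determined by the first covariate: a deliberately
# severe positivity problem, for illustration.
p_treat = 1.0 / (1.0 + np.exp(-6.0 * X[:, 0]))
t = rng.binomial(1, p_treat)

auc, overlap = positivity_report(X, t)
print(f"propensity AUC: {auc:.3f}, overlap share: {overlap:.1%}")
```

An AUC near 1 means covariates almost perfectly predict who was treated, so few comparable treated and untreated customers exist, which is the situation the lesson above describes.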

Future Directions

  1. A/B test validation: validate the hypotheses with n=5,748 (2,874/arm, 80% power, α=0.05, detectable effect ~$34).
  2. ε-greedy exploration: assign treatment to every customer with probability at least ε → guarantees positivity.
  3. MLOps extension: a CATE monitoring dashboard and a model-retraining pipeline.
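For the A/B test in item 1, a sample size of this order can be obtained from the standard two-sample formula for a difference in means, n per arm = 2 · ((z₁₋α/₂ + z_power) · σ / δ)². The sketch below assumes that formula and a hypothetical outcome standard deviation of $460; neither is stated in the source. The $460 value is chosen only because, under this formula, it is consistent with the quoted 2,874 per arm.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(effect: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Two-sided, two-sample z-approximation for a difference in means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)


# effect: detectable difference quoted in the text (~$34)
# sd: hypothetical outcome standard deviation (assumption, not from the source)
n_arm = n_per_arm(effect=34.0, sd=460.0)
n_total = 2 * n_arm

print(f"per arm: {n_arm}, total: {n_total}")
```

Because the required sample grows with the square of σ / δ, halving the detectable effect would roughly quadruple the test size.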

Try it yourself

Target the Right Customers

Each segment shows only its profile (annual spend · behavior). Pick who to mail coupons to, then run the campaign to reveal the hidden per-customer uplift.

