Customer Segmentation & Causal Targeting

⏱️ TL;DR (30 seconds)

Problem: Whom should we send the coupon campaign (TypeA) to? Sending it to everyone only inflates cost and produces a loss.
Approach: A 2-Track Framework. Track 1 answers “who are our customers?” (NMF + K-Means → 7 segments); Track 2 answers “whom should we target?” (CATE estimation → policy learning).
Three key results
1. High-value segments (VIP Heavy, Bulk Shoppers) show a counter-intuitive negative CATE → ceiling / cannibalization effect.
2. Targeting only 31.3% (152/486) of the cohort under a CATE > breakeven ($42.43) rule is profit-optimal.
3. The observational data exhibits a severe positivity violation (PS AUC = 0.989, overlap 17%) → results are hypothesis-generating and require confirmation via an A/B test.
Impact in one line: Optimal targeting (+$2,426) versus targeting everyone (-$4,659) improves profit by +$7,085 and ROI by +200pp (on the analysis cohort; hypothesis-generating estimate).

Figure (Track 1): seven behavioral segments in loyalty (F2) × deal-seeking (F1) space.

Figure (Track 2): ROI by targeting fraction. Profit peaks near 31% targeting; targeting everyone loses money.

🎯 Key Results at a Glance

Key metric	Value	Notes
Data scale	2,500 households · ~2.6M transactions · 102 weeks	Dunnhumby “The Complete Journey”
Number of segments	7 (NMF k=5 → K-Means k=7)	92.44% variance explained, bootstrap ARI 0.77±0.11
Breakeven CATE	$42.43	cost $12.73 / margin 0.30
Optimal targeting	31.3% (152 / 486)	profit +$2,426, ROI 125%
Target everyone (100%)	486 customers	profit -$4,659, ROI -75%
Current practice (62.1%)	302 customers	profit -$3,402, ROI -88%
Improvement (optimal vs. all)	+$7,085	+200pp ROI (hypothesis-generating)
Positivity diagnostic	PS AUC 0.989, overlap 17%	severe violation → hypothesis-generating reading
Primary CATE model	CausalForestDML	chosen on low-variance/validity grounds (see Track 2 for details)

CATE by segment — the negative effect of high-value segments is the counter-intuitive crux

Figure: CATE distribution by segment. The negative effects of VIP Heavy (-$38) and Bulk Shoppers (-$40) are clear, a signal to scale targeting down.

Motivation & Framework

This project tackles, within a single pipeline, both the traditional segmentation question “who are our customers?” and the causal-inference question “for whom, and by how much, is this campaign effective?”.


Why are both tracks needed?

Aspect	Track 1 (Descriptive)	Track 2 (Causal)
Core question	“Who is this customer?”	“Will the campaign be effective for this customer?”
Main users	Marketing, CRM, Strategy	Data Science, Optimization
Explainability	“Premium Fresh Lover segment”	“This customer’s CATE = +$34”
Organizational requirement	Marketing capability grounded in customer understanding	Causal thinking + an execution framework for personalized targeting

Key Insights: Counter-Intuitive Findings

Negative (-) Treatment Effect of High-Value Customers

The N values below are based on the Track 2 analysis cohort (486 customers) and differ from Track 1’s full segment sizes (509, 299, …). CATE is a hypothesis-generating estimate.

Segment	Customer value (overall mean revenue)	Mean CATE	N (cohort)	Direction signal
VIP Heavy	$9,716 (highest)	-$38	59	Scale down / exclude from TypeA
Bulk Shoppers	$3,206	-$40	77	Scale down / exclude from TypeA

Why do high-value customers show a negative CATE?

Segment	Root-cause analysis
VIP Heavy	Already a high purchaser → ceiling effect; the coupon merely substitutes for existing purchases (cannibalization)
Bulk Shoppers	Coupon-based TypeA mismatches their irregular, bulk-buying shopping rhythm

Business impact (on the 486-customer analysis cohort; hypothesis-generating): optimal targeting earns +$2,426 where targeting everyone loses $4,659, a $7,085 swing in profit.

Results Summary

Track 1 Results: Latent Factor Modeling + Clustering

Interpretation of the 5 latent factors (NMF k=5, 92.44% variance explained):

Factor	Name	Top features (loading)	Interpretation
F1	Grocery Deal Seeker	share_grocery(6.72), discount_usage_pct(5.13), private_label_ratio(3.41)	Discount-seeking, budget-conscious
F2	Loyal Regular	purchase_regularity(4.63), n_departments(2.61), n_products(1.53), frequency(1.04)	One-stop, high-engagement (Value)
F3	Big Basket	monetary_std(2.45), monetary_avg_basket(2.35), share_grocery(2.08)	Irregular bulk purchases (Value)
F4	Fresh Focused	share_fresh(2.26), n_departments(1.21)	Fresh-food specialist (Need)
F5	Health & Beauty	share_health_beauty(2.03), recency(0.41)	Drugstore type (Need)

Figure: feature loadings of the 5 latent factors. F2 (Loyal) and F3 (Big Basket) capture the Value dimension; F4 (Fresh) and F5 (H&B) capture the Need dimension.
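The NMF k=5 → K-Means k=7 pipeline described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the project's actual code: the min-max scaling, the `nndsvda` initialization, the definition of "variance explained" as 1 - SSE/SST, and the function name `factorize_and_cluster` are all hypothetical choices, and the input is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF
from sklearn.preprocessing import MinMaxScaler


def factorize_and_cluster(features, n_factors=5, n_segments=7, seed=0):
    """Scale features to [0, 1], extract NMF factors, then cluster the factor scores."""
    # NMF requires non-negative input, so min-max scaling is used here (an assumption).
    scaled = MinMaxScaler().fit_transform(features)
    nmf = NMF(n_components=n_factors, init="nndsvda", max_iter=1000, random_state=seed)
    scores = nmf.fit_transform(scaled)   # households x factors
    loadings = nmf.components_           # factors x features

    # One possible definition of "variance explained": 1 - SSE / total sum of squares.
    reconstruction = scores @ loadings
    sse = np.sum((scaled - reconstruction) ** 2)
    sst = np.sum((scaled - scaled.mean(axis=0)) ** 2)
    explained = 1.0 - sse / sst

    labels = KMeans(n_clusters=n_segments, n_init=10, random_state=seed).fit_predict(scores)
    return scores, loadings, explained, labels


# Synthetic stand-in for the household feature matrix (not the real data).
rng = np.random.default_rng(0)
demo_features = rng.gamma(shape=2.0, scale=1.0, size=(500, 12))
scores, loadings, explained, labels = factorize_and_cluster(demo_features)
print(scores.shape, loadings.shape, round(explained, 3), np.bincount(labels))
```

Clustering on the factor scores rather than on the raw features is what lets each segment be described by a "main factor"; the rows of `loadings` play the role of the per-factor feature loadings in the table above.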

Clustering evaluation metrics (summary):

Metric	Value	Interpretation
Explained Variance	92.44%	High factor coverage
Silhouette Score (k=7)	0.219	Reasonable for behavioral data (not the global maximum — see appendix)
Calinski-Harabasz (k=7)	732.0	—
Davies-Bouldin Index (k=7)	1.241	Minimum among k candidates (best separation)
Bootstrap ARI	0.77 ± 0.11 (n=100)	High segment stability

Honest rationale for choosing k: Silhouette is in fact highest at low k (0.271 at k=3). We chose k=7 for its minimum DBI (1.241), its business interpretability and actionability, and its high bootstrap stability (ARI 0.77). We do not claim that “silhouette is highest at k=7.” (Full grid in the Track 1 report appendix.)
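A selection procedure of this kind (scoring several k on silhouette, Calinski-Harabasz and Davies-Bouldin, then checking stability with a bootstrap ARI) might look like the sketch below. The helper names, the number of bootstrap resamples, and the synthetic blobs are assumptions for illustration; the exact bootstrap protocol used in the project is not specified here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    adjusted_rand_score,
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def score_k_grid(X, k_values, seed=0):
    """Return {k: (silhouette, calinski_harabasz, davies_bouldin)} for each candidate k."""
    results = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        results[k] = (
            silhouette_score(X, labels),         # higher is better
            calinski_harabasz_score(X, labels),  # higher is better
            davies_bouldin_score(X, labels),     # lower is better
        )
    return results


def bootstrap_ari(X, k, n_boot=20, seed=0):
    """Refit K-Means on bootstrap resamples and compare its labels on X to a reference fit."""
    rng = np.random.default_rng(seed)
    reference = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    aris = []
    for b in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
        model = KMeans(n_clusters=k, n_init=10, random_state=seed + b + 1).fit(X[idx])
        aris.append(adjusted_rand_score(reference, model.predict(X)))
    return float(np.mean(aris)), float(np.std(aris))


# Synthetic blobs stand in for the NMF factor scores.
X_demo, _ = make_blobs(n_samples=600, centers=4, cluster_std=1.0, random_state=0)
grid = score_k_grid(X_demo, k_values=range(2, 8))
best_dbi_k = min(grid, key=lambda k: grid[k][2])
ari_mean, ari_std = bootstrap_ari(X_demo, k=best_dbi_k)
print(best_dbi_k, round(ari_mean, 2), round(ari_std, 2))
```

Because the three indices can disagree, as they do here between silhouette and DBI, reporting the whole grid alongside the stability check makes the choice of k auditable.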

The 7 customer segments (based on all 2,500 customers):

Seg	Name	Size	Mean revenue	Frequency (visits)	Recency (days)	Regularity	Main factor
0	Active Loyalists	509 (20.4%)	$3,878	171	6	0.78	F2 (Loyal)
1	VIP Heavy	299 (12.0%)	$9,716	256	4	0.88	F2 (Loyal)
2	Lapsed H&B	193 (7.7%)	$872	37	75	0.25	F5 (H&B)
3	Fresh Lovers	339 (13.6%)	$1,233	48	36	0.34	F4 (Fresh)
4	Light Grocery	524 (21.0%)	$942	43	42	0.30	F1 (Grocery-Deal)
5	Bulk Shoppers	318 (12.7%)	$3,206	56	24	0.41	F3 (Basket)
6	Regular + H&B	318 (12.7%)	$3,393	152	12	0.70	F2 (Loyal)

Light Grocery (Seg 4) is recorded with F1 (Grocery-Deal) as its main factor because its two distinguishing features, grocery share (0.56) and discount usage (0.51), both load onto F1.

Figure: the seven segments in loyalty (F2) × deal-seeking (F1) space. VIP Heavy (high-loyalty / low-deal = premium) and Active Loyalists (high-loyalty / high-deal = budget-conscious loyal) respond to discounts in opposite ways despite both being “loyal”; this positioning gap is the behavioral basis for the per-segment CATE split in Track 2.

Marketing strategy by segment (Track 1 based):

Segment	Priority	Strategy	Key actions
VIP Heavy	High	Retention	Premium benefits, churn prediction, exclusive access
Active Loyalists	High	Strengthen	Private-label promotions, loyalty points, basket expansion
Regular + H&B	Medium	Upgrade	VIP-conversion program, cross-category incentives
Bulk Shoppers	Medium	Regularize	Subscription offers, scheduled delivery, bundle deals
Fresh Lovers	Medium	Engage	Fresh-food content, daily specials, recipes
Light Grocery	Low	Activate	Habit-forming campaigns, gradual rewards
Lapsed H&B	Low	Win-back	Re-engagement campaigns, H&B-focused offers

💡 Track 1 vs. Track 2 strategy difference: Track 1 is a general strategy based on customer characteristics; Track 2 is a TypeA-campaign targeting strategy based on CATE. VIP Heavy is “Retention” in Track 1, but in Track 2 it diverges into a recommendation to “scale down” TypeA targeting.

Track 2 Results: CATE and Optimal Targeting

ATE estimation (by method, n=2,430):

Method	ATE	95% CI	Reliability
Naive	+$471	[$442, $501]	❌ Upward bias
IPW	+$151	[-$10, $313]	⚠️ Unstable
AIPW	+$24	[-$56, $104]	✅ Doubly-robust
OLS	+$65	[$29, $102]	—
DML	-$65	[-$220, $90]	⚠️ Direction reversal
ATO (Overlap)	+$60	[-$14, $134]	✅ Overlap-focused

The estimates scatter widely across methods, from -$65 to +$471 — a direct symptom of the positivity violation. We place more trust in the overlap-focused estimate (ATO +$60) and the doubly-robust estimate (AIPW +$24).
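To make the difference between the naive, IPW and AIPW rows concrete, here is a minimal sketch on synthetic confounded data with a known effect. It is not the project's estimation code: the linear and logistic nuisance models, the propensity clipping at 0.01, and the absence of cross-fitting are simplifying assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression


def estimate_ate(X, t, y, clip=0.01):
    """Return naive, IPW, and AIPW estimates of the ATE (no cross-fitting, for brevity)."""
    naive = y[t == 1].mean() - y[t == 0].mean()

    # Propensity scores e(x), clipped away from 0 and 1 to bound the weights.
    e = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    e = np.clip(e, clip, 1 - clip)

    # Inverse-probability weighting.
    ipw = np.mean(t * y / e - (1 - t) * y / (1 - e))

    # Outcome models fitted separately on each arm.
    mu1 = LinearRegression().fit(X[t == 1], y[t == 1]).predict(X)
    mu0 = LinearRegression().fit(X[t == 0], y[t == 0]).predict(X)

    # AIPW: outcome-model contrast plus a weighted residual correction.
    aipw = np.mean(mu1 - mu0 + t * (y - mu1) / e - (1 - t) * (y - mu0) / (1 - e))
    return {"naive": naive, "ipw": ipw, "aipw": aipw}


# Synthetic confounded data with a known true effect of 2.0 (not the campaign data).
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 1))
t = rng.binomial(1, 1 / (1 + np.exp(-1.5 * X[:, 0])))
y = 2.0 * t + 3.0 * X[:, 0] + rng.normal(size=n)
estimates = estimate_ate(X, t, y)
print({name: round(value, 2) for name, value in estimates.items()})
```

In this toy setup the covariate drives both treatment and outcome, so the naive difference in means is biased upward, which is the same qualitative pattern as the Naive row in the table.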

CATE model performance (main run; test set):

Model	Mean CATE	Test Std	AUUC	% positive	Selection
CausalForestDML	+$15	$52	271.6	78%	✅ Primary
LinearDML	-$139	$452	357.0	42%	❌ Highest AUUC but unstable
NonParamDML	+$1.1M (diverges)	very large	304.4	64%	❌ Diverges
S-Learner	-$21	$46	289.5	21%	❌ 79% negative effects (unrealistic)
X-Learner	-$96	$208	218.5	38%	❌ High variance
T-Learner	-$200	$397	212.0	43%	❌ High variance

Model-selection rationale: CausalForestDML does not have the highest AUUC. In the main run the highest AUUC belongs to LinearDML (357.0), and CausalForestDML ranks 4th (271.6). We nonetheless chose CausalForestDML as the primary model because it is the only one that combines low variance (std $52 vs. LinearDML’s $452) with a valid CATE distribution (mean +$10 to +$15, 78% positive, consistent with the campaign’s objective). The three models with higher AUUC are unusable: LinearDML is unstable (mean -$139, std $452), NonParamDML diverges, and S-Learner, despite similar variance, implies an unrealistic distribution in which 79% of customers have a negative effect. Under a severe positivity violation, stability and validity take priority over raw AUUC. (Supporting evidence: BLP test p=0.094 is borderline and X-Learner p=0.005, so the heterogeneity signal is weak. See the Track 2 report for details.) On the full cohort (486 customers), CausalForestDML’s mean CATE is +$10.

Figure: AUUC comparison across CATE models, with the uplift curve of CausalForestDML (selected on stability/validity grounds). Targeting the top 30% is expected to yield $2,200+ in additional revenue.

CATE and Recommended Action by Segment

N is based on the Track 2 analysis cohort (486 customers) and differs from Track 1’s full segment sizes. “Current/recommended targeting %” is not a value present in the source CSV, so it is presented only as a direction (expand/hold/scale down).

Segment	N (cohort)	Mean CATE	Recommended action (direction)
Active Loyalists	97	+$33	Test & Learn (slight expand)
Regular + H&B	62	+$34	Test & Learn (slight expand)
Light Grocery	91	+$30	Test & Learn (expand)
Fresh Lovers	73	+$27	Test & Learn (expand)
Lapsed H&B	27	+$19	Test & Learn
VIP Heavy	59	-$38	Scale down / exclude from TypeA
Bulk Shoppers	77	-$40	Scale down / exclude from TypeA

Figure: CATE distribution by segment. The negative effects of VIP Heavy (-$38) and Bulk Shoppers (-$40) are clear.
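As a quick consistency check on the segment table, the size-weighted average of the segment means can be compared with the full-cohort mean CATE of about +$10. The sketch below uses only the rounded values printed in the table, so the result is approximate.

```python
# (segment, cohort N, rounded mean CATE in dollars) as listed in the segment table.
segments = [
    ("Active Loyalists", 97, 33),
    ("Regular + H&B", 62, 34),
    ("Light Grocery", 91, 30),
    ("Fresh Lovers", 73, 27),
    ("Lapsed H&B", 27, 19),
    ("VIP Heavy", 59, -38),
    ("Bulk Shoppers", 77, -40),
]

cohort_size = sum(n for _, n, _ in segments)
weighted_mean_cate = sum(n * cate for _, n, cate in segments) / cohort_size
negative_share = sum(n for _, n, cate in segments if cate < 0) / cohort_size

print(cohort_size)                    # 486
print(round(weighted_mean_cate, 2))   # 10.7
print(round(negative_share, 3))       # 0.28
```

The two negative-CATE segments make up roughly 28% of the cohort, which is why they pull the cohort-wide mean down to about +$10 even though the other five segments sit between +$19 and +$34.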

Policy Comparison Analysis (486-customer cohort)

Policy	Criterion	Target % (N)	Profit	ROI	Characteristic
CATE > Breakeven	Point est. > $42.43	31.3% (152)	+$2,426	125%	✅ Optimal
Top 20% CATE	Percentile	20.0% (97)	+$2,259	183%	Highest ROI under budget constraint
Conservative	Lower CI > $42.43	0.6% (3)	+$188	493%	Ultra-conservative (pre-A/B)
Risk-Adjusted (30%)	Risk-adjusted	11.1% (54)	+$1,603	233%	Variance penalty
PolicyTree (Tuned)	Learned rule	26.7% (130)	+$1,684	102%	Interpretable
CATE > 0	Point est. > 0	64.6% (314)	+$1,447	36%	Loose rule
Current practice	—	62.1% (302)	-$3,402	-88%	❌ Loss
Target everyone	—	100% (486)	-$4,659	-75%	❌ Loss

Figure: ROI by targeting fraction. Targeting customers from the highest CATE downward, profit peaks near 31% (+$2,426, ROI 125%) and then bends as below-breakeven and negative-CATE customers accumulate, turning into a loss (-$4,659) at 100% targeting. One chart shows that whom you exclude matters as much as whom you include.
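The profit and ROI columns follow from a simple per-customer rule: a targeted customer contributes margin × CATE minus the coupon cost, so targeting pays only when CATE exceeds cost / margin. The sketch below illustrates that arithmetic with the cost ($12.73) and margin (0.30) from the results table; the function names and the four-customer CATE vector are hypothetical.

```python
import numpy as np

COST_PER_CUSTOMER = 12.73   # from the results table
MARGIN = 0.30               # from the results table
BREAKEVEN_CATE = COST_PER_CUSTOMER / MARGIN   # about 42.43


def evaluate_policy(cate, target_mask, cost=COST_PER_CUSTOMER, margin=MARGIN):
    """Expected profit and ROI when only customers in target_mask receive the coupon."""
    n_targeted = int(np.sum(target_mask))
    if n_targeted == 0:
        return 0.0, float("nan")
    spend = n_targeted * cost
    profit = margin * float(np.sum(cate[target_mask])) - spend
    return profit, profit / spend


def profit_curve(cate, cost=COST_PER_CUSTOMER, margin=MARGIN):
    """Cumulative profit when targeting the top-k customers by CATE, for k = 1..n."""
    ranked = np.sort(cate)[::-1]
    return np.cumsum(margin * ranked - cost)


# Hypothetical CATE estimates, for illustration only.
cate_demo = np.array([100.0, 50.0, 40.0, -20.0])
profit, roi = evaluate_policy(cate_demo, cate_demo > BREAKEVEN_CATE)
curve = profit_curve(cate_demo)
print(round(profit, 2), round(roi, 3), int(np.argmax(curve)) + 1)
```

In the toy example the cumulative profit curve peaks exactly where the breakeven rule stops adding customers. The same arithmetic reproduces the table: $2,426 / (152 × $12.73) ≈ 1.25, matching the reported 125% ROI for the CATE > Breakeven policy.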

Limitations & Lessons Learned

Limitation	Evidence	Mitigation
Positivity Violation	PS AUC = 0.989, overlap 17%	PS trimming, ATO weighting, Manski bounds, partial identification
Refutation Test failures	Placebo (Amount) 0.747 (>0.5 → FAIL), Subset Stability 0.561 (<0.7 → FAIL)	A/B test validation design (n=5,748)
Model disagreement	CausalForest +$10 vs. LinearDML -$139	Selection on stability/validity grounds; acknowledge directional disagreement
Single campaign type	Only TypeA analyzed	TypeB/C require separate analysis

Refutation failed — and that is the expected result. Placebo Treatment (Amount) = 0.747 (threshold <0.5) and Subset Stability = 0.561 (threshold >0.7) both failed to pass their thresholds (though Placebo-Visits = 0.052 passed). This is the expected signal under a positivity violation, and it supports treating the results as hypothesis-generating rather than confirmatory. The results are neither hidden nor softened.

Lessons

“PS AUC 0.989 reveals the fundamental limit of an observational study. The results should be interpreted as hypothesis-generating, and deployed only after validation via an A/B test.”
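A positivity diagnostic of the kind quoted above can be sketched as follows. The definition of "overlap" used here (share of units whose estimated propensity lies in [0.1, 0.9]) and the logistic propensity model are assumptions; the analysis may define its 17% overlap figure differently.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


def positivity_diagnostics(X, t, lower=0.1, upper=0.9):
    """Return (propensity-model AUC, share of units with propensity inside [lower, upper])."""
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    auc = roc_auc_score(t, ps)   # close to 0.5 would indicate well-mixed groups
    overlap_share = float(np.mean((ps >= lower) & (ps <= upper)))
    return auc, overlap_share


# Synthetic data in which one covariate almost determines treatment.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
t = rng.binomial(1, 1 / (1 + np.exp(-6.0 * X[:, 0])))
auc, overlap_share = positivity_diagnostics(X, t)
print(round(auc, 3), round(overlap_share, 3))
```

A propensity AUC near 1 means the covariates almost perfectly separate treated from untreated customers, so few comparable pairs exist and any effect estimate leans heavily on model extrapolation.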

Future Directions

A/B test validation: validate the hypotheses with n=5,748 (2,874/arm, 80% power, α=0.05, detectable effect ~$34).
ε-greedy exploration: assign treatment to every customer with probability at least ε → guarantees positivity.
MLOps extension: a CATE monitoring dashboard and a model-retraining pipeline.
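The first two items above can be sketched as follows. Both functions are illustrative assumptions: the sample-size formula is the standard two-sided, two-sample z-test with equal variances, and the outcome standard deviation of $460 is not stated in the analysis; it is simply the value at which this formula yields the quoted 2,874 per arm. The ε-bounded assignment shown is one simple way to keep every customer's treatment probability within [ε, 1 - ε].

```python
import math
from statistics import NormalDist

import numpy as np


def n_per_arm(sigma, effect, alpha=0.05, power=0.80):
    """Sample size per arm for a two-sided, two-sample z-test with equal variances."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / effect ** 2)


def epsilon_greedy_assign(recommended, epsilon=0.1, seed=0):
    """Treat recommended customers with prob 1 - epsilon and all others with prob epsilon."""
    rng = np.random.default_rng(seed)
    propensity = np.where(recommended, 1.0 - epsilon, epsilon)
    treated = rng.random(len(recommended)) < propensity
    return treated, propensity   # keep the propensity for later reweighting


# sigma = 460 is an illustrative assumption, not a figure from the analysis.
print(n_per_arm(sigma=460, effect=34))   # 2874

recommended = np.array([True, False, True, False, False])
treated, propensity = epsilon_greedy_assign(recommended, epsilon=0.1)
print(propensity.min(), propensity.max())   # 0.1 0.9
```

Logging the assignment probability with each decision is what makes later causal estimates straightforward: the propensities are known by design rather than estimated from a model.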