Notes
82 public notes on causal inference, decision-making, and personalization
-
Dunnhumby — Track 1: Latent-Factor Customer Segmentation
NMF latent factors (92.44% explained variance) + K-Means yield 7 stable behavioral segments (Bootstrap ARI 0.77) with per-segment marketing actions. Illustrative case study on the public Dunnhumby retail dataset.
-
Dunnhumby — Track 2: Causal Targeting via Heterogeneous Treatment Effects
Meta-learner / Causal Forest CATE under severe positivity violation (PS AUC 0.989); an OPE-validated policy targets ~31% of customers and surfaces counter-intuitive negative-CATE segments. Hypothesis-generating on public data.
-
Applied Causal Inference for Pricing — CATE & SCM Across Public Datasets
An applied case study using only public datasets (LendingClub, iPinYou) that combines CATE estimation for price-sensitivity heterogeneity with SCM-based moderator analysis to design individual-level, risk-based pricing and RTB bidding policies — all findings illustrative and projected, not proprietary.
-
Causal Inference Under Partial Identification — Sensitivity and Evidence Hierarchies
When real-world data fail strong ignorability, point identification gives way to bounds, proxies, and sensitivity analysis — an honest hierarchy of evidence that connects credible causal claims to semiparametric efficiency.
-
Customer Segmentation
Customer Segmentation is the unsupervised task of partitioning customers into a finite set of segments by similarity in behavior, value, and preference. A common recipe is latent-factor decomposition followed by clustering: behavioral features → NMF (non-negative, parts-based decomposition) → factor scores → K-Means → segments.
-
Customer Segmentation & Causal Targeting — An Applied Case Study
An end-to-end applied case study on the public Dunnhumby dataset — NMF latent factors and K-Means segmentation feeding meta-learner / Causal Forest HTE and an OPE-validated optimal targeting policy, with a candid look at positivity violation and counter-intuitive "sleeping dog" segments.
-
From Estimation to Action — How HTE Drives Personalized Policy Across Domains
One methodological spine — estimate heterogeneous treatment effects and turn them into individual-level policies — powers both clinical sequential treatment decisions and industrial targeting, pricing, and recommendation.
-
LLM Multi-Layer Attribute Extraction for Cross-Domain Recommendation
A case study on extracting a 3-layer attribute taxonomy (product / perceptual / theory-grounded) with LLM/VLM pipelines, turning it into user profiles and a mixture-of-experts adaptor, and plugging it into standard recommenders across two public domains (fashion + music).
-
Marketing Attribution at Scale — From Simulation to Causal Inference
A case study comparing 10+ multi-touch attribution methods against a known-ground-truth simulator, then scaling them on the public Criteo dataset, closing the loop with budget off-policy evaluation for channel allocation.
-
Optimal Targeting Policy
An Optimal Targeting Policy maps covariates $x$ to a treatment decision $\pi(x)\in\{0,1\}$ so as to maximize policy value:
-
RTB Bidding Strategy via Causal ML — From Prediction to Optimization
A five-stage case study on the public iPinYou RTB dataset that moves from pCTR/pCVR prediction through causal effect estimation (CATE, SCM) to budget-constrained optimal bidding and off-policy policy evaluation.
-
Sequential and Adaptive Decision-Making — From Bandits to Dynamic Treatment Regimes
A synthesis essay tracing one methodological spine through sequential decision-making under uncertainty — exploration–exploitation in bandits, off-policy evaluation, and optimal/dynamic treatment regimes — that powers clinical adaptive trials and real-time bidding alike.
-
Targeting & Profiling Overview
Targeting & Profiling is the industrial face of personalization. The same methodological core (heterogeneous effect estimation → individual-level optimal policy; MOC-Personalization) appears in clinical settings as "optimal treatment assignment per patient," and in industry as "optimal campaign/exposure assignment per customer." This domain answers who…
-
Uplift Modeling
Uplift is the causal increment that a treatment (campaign exposure, coupon, recommendation) induces in an individual's outcome (purchase, conversion). For binary treatment $W\in\{0,1\}$, outcome $Y$, and covariates $X$:
-
User Profiling
User Profiling is the task of inferring a personal preference profile (taste, context, latent patterns) from a customer's behavioral history and representing it as a vector. It is the shared input layer for targeting, segmentation, and recommendation — the industry-side counterpart to patient covariate/multimodal representations (Multimodal Clinical Data) in the clinical domain.
-
Anytime-Valid Inference Overview
Game-theoretic statistics that resolves the "peeking" problem of fixed-sample hypothesis testing. The mathematical foundation for real-time monitoring of identification-validity drift.
-
Anytime-Valid OPE
Off-policy evaluation with time-uniform confidence sequences for the policy value, valid at any stopping time; built on e-processes.
-
Confidence Sequence
A confidence sequence (CS) $(C_t)_{t\ge1}$ is a sequence of confidence intervals with time-uniform coverage:
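Time-uniform coverage is usually stated as below. This is the standard textbook form, not necessarily the note's exact notation; the target parameter $\theta$ and error level $\alpha$ are symbols assumed here.

```latex
\Pr\bigl(\forall t \ge 1:\ \theta \in C_t\bigr) \;\ge\; 1 - \alpha
```

The quantifier sits inside the probability, which is what separates a CS from a fixed-sample confidence interval that only guarantees coverage at one pre-specified $t$.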
-
Decision-Making Overview
Methods for (sequential) decision-making under uncertainty — from bandit regret to RL, off-policy evaluation, and dynamic/optimal treatment regimes. Underpins both clinical (DTR/OTR) and industrial (targeting, bidding) personalization.
-
Dynamic Treatment Regimes (DTR / OTR)
A DTR is a sequence of decision rules $\{d_t(H_t)\}_{t=1}^T$ mapping the accumulated history $H_t$ (covariates, prior treatments, intermediate outcomes) to a treatment. The optimal treatment regime (OTR) maximizes the expected long-term outcome $E[Y^{d}]$. Estimation:
-
e-process (e-value)
An e-value $E$ is a nonnegative random variable with $E_P[E]\le 1$ ($\forall P\in H_0$) under the null $H_0$. An e-process $(E_t)$ is a nonnegative process such that $E_\tau$ is an e-value at any stopping time $\tau$ ($E[E_\tau]\le1$), typically a nonnegative supermartingale under the null.…
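As a rough illustration, a running likelihood ratio for coin flips is one simple e-process. The sketch below assumes a point null `p0 = 0.5` and a hypothetical alternative `q = 0.7`; both values and the function names are illustrative choices, not taken from the note. Rejecting the first time the process reaches `1 / alpha` keeps the type-I error at most `alpha` by Ville's inequality.

```python
def lr_e_process(flips, q=0.7, p0=0.5):
    """Running likelihood-ratio e-process for H0: P(heads) = p0 vs. alternative q."""
    e, path = 1.0, []
    for x in flips:  # x is 1 for heads, 0 for tails
        # Each factor has expectation 1 under H0, so the product is a martingale.
        e *= (q / p0) if x == 1 else ((1 - q) / (1 - p0))
        path.append(e)
    return path


def first_rejection(flips, alpha=0.05, q=0.7, p0=0.5):
    """First time t with E_t >= 1/alpha, or None if the threshold is never reached."""
    for t, e in enumerate(lr_e_process(flips, q, p0), start=1):
        if e >= 1 / alpha:
            return t
    return None


all_heads_stop = first_rejection([1] * 20)  # evidence accumulates against H0
all_tails_stop = first_rejection([0] * 20)  # evidence shrinks toward zero
```

Because the guarantee holds at any stopping time, the data can be inspected after every flip without inflating the error rate.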
-
Efficient Influence Function
Among the regular asymptotically linear (RAL) estimators of a (semi)parametric model, the IF with the smallest variance is the efficient influence function (EIF), and its variance equals the semiparametric efficiency bound (the supremum of the Cramér-Rao bounds over all parametric submodels)…
-
Influence Function
If an estimator $\hat\psi$ of a functional parameter $\psi:\mathcal{P}\to\mathbb{R}$ is asymptotically linear, then an influence function (IF) $\phi$ exists such that
-
Multi-Armed Bandits
$K$ arms; at each round $t$ pull $A_t$ and observe a reward. Minimize cumulative regret:
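A minimal sketch of how cumulative regret accrues, using UCB1 on Bernoulli arms. The choice of UCB1, the arm means, and the horizon are illustrative assumptions rather than anything stated in the note; regret here is the pseudo-regret $\sum_t (\mu^* - \mu_{A_t})$, with $\mu^*$ the best arm's mean.

```python
import math
import random


def ucb1_regret(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms; return (cumulative pseudo-regret, pull counts)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull every arm once
        else:
            # Empirical mean plus an exploration bonus that shrinks with pulls.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret, counts


regret, counts = ucb1_regret([0.2, 0.5, 0.8], horizon=5000)
```

Pulling arms uniformly at random would incur regret of about `0.3 * horizon` on these means, so a useful check is that the returned regret is far below that.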
-
Negative Control Outcome (NCO)
An NCO is an outcome variable known a priori to be unaffected by the treatment, yet subject to the same unmeasured confounder $U$ as the outcome of interest. By contrast, an NCE (negative control exposure) is an exposure with no causal effect on the outcome. A nonzero apparent effect on an NCO signals unmeasured confounding (detection), which can then be corrected via proximal methods.
-
Off-Policy Evaluation (OPE)
Estimate the value $V(\pi_e)=E_{\pi_e}[\sum r]$ of a target policy $\pi_e$ from logs collected under a different behavior policy $\pi_b$.
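One of the simplest OPE estimators is inverse propensity scoring (IPS), sketched here for a context-free two-armed bandit. The simulated logs, the arm means, and both policies are assumptions made up for illustration; real logs would also carry contexts and logged propensities.

```python
import random


def ips_value(logs, target_probs):
    """IPS estimate of the target policy value.

    logs: list of (action, reward, behavior_prob_of_that_action).
    target_probs: target policy's probability for each action.
    """
    return sum(target_probs[a] / p_b * r for a, r, p_b in logs) / len(logs)


rng = random.Random(0)
true_means = [0.2, 0.6]  # hypothetical Bernoulli reward rates per action
behavior = [0.7, 0.3]    # logging (behavior) policy
target = [0.1, 0.9]      # policy we want to evaluate offline

logs = []
for _ in range(20000):
    a = 0 if rng.random() < behavior[0] else 1
    r = 1.0 if rng.random() < true_means[a] else 0.0
    logs.append((a, r, behavior[a]))

v_hat = ips_value(logs, target)
v_true = sum(p * m for p, m in zip(target, true_means))
```

The estimate is unbiased when the behavior policy gives positive probability to every action the target policy can take, but its variance grows with the importance weights `target_probs[a] / p_b`.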
-
One-step Estimator
Corrects first-order bias by adding the empirical mean of the estimated EIF to the plug-in $\psi(\hat P)$:
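In the usual notation the correction reads as follows. The observations $O_1,\dots,O_n$ and the subscript "os" are labels assumed here; $\hat\phi$ denotes the estimated EIF.

```latex
\hat\psi_{\mathrm{os}} \;=\; \psi(\hat P) \;+\; \frac{1}{n}\sum_{i=1}^{n} \hat\phi(O_i)
```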
-
Partial Identification
When point identification is impossible due to a lack of assumptions, we only know that the parameter lies in the identified set $\Theta_I$ (often an interval $[\theta_L,\theta_U]$) compatible with the data plus assumptions. Manski's assumption-free / worst-case bounds are the starting point. sharp bounds =…
-
Proximal Causal Inference
When unmeasured confounding $U$ is present, the causal effect is identified using two types of proxies:
-
TMLE (Targeted Maximum Likelihood Estimation)
A procedure that corrects (targets) a plug-in estimator toward the target parameter:
-
ESCM² (Entire Space Counterfactual Multi-Task Model)
A model that integrates a counterfactual risk regularizer based on the Inverse Propensity Score (IPS) and the Doubly Robust estimator into ESMM, in order to address ESMM's two theoretical limitations — Inherent Estimation Bias (IEB) and Potential Independence Priority (PIP).
-
ESMM (Entire Space Multi-Task Model)
A multi-task model that addresses CVR's Sample Selection Bias and Data Sparsity problems simultaneously by exploiting the sequential user behavior $\text{impression} \to \text{click} \to \text{conversion}$ to learn CVR indirectly over the entire impression space.
-
DeepFM
DeepFM (Guo et al., 2017) is a CTR prediction model that combines an FM component and a Deep component in parallel, jointly learning low-order (explicit) and high-order (implicit) feature interactions.
-
Factorization Machine
The Factorization Machine (FM) is a general-purpose prediction model proposed by Rendle (2010) that models interactions between all pairs of features as inner products of latent factor vectors.
-
PNN
PNN (Qu et al., 2016) is a CTR prediction model that introduces a product layer between the embedding layer and the DNN hidden layers, explicitly capturing the interactions among feature embeddings before passing them to the DNN.
-
Wide and Deep
Wide & Deep (Cheng et al., 2016) is a CTR prediction model that combines a linear wide component (memorization) with a DNN deep component (generalization). It was first deployed for Google Play app recommendation.
-
Multi-Task Learning
A learning paradigm that jointly trains several related tasks, improving generalization through a shared representation.
-
AIPW (Augmented Inverse Probability Weighting)
- $\hat{\mu}_t(X)$: Outcome model ($E[Y|T=t, X]$)
-
A/B Testing
A/B testing is the online application of the randomized controlled trial (RCT), estimating causal effects by randomly exposing two or more variants to users.
-
ATT (Average Treatment Effect on the Treated)
Average treatment effect for the group that actually received treatment
-
Back-door Criterion
The Back-door Criterion (Pearl, 1993) is a graphical criterion for identifying a causal effect from observational data. It determines whether a set of variables $Z$ is sufficient to identify the causal effect of $X \rightarrow Y$.
-
BART (Bayesian Additive Regression Trees)
A Bayesian ensemble method that models the outcome as a sum of many trees
-
CATE (Conditional Average Treatment Effect)
The Conditional Average Treatment Effect (CATE) is the average treatment effect given covariates $X=x$:
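In potential-outcome notation the definition is commonly written as below. $Y(1)$ and $Y(0)$ are the potential outcomes under treatment and control; the symbol $\tau$ is an assumed label.

```latex
\tau(x) \;=\; E\bigl[\,Y(1) - Y(0) \mid X = x\,\bigr]
```

Under strong ignorability this equals the observable contrast $E[Y \mid W=1, X=x] - E[Y \mid W=0, X=x]$, with $W$ the binary treatment indicator.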
-
Causal Forest
Causal Forest is a causal-inference application of the Generalized Random Forest (GRF) proposed by Athey, Tibshirani, and Wager (2019), splitting so as to maximize the heterogeneity of treatment effects.
-
CEVAE (Causal Effect Variational Autoencoder)
A method that uses a VAE to infer latent confounders and estimate causal effects.
-
CFR (Counterfactual Regression)
A deep learning method that learns balanced representations via IPM (Integral Probability Metric) regularization
-
Collider
A collider is a variable affected by both the treatment (X) and the outcome (Y) (a common effect). In the structure X → C ← Y, C is a collider.
-
Confounder
A confounder is a variable that affects both the treatment (X) and the outcome (Y) (a common cause), creating a spurious (non-causal) association between X and Y.
-
Constraint-Based Methods Overview
Constraint-based methods recover the causal graph by testing conditional independence (CI) relations in the data. Under the faithfulness assumption, they exploit the correspondence between CI relations and d-separation.
-
Contextual Bandits
Contextual bandits are a variant of the multi-armed bandit problem in which the optimal action (arm) depends on the observed context.
-
CUPED
CUPED (Controlled-experiment Using Pre-Experiment Data) is a technique that leverages pre-experiment data to reduce the variance of A/B tests.
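A minimal sketch of the adjustment, assuming a single pre-experiment covariate and simulated data. The function name and the data-generating numbers are hypothetical; in a real test the coefficient `theta` is typically estimated on data pooled across arms.

```python
import random
import statistics


def cuped_adjust(y, x):
    """Return CUPED-adjusted metric values: y_i - theta * (x_i - mean(x)).

    y: in-experiment metric; x: the same users' pre-experiment covariate.
    theta = cov(x, y) / var(x) minimizes the variance of the adjusted metric.
    """
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov / statistics.variance(x)
    return [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]


rng = random.Random(0)
pre = [rng.gauss(10, 2) for _ in range(5000)]  # pre-experiment metric
post = [x + rng.gauss(0, 1) for x in pre]      # correlated in-experiment metric
adjusted = cuped_adjust(post, pre)
```

The adjustment leaves the sample mean unchanged while removing the part of the variance explained by the pre-experiment covariate, so the gain depends on how strongly the two are correlated.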
-
d-separation
d-separation (directional separation) is a graphical criterion in a DAG for determining whether two sets of variables are conditionally independent given a third set.
-
DAG (Directed Acyclic Graph)
A DAG (Directed Acyclic Graph) is a graph that visually represents the causal relationships among variables. It is a core tool in causal inference for grasping confounding structure and deciding an identification strategy.
-
Design Effect
The Design Effect (DEFF) measures the impact of a complex sampling design on variance relative to simple random sampling.
-
do-operator
The do-operator is Pearl's formalization of intervention.
-
Double/Debiased Machine Learning (DML)
A methodology for performing valid statistical inference on a low-dimensional parameter of interest $\theta_0$ in the presence of a high-dimensional nuisance parameter $\eta_0$.
-
Doubly Robust Estimator
The Doubly Robust (DR) Estimator combines an outcome-regression model and a propensity-score model, remaining consistent as long as at least one of the two is correctly specified.
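The sketch below illustrates the robustness property on simulated data, using the AIPW form of the estimator. The data-generating process (true ATE of 2, one binary confounder) and all names are assumptions for illustration. The outcome model is deliberately misspecified by ignoring the confounder, while the propensity model is correct.

```python
import random
import statistics


def aipw_ate(data, mu1, mu0, e):
    """AIPW estimate of the ATE. data: list of (x, t, y); mu1, mu0, e: functions of x."""
    scores = []
    for x, t, y in data:
        m1, m0, p = mu1(x), mu0(x), e(x)
        # Outcome-model contrast plus inverse-propensity-weighted residual corrections.
        scores.append(m1 - m0 + t * (y - m1) / p - (1 - t) * (y - m0) / (1 - p))
    return statistics.fmean(scores)


rng = random.Random(1)
data = []
for _ in range(20000):
    x = 1 if rng.random() < 0.5 else 0  # binary confounder
    p = 0.7 if x else 0.3               # true propensity score
    t = 1 if rng.random() < p else 0
    y = 1.0 + 2.0 * t + 3.0 * x + rng.gauss(0, 1)
    data.append((x, t, y))

y1_bar = statistics.fmean(y for _, t, y in data if t == 1)
y0_bar = statistics.fmean(y for _, t, y in data if t == 0)
naive = y1_bar - y0_bar  # confounded difference in means

# Misspecified outcome model (constant per arm), correct propensity model.
ate_dr = aipw_ate(data, lambda x: y1_bar, lambda x: y0_bar,
                  lambda x: 0.7 if x else 0.3)
```

Here the naive difference in means is pulled away from 2 by the confounder, while the AIPW estimate stays close to it because the propensity model alone is enough for consistency.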
-
DR-Learner
The DR-Learner is a two-stage doubly robust estimator for CATE that regresses a pseudo-outcome on the covariates.
-
Endogeneity
Endogeneity is the problem that arises when an explanatory variable is correlated with the error term.
-
Fundamental Problem of Causal Inference
The problem that, for the same individual, the outcomes under treatment (W=1) and control (W=0) cannot be observed simultaneously
-
HTE (Heterogeneous Treatment Effects)
The phenomenon in which the treatment effect varies with an individual's characteristics
-
Instrumental Variables
Instrumental variables (IV) are exogenous variables used to address the problem of endogeneity.
-
IPW (Inverse Propensity Weighting)
Estimating treatment effects by using the inverse of the propensity score as weights
-
ITE (Individual Treatment Effect)
The treatment effect for individual $i$
-
MDP (Markov Decision Process)
A Markov Decision Process (MDP) is a mathematical framework for sequential decision-making problems.
-
Mediator
A mediator is an intermediate variable lying on the causal pathway through which a treatment (X) affects an outcome (Y). In the structure X → M → Y, M is the mediator.
-
Meta-learners
Meta-learners are algorithms that estimate the CATE by leveraging existing supervised learning methods (base learners).
-
Policy Trees
Policy Trees, proposed by Athey & Wager (2021), are an interpretable policy-learning method.
-
Positivity (Overlap)
The probability of receiving treatment lies strictly between 0 and 1 for every covariate value
-
Propensity Score Matching (PSM)
Matching treated and control individuals with similar propensity scores
-
R-Learner
R-Learner (Residualized Learner) is a meta-learner that estimates the CATE using residualized outcomes and residualized treatments based on the Robinson Transformation.
-
Representation Learning Overview
Methods for learning representations that are independent of treatment while remaining useful for outcome prediction.
-
S-Learner
The S-Learner (Single Learner) is a Meta-learner that estimates the response function with a single model including the treatment indicator as a feature, then computes the CATE.
-
SCM (Structural Causal Model)
An SCM (Structural Causal Model) is a framework for mathematically expressing the causal relationships among variables. It is the core of Pearl's causal inference framework.
-
Score-Based Methods Overview
Score-based methods assign a score function to each graph and search for the graph that best fits the data. Unlike constraint-based methods, they optimize model fit without CI tests.
-
Statistical Power
Statistical power is the probability of detecting an effect when it truly exists.
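For a concrete number, the power of a two-sided, two-sample z-test can be computed in closed form. This sketch assumes equal group sizes, known unit variance, and a standardized effect size; the function name is hypothetical.

```python
from statistics import NormalDist


def power_two_sample_z(effect_size, n_per_group, alpha=0.05):
    """Power of a two-sided two-sample z-test for a standardized mean difference."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Mean of the test statistic under the alternative.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


power = power_two_sample_z(effect_size=0.5, n_per_group=64)
```

With a medium effect (0.5 standard deviations) and 64 users per arm, the result is close to the conventional 80% target; with zero effect the function returns `alpha`, the false-positive rate.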
-
Strong Ignorability
An assumption combining Ignorability and Positivity
-
SUTVA (Stable Unit Treatment Value Assumption)
The potential outcome of one unit is not affected by the treatment assignment of other units, and only a single version exists for each treatment level.
-
T-Learner
The T-Learner (Two Learner) is a Meta-learner that estimates the CATE by training separate models for the treatment group and the control group.
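A minimal sketch with ordinary least squares as the base learner and one covariate. The simulated data, with true CATE `0.5 + 2 * x` under a randomized treatment `w`, and the helper names are assumptions for illustration; any supervised regressor could replace `fit_line`.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def t_learner(data):
    """data: list of (x, w, y). Returns a function x -> estimated CATE."""
    # One response model per arm, each fit only on that arm's units.
    a1, b1 = fit_line(*zip(*[(x, y) for x, w, y in data if w == 1]))
    a0, b0 = fit_line(*zip(*[(x, y) for x, w, y in data if w == 0]))
    return lambda x: (a1 + b1 * x) - (a0 + b0 * x)


rng = random.Random(0)
data = []
for _ in range(8000):
    x = rng.random()
    w = 1 if rng.random() < 0.5 else 0
    y = 1.0 + x + w * (0.5 + 2.0 * x) + rng.gauss(0, 0.5)
    data.append((x, w, y))

cate_hat = t_learner(data)
```

Calling `cate_hat(0.0)` and `cate_hat(1.0)` should return values near 0.5 and 2.5, the true effects at the two ends of the covariate range.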
-
Thompson Sampling
Thompson Sampling is a Bayesian approach that balances exploration and exploitation.
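A minimal Beta-Bernoulli sketch of the idea: keep a Beta posterior per arm, draw one sample from each posterior, and pull the arm with the largest draw. The uniform Beta(1, 1) priors, the arm means, and the horizon are illustrative assumptions.

```python
import random


def thompson_bernoulli(means, horizon, seed=0):
    """Thompson Sampling on Bernoulli arms; returns the number of pulls per arm."""
    rng = random.Random(seed)
    k = len(means)
    alpha = [1] * k  # 1 + observed successes
    beta = [1] * k   # 1 + observed failures
    pulls = [0] * k
    for _ in range(horizon):
        # Posterior sampling: uncertain arms still get explored occasionally.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_bernoulli([0.2, 0.5, 0.8], horizon=3000)
```

As the posteriors concentrate, draws for inferior arms rarely exceed the best arm's draw, so exploration tapers off without a hand-tuned schedule.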
-
Treatment Effects Overview
A systematic overview of the treatment effects that serve as the estimands in the Potential Outcome Framework.
-
X-Learner
The X-Learner is a three-stage meta-learner that imputes individual treatment effects and exploits both treatment/control group imbalance and structural properties of the CATE.
Three pillars organize the work — and every note belongs to one. Research Pillars →