Causal Inference
CATE · counterfactual · causal discovery · SCM · semiparametric · partial ID
Identifying what causes what — heterogeneous treatment effects and counterfactuals — with structural causal models, semiparametric estimation, and sensitivity / partial-identification under unobserved confounding.
51 notes
-
Dunnhumby — Track 2: Causal Targeting via Heterogeneous Treatment Effects
Meta-learner and Causal Forest CATE estimation under severe positivity violation (propensity-score AUC 0.989); a policy validated by off-policy evaluation (OPE) targets ~31% of customers and surfaces counter-intuitive segments with negative CATE. Hypothesis-generating, based on public data.
-
Applied Causal Inference for Pricing — CATE & SCM Across Public Datasets
An applied case study using only public datasets (LendingClub, iPinYou) that combines CATE estimation for price-sensitivity heterogeneity with SCM-based moderator analysis to design individual-level, risk-based pricing and RTB bidding policies — all findings illustrative and projected, not proprietary.
-
Causal Inference Under Partial Identification — Sensitivity and Evidence Hierarchies
When real-world data fail strong ignorability, point identification gives way to bounds, proxies, and sensitivity analysis — an honest hierarchy of evidence that connects credible causal claims to semiparametric efficiency.
-
From Estimation to Action — How HTE Drives Personalized Policy Across Domains
One methodological spine — estimate heterogeneous treatment effects and turn them into individual-level policies — powers both clinical sequential treatment decisions and industrial targeting, pricing, and recommendation.
-
Marketing Attribution at Scale — From Simulation to Causal Inference
A case study that benchmarks 10+ multi-touch attribution methods against a simulator with known ground truth, scales them to the public Criteo dataset, and closes the loop with off-policy evaluation of budget allocation across channels.
-
Anytime-Valid Inference Overview
Game-theoretic statistics that resolves the "peeking" problem of fixed-sample hypothesis testing. The mathematical foundation for real-time monitoring of identification-validity drift.
-
Efficient Influence Function
Among the regular asymptotically linear (RAL) estimators of a (semi)parametric model, the IF with the smallest variance is the efficient influence function (EIF), and its variance equals the semiparametric efficiency bound (the supremum of the Cramér-Rao bounds over all parametric submodels)…
-
Influence Function
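If an estimator $\hat\psi$ of a functional parameter $\psi:\mathcal{P}\to\mathbb{R}$ is asymptotically linear, then an influence function (IF) $\phi$ exists such that $\hat\psi - \psi(P) = \frac{1}{n}\sum_{i=1}^{n}\phi(O_i) + o_P(n^{-1/2})$, with $E_P[\phi(O)] = 0$ and $E_P[\phi(O)^2] < \infty$.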
-
Negative Control Outcome (NCO)
An NCO is an outcome variable assumed a priori to be causally unaffected by the treatment, yet still influenced by the same unmeasured confounder $U$. By contrast, an NCE (negative control exposure) is an exposure with no causal effect on the outcome. A nonzero "apparent effect" of the treatment on an NCO therefore signals unmeasured confounding (detection), which can then be corrected for with proximal methods.
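A small simulation makes the detection logic concrete. The sketch below assumes a hypothetical linear-Gaussian model in which $U$ drives both the treatment and the NCO while the treatment never enters the NCO's equation; the function name `nco_apparent_effect` and all coefficients are illustrative. The treated-minus-control difference in the NCO should be near zero without confounding and clearly nonzero with it.

```python
import random


def nco_apparent_effect(n, confounding, seed=0):
    """Difference in mean NCO between treated and untreated units.

    Simulated model (an illustrative assumption):
      U ~ N(0, 1)                        unmeasured confounder
      T = 1[confounding * U + e_T > 0]   treatment
      N = confounding * U + e_N          negative control outcome (no arrow T -> N)
    """
    rng = random.Random(seed)
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for _ in range(n):
        u = rng.gauss(0, 1)
        t = 1 if confounding * u + rng.gauss(0, 1) > 0 else 0
        nco = confounding * u + rng.gauss(0, 1)  # T never enters this equation
        sums[t] += nco
        counts[t] += 1
    return sums[1] / counts[1] - sums[0] / counts[0]


print(round(nco_apparent_effect(20_000, confounding=0.0), 2))  # close to 0
print(round(nco_apparent_effect(20_000, confounding=1.0), 2))  # clearly nonzero
```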
-
One-step Estimator
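Corrects first-order bias by adding the empirical mean of the estimated EIF to the plug-in $\psi(\hat P)$: $\hat\psi_{\mathrm{os}} = \psi(\hat P) + \frac{1}{n}\sum_{i=1}^{n}\hat\phi(O_i)$, where $\hat\phi$ is the EIF evaluated at $\hat P$ and $O_1,\dots,O_n$ are the observed data.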
-
Partial Identification
When point identification is impossible because the available assumptions are too weak, we only know that the parameter lies in the identified set $\Theta_I$ (often an interval $[\theta_L, \theta_U]$) compatible with the data plus assumptions. Manski's assumption-free (worst-case) bounds are the starting point. Sharp bounds =…
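For a bounded outcome, Manski's worst-case bounds on the ATE can be computed directly by filling in each unobserved potential-outcome mean with the extreme values of the outcome. A minimal sketch, assuming an outcome in $[0, 1]$ by default and using a hypothetical helper `manski_ate_bounds`:

```python
def manski_ate_bounds(p_treated, mean_y_treated, mean_y_control, y_min=0.0, y_max=1.0):
    """Worst-case (no-assumption) bounds on the ATE for an outcome in [y_min, y_max].

    Each unobserved counterfactual mean is replaced by y_min (lower) or y_max (upper).
    """
    p1, p0 = p_treated, 1.0 - p_treated
    # E[Y(1)] = E[Y|T=1] P(T=1) + E[Y(1)|T=0] P(T=0); the second mean is unobserved
    ey1_lo = mean_y_treated * p1 + y_min * p0
    ey1_hi = mean_y_treated * p1 + y_max * p0
    # E[Y(0)] = E[Y|T=0] P(T=0) + E[Y(0)|T=1] P(T=1); the second mean is unobserved
    ey0_lo = mean_y_control * p0 + y_min * p1
    ey0_hi = mean_y_control * p0 + y_max * p1
    return ey1_lo - ey0_hi, ey1_hi - ey0_lo


print(manski_ate_bounds(0.5, 0.6, 0.4))  # approximately (-0.4, 0.6)
```

With no further assumptions the interval always has width `y_max - y_min` (here 1) and, for an outcome in $[0, 1]$, always contains 0; additional assumptions are what narrow it.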
-
Proximal Causal Inference
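When unmeasured confounding $U$ is present, the causal effect can be identified using two types of proxies: treatment-inducing proxies $Z$ (negative control exposures) and outcome-inducing proxies $W$ (negative control outcomes).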
-
TMLE (Targeted Maximum Likelihood Estimation)
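A procedure that corrects (targets) an initial plug-in estimate toward the target parameter: the initial fit is fluctuated along a least-favorable parametric submodel until the empirical mean of the estimated EIF is approximately zero, and the plug-in is then evaluated at the updated fit.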
-
ESCM² (Entire Space Counterfactual Multi-Task Model)
A model that adds a counterfactual risk regularizer, based on Inverse Propensity Scoring (IPS) or the Doubly Robust (DR) estimator, to ESMM (Entire Space Multi-Task Model) to address two theoretical limitations of ESMM: Inherent Estimation Bias (IEB) and Potential Independence Priority (PIP).
-
AIPW (Augmented Inverse Probability Weighting)
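Combines an outcome model $\hat{\mu}_t(X)$, estimating $E[Y \mid T=t, X]$, with a propensity-score model $\hat{e}(X)$: $\hat\tau_{\mathrm{AIPW}} = \frac{1}{n}\sum_{i=1}^{n}\Big[\hat\mu_1(X_i) - \hat\mu_0(X_i) + \frac{T_i\,(Y_i - \hat\mu_1(X_i))}{\hat e(X_i)} - \frac{(1-T_i)\,(Y_i - \hat\mu_0(X_i))}{1 - \hat e(X_i)}\Big]$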
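The sketch below computes an AIPW estimate of the ATE from per-unit nuisance predictions and checks its double robustness on simulated data. Assumptions: a binary covariate so that cell means act as nonparametric nuisance models, no cross-fitting, and hypothetical helper names (`aipw_ate`, `simulate`, `cell_means`). Averaging the AIPW scores equals the plug-in plus the mean of the estimated EIF, so this is also a one-step estimator for the ATE.

```python
import random


def aipw_ate(y, t, mu1, mu0, e):
    """Mean of the AIPW scores: plug-in mu1 - mu0 plus inverse-propensity-weighted residuals."""
    scores = [
        m1 - m0 + ti * (yi - m1) / ei - (1 - ti) * (yi - m0) / (1 - ei)
        for yi, ti, m1, m0, ei in zip(y, t, mu1, mu0, e)
    ]
    return sum(scores) / len(scores)


def simulate(n, seed=0):
    # Illustrative data: binary X, e(X) = 0.2 + 0.6 X, true ATE = 2
    rng = random.Random(seed)
    x = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    t = [1 if rng.random() < 0.2 + 0.6 * xi else 0 for xi in x]
    y = [1 + 2 * ti + 3 * xi + rng.gauss(0, 1) for xi, ti in zip(x, t)]
    return x, t, y


def cell_means(x, t, y):
    """Nuisances for binary X: mean Y per (X, T) cell and treated share per X."""
    mu, e = {}, {}
    for xv in (0, 1):
        for tv in (0, 1):
            ys = [yi for xi, ti, yi in zip(x, t, y) if xi == xv and ti == tv]
            mu[(xv, tv)] = sum(ys) / len(ys)
        ts = [ti for xi, ti in zip(x, t) if xi == xv]
        e[xv] = sum(ts) / len(ts)
    return mu, e


n = 20_000
x, t, y = simulate(n)
mu, e = cell_means(x, t, y)
mu1 = [mu[(xi, 1)] for xi in x]
mu0 = [mu[(xi, 0)] for xi in x]
e_hat = [e[xi] for xi in x]

ate_both = aipw_ate(y, t, mu1, mu0, e_hat)                     # both nuisances correct
ate_bad_outcome = aipw_ate(y, t, [0.0] * n, [0.0] * n, e_hat)  # outcome model wrong
ate_bad_propensity = aipw_ate(y, t, mu1, mu0, [0.5] * n)       # propensity model wrong
print(round(ate_both, 2), round(ate_bad_outcome, 2), round(ate_bad_propensity, 2))
```

All three estimates should land near 2: when one nuisance model is wrong, the other, correctly specified, model compensates.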
-
ATT (Average Treatment Effect on the Treated)
Average treatment effect for the group that actually received treatment
-
Back-door Criterion
The Back-door Criterion (Pearl, 1993) is a graphical criterion for identifying a causal effect from observational data. It determines whether a set of variables $Z$ is sufficient to identify the causal effect of $X \rightarrow Y$.
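The adjustment formula licensed by the back-door criterion, $P(Y=1 \mid do(X=x)) = \sum_z P(Y=1 \mid X=x, Z=z)\,P(Z=z)$, can be checked on a simulated binary model. The sketch assumes a hypothetical SCM with a single confounder $Z$ (which satisfies the criterion) and a true causal risk difference of 0.2; in this setup confounding inflates the naive contrast to about 0.5.

```python
import random


def simulate(n, seed=0):
    # Illustrative binary SCM: Z -> X, Z -> Y, X -> Y; true causal risk difference 0.2
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        x = 1 if rng.random() < 0.2 + 0.6 * z else 0
        y = 1 if rng.random() < 0.1 + 0.2 * x + 0.5 * z else 0
        data.append((z, x, y))
    return data


def p_y_given(data, x, z=None):
    """Empirical P(Y=1 | X=x) or, if z is given, P(Y=1 | X=x, Z=z)."""
    rows = [d for d in data if d[1] == x and (z is None or d[0] == z)]
    return sum(d[2] for d in rows) / len(rows)


def backdoor_adjusted(data, x):
    """P(Y=1 | do(X=x)) via the back-door adjustment over the confounder Z."""
    n = len(data)
    total = 0.0
    for z in (0, 1):
        p_z = sum(1 for d in data if d[0] == z) / n
        total += p_y_given(data, x, z) * p_z
    return total


data = simulate(50_000)
naive = p_y_given(data, 1) - p_y_given(data, 0)                 # confounded contrast
adjusted = backdoor_adjusted(data, 1) - backdoor_adjusted(data, 0)
print(round(naive, 3), round(adjusted, 3))
```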
-
BART (Bayesian Additive Regression Trees)
A Bayesian ensemble method that models the outcome as a sum of many trees
-
CATE (Conditional Average Treatment Effect)
The Conditional Average Treatment Effect (CATE) is the average treatment effect given covariates $X=x$: $\tau(x) = E[Y(1) - Y(0) \mid X = x]$.
-
Causal Forest
Causal Forest is the causal-inference application of the Generalized Random Forest (GRF) framework proposed by Athey, Tibshirani, and Wager (2019); its splits are chosen to maximize treatment-effect heterogeneity across the resulting child nodes.
-
CEVAE (Causal Effect Variational Autoencoder)
A method that uses a VAE to infer latent confounders and estimate causal effects.
-
CFR (Counterfactual Regression)
A deep learning method that learns balanced representations via IPM (Integral Probability Metric) regularization
-
Collider
A collider is a variable affected by both the treatment (X) and the outcome (Y) (a common effect). In the structure X → C ← Y, C is a collider.
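Conditioning on a collider induces an association between its causes even when they are independent (collider bias). The simulation below is an illustrative assumption: $X$ and $Y$ are independent standard normals, $C = X + Y + \text{noise}$, and "conditioning" means keeping only units with $C > 1$.

```python
import random


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


rng = random.Random(0)
xs, ys, cs = [], [], []
for _ in range(20_000):
    x, y = rng.gauss(0, 1), rng.gauss(0, 1)  # X and Y are independent
    xs.append(x)
    ys.append(y)
    cs.append(x + y + rng.gauss(0, 0.5))     # collider: X -> C <- Y

r_all = corr(xs, ys)                            # near 0
sel = [i for i, c in enumerate(cs) if c > 1.0]  # select on the collider
r_sel = corr([xs[i] for i in sel], [ys[i] for i in sel])  # clearly negative
print(round(r_all, 3), round(r_sel, 3))
```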
-
Confounder
A confounder is a variable that affects both the treatment (X) and the outcome (Y) (a common cause), creating a spurious (non-causal) association between X and Y.
-
Constraint-Based Methods Overview
Constraint-based methods recover the causal graph by testing conditional independence (CI) relations in the data. Under the faithfulness assumption, they exploit the correspondence between CI relations and d-separation.
-
d-separation
d-separation (directional separation) is a graphical criterion in a DAG for determining whether two sets of variables are conditionally independent given a third set.
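d-separation can be checked mechanically. Instead of enumerating paths, the sketch below uses the equivalent moralized ancestral graph criterion: restrict the DAG to the ancestors of $X \cup Y \cup Z$, connect parents that share a child, drop edge directions, delete $Z$, and test whether $X$ can still reach $Y$. The parent-dictionary encoding and the function names are assumptions for illustration, and the node sets are assumed disjoint.

```python
from collections import deque


def ancestors(parents, nodes):
    """All ancestors of `nodes`, including the nodes themselves."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def d_separated(parents, xs, ys, zs):
    """True if xs and ys are d-separated given zs in the DAG `parents` (node -> parents)."""
    keep = ancestors(parents, set(xs) | set(ys) | set(zs))
    adj = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))  # parents of a kept node are kept too
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for i, p in enumerate(ps):  # moralize: marry parents of a common child
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    blocked = set(zs)
    frontier = deque(v for v in xs if v not in blocked)
    seen = set(frontier)
    while frontier:  # breadth-first search that never enters zs
        v = frontier.popleft()
        if v in ys:
            return False
        for w in adj[v]:
            if w not in seen and w not in blocked:
                seen.add(w)
                frontier.append(w)
    return True


collider = {"C": ["A", "B"]}      # A -> C <- B
chain = {"M": ["A"], "B": ["M"]}  # A -> M -> B
print(d_separated(collider, {"A"}, {"B"}, set()))  # True: the collider blocks the path
print(d_separated(collider, {"A"}, {"B"}, {"C"}))  # False: conditioning on C opens it
print(d_separated(chain, {"A"}, {"B"}, {"M"}))     # True: conditioning on M blocks it
```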
-
DAG (Directed Acyclic Graph)
A DAG (Directed Acyclic Graph) is a graph that visually represents the causal relationships among variables. It is a core tool in causal inference for grasping confounding structure and deciding an identification strategy.
-
do-operator
The do-operator is Pearl's formalization of intervention.
-
Double/Debiased Machine Learning (DML)
A methodology for performing valid statistical inference on a low-dimensional parameter of interest $\theta_0$ in the presence of a high-dimensional nuisance parameter $\eta_0$.
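For the partially linear model $Y = \theta_0 D + g_0(X) + \varepsilon$, a cross-fitted partialling-out estimator can be sketched as follows: fit $E[D \mid X]$ and $E[Y \mid X]$ on the other folds, residualize both on the held-out fold, and regress residual on residual. Assumptions: a discrete covariate with per-cell means as the nuisance learner (any ML regressor could take its place), two folds, a pooled final regression, and hypothetical names throughout.

```python
import math
import random


def simulate(n, theta=0.5, seed=0):
    # Illustrative partially linear model: X in {0..9}, D = sin(X) + v, Y = theta*D + g(X) + eps
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randrange(10)
        d = math.sin(x) + rng.gauss(0, 1)
        y = theta * d + (x - 4.5) ** 2 / 5 + rng.gauss(0, 1)
        rows.append((x, d, y))
    return rows


def fit_cell_means(rows):
    """Nuisance learner (illustrative choice): per-X means estimating E[D|X] and E[Y|X]."""
    acc = {}
    for x, d, y in rows:
        s = acc.setdefault(x, [0.0, 0.0, 0])
        s[0] += d
        s[1] += y
        s[2] += 1
    return {x: (s[0] / s[2], s[1] / s[2]) for x, s in acc.items()}


def dml_plm(rows, n_folds=2, seed=1):
    """Cross-fitted partialling-out estimate of theta in Y = theta*D + g(X) + eps."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    num = den = 0.0
    for k, held_out in enumerate(folds):
        train = [rows[i] for j, f in enumerate(folds) if j != k for i in f]
        nuisance = fit_cell_means(train)  # fitted without the held-out fold
        for i in held_out:
            x, d, y = rows[i]
            m_hat, l_hat = nuisance[x]
            v = d - m_hat  # residualized treatment
            u = y - l_hat  # residualized outcome
            num += v * u
            den += v * v
    return num / den  # pooled residual-on-residual slope


theta_hat = dml_plm(simulate(20_000))
print(round(theta_hat, 3))  # close to the true theta of 0.5
```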
-
Doubly Robust Estimator
The Doubly Robust (DR) Estimator combines an outcome-regression model and a propensity-score model, remaining consistent as long as just one of the two is correctly specified.
-
DR-Learner
The DR-Learner is a two-stage doubly robust estimator for CATE that regresses a pseudo-outcome on the covariates.
-
Endogeneity
Endogeneity is the problem that arises when an explanatory variable is correlated with the error term.
-
Fundamental Problem of Causal Inference
The problem that, for the same individual, the outcomes under treatment (W=1) and control (W=0) cannot be observed simultaneously
-
HTE (Heterogeneous Treatment Effects)
The phenomenon in which the treatment effect varies with an individual's characteristics
-
Instrumental Variables
Instrumental variables (IV) are exogenous variables used to address endogeneity: an instrument shifts the endogenous explanatory variable but affects the outcome only through that variable.
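With a single instrument $Z$, the IV estimator reduces to the Wald ratio $\mathrm{Cov}(Z,Y)/\mathrm{Cov}(Z,D)$. The simulation below is a hypothetical linear model in which an unobserved $U$ makes the treatment $D$ endogenous; OLS is biased upward (to about 3.3 in this setup) while the IV ratio recovers the true effect of 2.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)


rng = random.Random(0)
z, d, y = [], [], []
for _ in range(50_000):
    u = rng.gauss(0, 1)                         # unobserved confounder
    zi = 1 if rng.random() < 0.5 else 0         # instrument: affects Y only through D
    di = zi + u + rng.gauss(0, 1)               # endogenous treatment
    yi = 2.0 * di + 3.0 * u + rng.gauss(0, 1)   # true effect of D on Y is 2
    z.append(zi)
    d.append(di)
    y.append(yi)

beta_ols = cov(d, y) / cov(d, d)  # biased because U affects both D and Y
beta_iv = cov(z, y) / cov(z, d)   # Wald / IV ratio
print(round(beta_ols, 2), round(beta_iv, 2))
```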
-
IPW (Inverse Propensity Weighting)
Estimating treatment effects by weighting each unit by the inverse of the probability of the treatment it actually received: $1/e(X)$ for treated units and $1/(1-e(X))$ for controls
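A normalized (Hajek) IPW sketch for both the ATE and the ATT, assuming the propensity score is known; the data-generating model, coefficients and function names are illustrative. Here the effect is larger where treatment is more likely, so the ATT (2.6) exceeds the ATE (2).

```python
import random


def simulate(n, seed=0):
    # Illustrative: binary X, known propensity e(X) = 0.2 + 0.6 X, effect tau(X) = 1 + 2 X
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        e = 0.2 + 0.6 * x
        t = 1 if rng.random() < e else 0
        y = x + (1 + 2 * x) * t + rng.gauss(0, 1)
        rows.append((e, t, y))
    return rows


def weighted_mean(pairs):
    """Weighted mean of (weight, value) pairs."""
    return sum(w * v for w, v in pairs) / sum(w for w, _ in pairs)


def ipw_ate(rows):
    """Hajek IPW: treated weighted by 1/e, controls by 1/(1-e)."""
    treated = weighted_mean([(1 / e, y) for e, t, y in rows if t == 1])
    control = weighted_mean([(1 / (1 - e), y) for e, t, y in rows if t == 0])
    return treated - control


def ipw_att(rows):
    """ATT: treated weighted by 1, controls by e/(1-e) to mimic the treated covariate mix."""
    treated = weighted_mean([(1.0, y) for e, t, y in rows if t == 1])
    control = weighted_mean([(e / (1 - e), y) for e, t, y in rows if t == 0])
    return treated - control


rows = simulate(40_000)
print(round(ipw_ate(rows), 2), round(ipw_att(rows), 2))
```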
-
ITE (Individual Treatment Effect)
The treatment effect for individual $i$: $\tau_i = Y_i(1) - Y_i(0)$
-
Mediator
A mediator is an intermediate variable lying on the causal pathway through which a treatment (X) affects an outcome (Y). In the structure X → M → Y, M is the mediator.
-
Meta-learners
Meta-learners are a general term for algorithms that estimate the CATE by leveraging existing supervised learning methods (base learners).
-
Positivity (Overlap)
The probability of receiving treatment lies strictly between 0 and 1 for every covariate value
-
Propensity Score Matching (PSM)
Matching treated and control individuals with similar propensity scores
-
R-Learner
R-Learner (Residualized Learner) is a meta-learner that estimates the CATE using residualized outcomes and residualized treatments based on the Robinson Transformation.
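The R-learner minimizes the R-loss $\sum_i \big[(Y_i - \hat m(X_i)) - \tau(X_i)(T_i - \hat e(X_i))\big]^2$, where $\hat m(X) \approx E[Y \mid X]$ and $\hat e(X) \approx P(T=1 \mid X)$ are cross-fitted nuisances. The sketch assumes a linear CATE $\tau(x) = a + bx$, which turns the minimization into a two-parameter least-squares solve; the simulated data, per-cell nuisance means and function names are illustrative.

```python
import random


def simulate(n, seed=0):
    # Illustrative: X in {0..4}, e(X) = 0.3 + 0.1 X, baseline X^2/4, tau(X) = 1 + 0.5 X
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randrange(5)
        t = 1 if rng.random() < 0.3 + 0.1 * x else 0
        y = x * x / 4 + (1 + 0.5 * x) * t + rng.gauss(0, 1)
        rows.append((x, t, y))
    return rows


def cell_means(rows):
    """Nuisances (illustrative choice): per-X means of T (propensity) and Y (m(X))."""
    acc = {}
    for x, t, y in rows:
        s = acc.setdefault(x, [0.0, 0.0, 0])
        s[0] += t
        s[1] += y
        s[2] += 1
    return {x: (s[0] / s[2], s[1] / s[2]) for x, s in acc.items()}


def r_learner_linear(rows, seed=1):
    """Minimize sum(((y - m(x)) - (a + b*x) * (t - e(x)))**2) over (a, b)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[0::2], idx[1::2]]
    s11 = s12 = s22 = r1 = r2 = 0.0
    for k in (0, 1):
        nuisance = cell_means([rows[i] for i in folds[1 - k]])  # cross-fitting
        for i in folds[k]:
            x, t, y = rows[i]
            e_hat, m_hat = nuisance[x]
            v, u = t - e_hat, y - m_hat  # Robinson residuals
            s11 += v * v
            s12 += x * v * v
            s22 += x * x * v * v
            r1 += v * u
            r2 += x * v * u
    # Solve the 2x2 normal equations [[s11, s12], [s12, s22]] @ [a, b] = [r1, r2]
    det = s11 * s22 - s12 * s12
    a = (s22 * r1 - s12 * r2) / det
    b = (s11 * r2 - s12 * r1) / det
    return a, b


a_hat, b_hat = r_learner_linear(simulate(40_000))
print(round(a_hat, 2), round(b_hat, 2))  # close to the true (1, 0.5)
```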
-
Representation Learning Overview
Methods for learning representations that are independent of treatment while remaining useful for outcome prediction.
-
S-Learner
The S-Learner (Single Learner) is a Meta-learner that fits a single response model with the treatment indicator included as a feature, then computes the CATE as the difference between its predictions with the indicator set to 1 and to 0.
-
SCM (Structural Causal Model)
An SCM (Structural Causal Model) is a framework for mathematically expressing the causal relationships among variables. It is the core of Pearl's causal inference framework.
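Counterfactuals in an SCM follow Pearl's three steps: abduction (infer the exogenous noise from the observation), action (apply the intervention), and prediction. A minimal sketch for a hypothetical two-variable linear SCM, $X := U_X$, $Y := 2X + U_Y$; the function names are illustrative.

```python
def f_y(x, u_y):
    """Structural equation for Y in the illustrative SCM: Y := 2 X + U_Y."""
    return 2.0 * x + u_y


def counterfactual_y(x_obs, y_obs, x_cf):
    """Y under do(X = x_cf) for the unit observed at (x_obs, y_obs)."""
    # 1. Abduction: recover the exogenous noise consistent with the observation
    u_y = y_obs - 2.0 * x_obs
    # 2. Action: replace the equation X := U_X with the constant x_cf
    x = x_cf
    # 3. Prediction: propagate through the remaining structural equation
    return f_y(x, u_y)


print(counterfactual_y(x_obs=1.0, y_obs=3.0, x_cf=0.0))  # 1.0
```

For the unit observed at $(x, y) = (1, 3)$, abduction gives $U_Y = 1$, so had $X$ been 0, $Y$ would have been 1.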
-
Score-Based Methods Overview
Score-based methods assign a score function to each graph and search for the graph that best fits the data. Unlike constraint-based methods, they optimize model fit without CI tests.
-
Strong Ignorability
An assumption combining ignorability (unconfoundedness), $(Y(0), Y(1)) \perp T \mid X$, and positivity, $0 < P(T=1 \mid X) < 1$
-
SUTVA (Stable Unit Treatment Value Assumption)
The potential outcome of one unit is not affected by the treatment assignment of other units, and only a single version exists for each treatment level.
-
T-Learner
The T-Learner (Two Learner) is a Meta-learner that estimates the CATE by training separate models for the treatment group and the control group.
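A T-learner sketch: fit one outcome model on treated units and another on controls, then take the difference of their predictions. Per-$X$ cell means stand in for the base learner here, and the simulated data (true CATE $1 + 0.5x$) and helper names are assumptions for illustration.

```python
import random


def simulate(n, seed=0):
    # Illustrative: X in {0..4}, P(T=1|X) = 0.3 + 0.1 X, baseline X^2/4, tau(X) = 1 + 0.5 X
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randrange(5)
        t = 1 if rng.random() < 0.3 + 0.1 * x else 0
        y = x * x / 4 + (1 + 0.5 * x) * t + rng.gauss(0, 1)
        rows.append((x, t, y))
    return rows


def fit_cell_means(pairs):
    """Base learner (illustrative choice): mean outcome per X value."""
    acc = {}
    for x, y in pairs:
        s = acc.setdefault(x, [0.0, 0])
        s[0] += y
        s[1] += 1
    return {x: s[0] / s[1] for x, s in acc.items()}


def t_learner(rows):
    """Return a CATE function x -> mu1(x) - mu0(x) from two separately fitted models."""
    mu1 = fit_cell_means([(x, y) for x, t, y in rows if t == 1])  # treated-group model
    mu0 = fit_cell_means([(x, y) for x, t, y in rows if t == 0])  # control-group model
    return lambda x: mu1[x] - mu0[x]


tau = t_learner(simulate(40_000))
print([round(tau(x), 2) for x in range(5)])
```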
-
Treatment Effects Overview
A systematic overview of the treatment effects that serve as the estimands in the Potential Outcome Framework.
-
X-Learner
The X-Learner is a three-stage meta-learner built on imputed treatment effects; it is designed to exploit imbalance between the sizes of the treatment and control groups and structural properties of the CATE (for example, smoothness).
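A sketch of the three stages under stated assumptions: flexible per-cell means for the stage-1 outcome models, a simple linear regression for the stage-2 effect models (reflecting the idea that the CATE may be simpler than the response surfaces), and the estimated propensity as the stage-3 weight $g(x)$. The simulated data (true CATE $1 + 0.2x$), learners and names are illustrative.

```python
import math
import random


def simulate(n, seed=0):
    # Illustrative: X in {0..9}, e(X) = 0.1 + 0.06 X, baseline sin(X), tau(X) = 1 + 0.2 X
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randrange(10)
        t = 1 if rng.random() < 0.1 + 0.06 * x else 0
        y = math.sin(x) + (1 + 0.2 * x) * t + rng.gauss(0, 1)
        rows.append((x, t, y))
    return rows


def cell_means(pairs):
    """Mean value per X (used for outcome models and the propensity)."""
    acc = {}
    for x, v in pairs:
        s = acc.setdefault(x, [0.0, 0])
        s[0] += v
        s[1] += 1
    return {x: s[0] / s[1] for x, s in acc.items()}


def linear_fit(pairs):
    """Least squares v ~ a + b x; returns the function x -> a + b x."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    mv = sum(v for _, v in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxv = sum((x - mx) * (v - mv) for x, v in pairs)
    b = sxv / sxx
    a = mv - b * mx
    return lambda x: a + b * x


def x_learner(rows):
    treated = [(x, y) for x, t, y in rows if t == 1]
    control = [(x, y) for x, t, y in rows if t == 0]
    # Stage 1: separate outcome models for each group
    mu1, mu0 = cell_means(treated), cell_means(control)
    # Stage 2: imputed individual effects, regressed on X
    tau1 = linear_fit([(x, y - mu0[x]) for x, y in treated])
    tau0 = linear_fit([(x, mu1[x] - y) for x, y in control])
    # Stage 3: combine with g(x) = estimated propensity
    g = cell_means([(x, t) for x, t, _ in rows])
    return lambda x: g[x] * tau0(x) + (1 - g[x]) * tau1(x)


tau_hat = x_learner(simulate(40_000))
print([round(tau_hat(x), 2) for x in range(10)])
```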