#causal-inference

51 notes

AIPW (Augmented Inverse Probability Weighting) - $\hat{\mu}_t(X)$: Outcome model ($E[Y|T=t, X]$)
Anytime-Valid Inference Overview Game-theoretic statistics that resolves the "peeking" problem of fixed-sample hypothesis testing. The mathematical foundation for real-time monitoring of identification-validity drift.
Applied Causal Inference for Pricing — CATE & SCM Across Public Datasets An applied case study using only public datasets (LendingClub, iPinYou) that combines CATE estimation for price-sensitivity heterogeneity with SCM-based moderator analysis to design individual-level, risk-based pricing and RTB bidding policies — all findings illustrative and projected, not proprietary.
ATT (Average Treatment Effect on the Treated) Average treatment effect for the group that actually received treatment
Back-door Criterion The Back-door Criterion (Pearl, 1993) is a graphical criterion for identifying a causal effect from observational data. It determines whether a set of variables $Z$ is sufficient to identify the causal effect of $X \rightarrow Y$.
BART (Bayesian Additive Regression Trees) A Bayesian ensemble method that models the outcome as a sum of many trees
CATE (Conditional Average Treatment Effect) The Conditional Average Treatment Effect (CATE) is the average treatment effect given covariates $X=x$:
Causal Forest Causal Forest is a causal-inference application of the Generalized Random Forest (GRF) proposed by Athey, Tibshirani, and Wager (2019), splitting so as to maximize the heterogeneity of treatment effects.
Causal Inference Under Partial Identification — Sensitivity and Evidence Hierarchies When real-world data fail strong ignorability, point identification gives way to bounds, proxies, and sensitivity analysis — an honest hierarchy of evidence that connects credible causal claims to semiparametric efficiency.
CEVAE (Causal Effect Variational Autoencoder) A method that uses a VAE to infer latent confounders and estimate causal effects.
CFR (Counterfactual Regression) A deep learning method that learns balanced representations via IPM (Integral Probability Metric) regularization
Collider A collider is a variable affected by both the treatment (X) and the outcome (Y) (a common effect). In the structure X → C ← Y, C is a collider.
Confounder A confounder is a variable that affects both the treatment (X) and the outcome (Y) (a common cause), creating a spurious (non-causal) association between X and Y.
Constraint-Based Methods Overview Constraint-based methods recover the causal graph by testing conditional independence (CI) relations in the data. Under the faithfulness assumption, they exploit the correspondence between CI relations and d-separation.
d-separation d-separation (directional separation) is a graphical criterion in a DAG for determining whether two sets of variables are conditionally independent given a third set.
DAG (Directed Acyclic Graph) A DAG (Directed Acyclic Graph) is a graph that visually represents the causal relationships among variables. It is a core tool in causal inference for grasping confounding structure and deciding an identification strategy.
do-operator The do-operator is Pearl's formalization of intervention.
Double/Debiased Machine Learning (DML) A methodology for performing valid statistical inference on a low-dimensional parameter of interest $\theta_0$ in the presence of a high-dimensional nuisance parameter $\eta_0$.
Doubly Robust Estimator The Doubly Robust (DR) Estimator combines an outcome-regression model and a propensity-score model, remaining consistent as long as just one of the two is correctly specified.
DR-Learner The DR-Learner is a two-stage doubly robust estimator for CATE that regresses a pseudo-outcome on the covariates.
Dunnhumby — Track 2: Causal Targeting via Heterogeneous Treatment Effects Meta-learner / Causal Forest CATE under severe positivity violation (PS AUC 0.989); an OPE-validated policy targets ~31% of customers and surfaces counter-intuitive negative-CATE segments. Hypothesis-generating on public data.
Efficient Influence Function Among the regular asymptotically linear (RAL) estimators of a (semi)parametric model, the IF with the smallest variance is the efficient influence function (EIF), and its variance equals the semiparametric efficiency bound (the supremum of the Cramér-Rao bounds over all parametric submodels)…
Endogeneity Endogeneity is the problem that arises when an explanatory variable is correlated with the error term.
ESCM² (Entire Space Counterfactual Multi-Task Model) A model that integrates a counterfactual risk regularizer based on the Inverse Propensity Score (IPS) and the Doubly Robust estimator into ESMM, in order to address ESMM's two theoretical limitations — Inherent Estimation Bias (IEB) and Potential Independence Priority (PIP).
From Estimation to Action — How HTE Drives Personalized Policy Across Domains One methodological spine — estimate heterogeneous treatment effects and turn them into individual-level policies — powers both clinical sequential treatment decisions and industrial targeting, pricing, and recommendation.
Fundamental Problem of Causal Inference The problem that, for the same individual, the outcomes under treatment (W=1) and control (W=0) cannot be observed simultaneously
HTE (Heterogeneous Treatment Effects) The phenomenon in which the treatment effect varies with an individual's characteristics
Influence Function If an estimator $\hat\psi$ of a functional parameter $\psi:\mathcal{P}\to\mathbb{R}$ is asymptotically linear, then an influence function (IF) $\phi$ exists such that
Instrumental Variables Instrumental variables (IV) are exogenous variables used to address the problem of endogeneity.
IPW (Inverse Propensity Weighting) Estimating treatment effects by using the inverse of the propensity score as weights
ITE (Individual Treatment Effect) The treatment effect for individual $i$
Marketing Attribution at Scale — From Simulation to Causal Inference A case study comparing 10+ multi-touch attribution methods against a known-ground-truth simulator, then scaling them on the public Criteo dataset, closing the loop with budget off-policy evaluation for channel allocation.
Mediator A mediator is an intermediate variable lying on the causal pathway through which a treatment (X) affects an outcome (Y). In the structure X → M → Y, M is the mediator.
Meta-learners Meta-learners are a general term for algorithms that estimate the CATE by leveraging existing supervised learning methods (base learners).
Negative Control Outcome (NCO) An NCO is an outcome variable known a priori to be causally unaffected by the treatment, yet subject to the same unmeasured confounder $U$. By contrast, an NCE (negative control exposure) is an exposure with no causal effect on the outcome. A nonzero "apparent effect" on an NCO signals unmeasured confounding (detection), which can then be corrected via proximal methods.
One-step Estimator Corrects first-order bias by adding the empirical mean of the estimated EIF to the plug-in $\psi(\hat P)$:
Partial Identification When point identification is impossible due to a lack of assumptions, we only know that the parameter lies in the identified set $\Theta_I$ (often an interval $[\theta_L,\theta_U]$) compatible with the data plus assumptions. Manski's assumption-free / worst-case bounds are the starting point. sharp bounds =…
Positivity (Overlap) The probability of receiving treatment lies strictly between 0 and 1 for every covariate value
Propensity Score Matching (PSM) Matching treated and control individuals with similar propensity scores
Proximal Causal Inference When unmeasured confounding $U$ is present, the causal effect is identified using two types of proxies:
R-Learner R-Learner (Residualized Learner) is a meta-learner that estimates the CATE using residualized outcomes and residualized treatments based on the Robinson Transformation.
Representation Learning Overview Methods for learning representations that are independent of treatment while remaining useful for outcome prediction.
S-Learner The S-Learner (Single Learner) is a Meta-learner that estimates the response function with a single model including the treatment indicator as a feature, then computes the CATE.
SCM (Structural Causal Model) An SCM (Structural Causal Model) is a framework for mathematically expressing the causal relationships among variables. It is the core of Pearl's causal inference framework.
Score-Based Methods Overview Score-based methods assign a score function to each graph and search for the graph that best fits the data. Unlike constraint-based methods, they optimize model fit without CI tests.
Strong Ignorability An assumption combining Ignorability and Positivity
SUTVA (Stable Unit Treatment Value Assumption) The potential outcome of one unit is not affected by the treatment assignment of other units, and only a single version exists for each treatment level.
T-Learner The T-Learner (Two Learner) is a Meta-learner that estimates the CATE by training separate models for the treatment group and the control group.
TMLE (Targeted Maximum Likelihood Estimation) A procedure that corrects (targets) a plug-in estimator toward the target parameter:
Treatment Effects Overview A systematic overview of the treatment effects that serve as the estimands in the Potential Outcome Framework.
X-Learner The X-Learner is a three-stage meta-learner built on imputed treatment effects; it copes well with imbalanced treatment and control groups and exploits structural properties of the CATE.
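
As a rough illustration of how the IPW, AIPW, and Doubly Robust Estimator entries listed above relate, here is a minimal sketch in Python with NumPy. The function names (`ipw_ate`, `aipw_ate`, `simulate`) and the simulated data-generating process are assumptions made for this example and are not taken from the notes themselves. The sketch takes the propensity scores and outcome-model predictions as given inputs rather than fitting them, so it shows only the final averaging step of each estimator.

```python
import numpy as np


def ipw_ate(y, t, e):
    """IPW estimate of the ATE, given propensity scores e = P(T=1 | X)."""
    y, t, e = (np.asarray(a, dtype=float) for a in (y, t, e))
    # Weight treated outcomes by 1/e and control outcomes by 1/(1-e).
    return float(np.mean(t * y / e - (1 - t) * y / (1 - e)))


def aipw_ate(y, t, e, mu0, mu1):
    """AIPW (doubly robust) estimate of the ATE.

    mu0, mu1 are outcome-model predictions of E[Y | T=0, X] and E[Y | T=1, X].
    """
    y, t, e, mu0, mu1 = (
        np.asarray(a, dtype=float) for a in (y, t, e, mu0, mu1)
    )
    # Outcome-model contrast plus inverse-propensity-weighted residual corrections.
    psi = mu1 - mu0 + t * (y - mu1) / e - (1 - t) * (y - mu0) / (1 - e)
    return float(np.mean(psi))


def simulate(n=20000, seed=0):
    """Hypothetical confounded data: X drives both T and Y; the true ATE is 2."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    e = 1.0 / (1.0 + np.exp(-x))            # true propensity score
    t = rng.binomial(1, e)                  # treatment assignment
    y = 2.0 * t + x + rng.normal(size=n)    # outcome
    return x, t, y, e


if __name__ == "__main__":
    x, t, y, e = simulate()
    naive = y[t == 1].mean() - y[t == 0].mean()
    print("naive difference in means:", naive)
    print("IPW, true propensity:", ipw_ate(y, t, e))
    # Correct outcome model, deliberately wrong propensity (constant 0.5).
    print("AIPW, wrong propensity:", aipw_ate(y, t, np.full_like(e, 0.5), x, x + 2.0))
    # Correct propensity, deliberately wrong outcome model (all zeros).
    print("AIPW, wrong outcome model:", aipw_ate(y, t, e, np.zeros_like(x), np.zeros_like(x)))
```

In this simulated setting the naive difference in means is biased by the confounder, while AIPW stays close to the true effect when either the propensity scores or the outcome model is correct, which is the double robustness property. In practice both nuisance functions would be estimated from data, often with cross-fitting as in DML.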