Causal Inference Under Partial Identification — Sensitivity and Evidence Hierarchies

Textbook causal inference rests on clean assumptions. The central one is strong ignorability, which has two parts. Unconfoundedness: treatment is independent of the potential outcomes conditional on the measured covariates, $\{Y(0),Y(1)\}\perp A\mid X$. Positivity: every unit has positive probability of receiving either treatment, $0<P(A=1\mid X)<1$. When both hold, the ATE is point-identified, and we get to report a single number with its confidence interval. The catch is that real-world data (EHR, claims, observational cohorts, logs) almost never satisfy these assumptions. Unmeasured confounding through some $U$ is nearly always present, and some patient subgroups effectively never receive a given treatment (a positivity violation).

So the choice looks binary: trust an unverifiable assumption and report one number (overconfident), or give up entirely and declare “observational data can say nothing” (overcautious). The argument of this essay is that there is a rich middle ground between them. Proximal methods, partial identification, and sensitivity analysis quantify what we can still honestly say when the assumptions break. These are not competing techniques but stages of a single evidence hierarchy.

The Unifying Idea: Identification Is Not Binary

The key shift is to stop treating identification as an all-or-nothing event and to see it as a continuous spectrum. At one end sit assumption-free worst-case bounds (least informative, most honest); at the other, point identification under strong ignorability (most precise, most fragile). Every point in between is a choice along an assumptions-versus-precision trade-off. Add an assumption and the estimand set narrows; drop one and it widens.

Through this lens, the three families of methods are different answers to the same question — “if we give up strong ignorability, what can the data still support?”

The Method Arc: Three Ways to Relax the Assumptions

1. Proximal Methods — Triangulating Unmeasured Confounding Through Shadows

The most common reason unconfoundedness fails is an unmeasured confounder $U$. The insight of Proximal Causal Inference is that even when we cannot measure $U$ directly, we can recover the effect if we observe two shadows (proxies) of it. Two kinds of proxy are needed:

a negative control exposure (NCE) $Z$ — an exposure with no causal effect on the outcome but associated with $U$ ;
a negative control outcome (NCO) $W$ — an outcome not affected by treatment yet shadowed by the same $U$ .

An NCO is a “canary in the coal mine.” If it should not move yet its apparent effect is nonzero, that is a signal of unmeasured confounding — a tool for detection. Going further, if an outcome confounding bridge function $h$ solves the Fredholm integral equation

$E[Y\mid Z,A,X]=E\big[h(W,A,X)\mid Z,A,X\big]$

then the causal effect is identified as a functional of $h$ — a tool for correction. The key enabling condition is completeness: the proxies must reflect enough of the variation in $U$ . Instead of assuming strong ignorability, proximal methods trade it for an assumption about the proxy structure. This is especially attractive for clinical multimodal data, where imaging, labs, and text can be different shadows of the same latent state.
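As a rough illustration of the correction step, the sketch below assumes a linear bridge $h(W,A)=b_0+b_A A+b_W W$ with no covariates $X$, in which case the bridge equation can be solved by two-stage least squares. The function name `proximal_2sls`, the simulated data-generating process, and all coefficients are assumptions made for this example, not something the text prescribes; real bridge functions are usually nonlinear and need more careful estimation.

```python
import numpy as np


def proximal_2sls(y, a, z, w):
    """Solve a *linear* outcome bridge h(W, A) = b0 + bA*A + bW*W by 2SLS.

    Assumption for this sketch: the bridge is linear and there are no
    measured covariates X. Returns the coefficient on A.
    """
    ones = np.ones_like(y)
    # Stage 1: project the outcome proxy W onto (1, A, Z).
    stage1 = np.column_stack([ones, a, z])
    w_hat = stage1 @ np.linalg.lstsq(stage1, w, rcond=None)[0]
    # Stage 2: regress Y on (1, A, W_hat); the A coefficient is the effect estimate.
    stage2 = np.column_stack([ones, a, w_hat])
    coef = np.linalg.lstsq(stage2, y, rcond=None)[0]
    return coef[1]


# Hypothetical simulation: U confounds A and Y but is never passed to the estimator.
rng = np.random.default_rng(0)
n = 20_000
u = rng.normal(size=n)                            # unmeasured confounder
z = u + 0.5 * rng.normal(size=n)                  # negative control exposure (proxy of U)
w = u + 0.5 * rng.normal(size=n)                  # negative control outcome (proxy of U)
a = (u + rng.normal(size=n) > 0).astype(float)    # treatment depends on U
true_effect = 1.0
y = true_effect * a + 2.0 * u + rng.normal(size=n)

naive_estimate = y[a == 1].mean() - y[a == 0].mean()   # confounded by U
proximal_estimate = proximal_2sls(y, a, z, w)

print(f"naive difference in means: {naive_estimate:.2f}")
print(f"proximal 2SLS estimate:    {proximal_estimate:.2f} (truth {true_effect})")
```

In this simulation the naive contrast absorbs the effect of $U$, while the two-stage estimate should land near the true effect because $Z$ and $W$ jointly carry enough information about $U$. That is the linear analogue of the completeness condition.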

2. Partial Identification — Report a Set Instead of a Point

When even proxies are unavailable, Partial Identification takes a step back. If point identification is impossible, we know only that the parameter lies in an identified set $\Theta_I$ (often an interval $[\theta_L,\theta_U]$ ) compatible with the data plus the assumptions we are willing to state. Manski’s assumption-free bounds are the starting point — using only the support of the potential outcomes and adding no identification assumptions, they give the widest but most defensible range.
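To make the worst-case bounds concrete, here is a minimal sketch for an outcome known to lie in $[y_{\min}, y_{\max}]$. The function name `manski_ate_bounds` and the toy data are assumptions for illustration. The idea is that each unobserved potential outcome is replaced by the lowest or highest value it could possibly take.

```python
def manski_ate_bounds(y, a, y_min=0.0, y_max=1.0):
    """Worst-case (no-assumption) bounds on the ATE for a bounded outcome.

    y: observed outcomes, a: binary treatment indicators (0/1).
    Assumes both arms are non-empty and every outcome lies in [y_min, y_max].
    """
    n = len(y)
    n1 = sum(a)
    p1 = n1 / n          # share treated
    p0 = 1.0 - p1        # share untreated
    mean_y1 = sum(yi for yi, ai in zip(y, a) if ai == 1) / n1
    mean_y0 = sum(yi for yi, ai in zip(y, a) if ai == 0) / (n - n1)

    # E[Y(1)]: observed for the treated, unknown (anywhere in range) for the untreated.
    ey1_lo = mean_y1 * p1 + y_min * p0
    ey1_hi = mean_y1 * p1 + y_max * p0
    # E[Y(0)]: observed for the untreated, unknown for the treated.
    ey0_lo = mean_y0 * p0 + y_min * p1
    ey0_hi = mean_y0 * p0 + y_max * p1

    return ey1_lo - ey0_hi, ey1_hi - ey0_lo


# Toy data: 4 treated units (3 successes), 4 untreated units (1 success).
y = [1, 1, 1, 0, 1, 0, 0, 0]
a = [1, 1, 1, 1, 0, 0, 0, 0]
lower, upper = manski_ate_bounds(y, a)
print(f"ATE is in [{lower:.2f}, {upper:.2f}]")
```

The interval always has width $y_{\max}-y_{\min}$, so for a binary outcome it always contains zero. Without further assumptions the data alone cannot sign the effect, which is why each added assumption has to earn its keep.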

Here sharp bounds matter — the narrowest set that uses all available information. And inference becomes subtle: we want a confidence interval for the parameter, not the set, and the Imbens–Manski (2004) construction is the standard for this. The spirit of partial identification is honest agnosticism — instead of a single number under an unverifiable assumption, report the range the data can support. Showing how the set shrinks as you add one assumption at a time is itself a deliverable of the analysis.
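The Imbens–Manski interval can be sketched as follows, assuming the estimated endpoints are asymptotically normal with standard errors `se_l` and `se_u`. The critical value $c$ solves $\Phi\big(c+\hat\Delta/\max(se_l,se_u)\big)-\Phi(-c)=1-\alpha$, where $\hat\Delta$ is the estimated width of the identified set. Function names and the example numbers are hypothetical.

```python
from statistics import NormalDist


def imbens_manski_critical_value(width, se_max, level=0.95):
    """Solve Phi(c + width/se_max) - Phi(-c) = level for c by bisection.

    Assumes se_max > 0. Coverage is increasing in c, so bisection is safe.
    """
    std_normal = NormalDist()
    scaled = max(width, 0.0) / se_max
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        coverage = std_normal.cdf(mid + scaled) - std_normal.cdf(-mid)
        if coverage < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def imbens_manski_ci(theta_l, theta_u, se_l, se_u, level=0.95):
    """Confidence interval for the parameter (not the set) given estimated bounds."""
    c = imbens_manski_critical_value(theta_u - theta_l, max(se_l, se_u), level)
    return theta_l - c * se_l, theta_u + c * se_u


# Limiting cases: a point-identified parameter needs the two-sided value,
# a wide identified set only needs the one-sided value.
c_point = imbens_manski_critical_value(width=0.0, se_max=0.05)
c_wide = imbens_manski_critical_value(width=1.0, se_max=0.05)
ci_lower, ci_upper = imbens_manski_ci(-0.25, 0.75, se_l=0.05, se_u=0.05)
print(f"c (width 0): {c_point:.3f}, c (wide set): {c_wide:.3f}")
print(f"95% CI for the parameter: [{ci_lower:.3f}, {ci_upper:.3f}]")
```

The critical value moves from about 1.96 toward about 1.645 as the set widens relative to sampling noise. The true parameter can be near only one endpoint at a time, so a one-sided margin at each end is enough.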

3. Sensitivity Analysis — How Strong Would Confounding Have to Be to Overturn the Conclusion?

The third stage inverts the question. Where partial ID asks “what do the data support,” sensitivity analysis asks “how strong would unmeasured confounding have to be to break my conclusion?” You build a model that explicitly violates unconfoundedness, parameterizing the strength of a latent $U$ that acts on both treatment assignment and outcome through one or two sensitivity parameters, and compute the critical strength at which the estimate’s sign or significance flips.

The power of this approach is interpretability. A statement like “to explain away the detected effect, an unmeasured confounder would have to be associated with both treatment and outcome more strongly than the strongest covariate we measured” is something a clinician or policymaker can evaluate with domain knowledge. If the effect breaks under weak confounding, the conclusion is fragile; if it demands implausibly strong confounding, it is robust. The strength of evidence is sometimes summarized in a single number such as the E-value of VanderWeele and Ding: the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both treatment and outcome to fully explain away the observed effect. (This E-value is unrelated to the e-values of anytime-valid inference, despite the similar name.)
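As a small worked example of such a single-number summary, the sketch below computes the E-value in the sense of VanderWeele and Ding for an effect reported as a risk ratio: $E = RR + \sqrt{RR\,(RR-1)}$ for $RR \ge 1$, with protective effects inverted first. Reading the text's "E-value" this way is an assumption, and the function name `e_value` is illustrative.

```python
import math


def e_value(risk_ratio):
    """E-value for an observed risk ratio (VanderWeele-Ding formula).

    Returns the minimum risk-ratio association an unmeasured confounder would
    need with both treatment and outcome to fully explain away the estimate.
    """
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective effects (RR < 1) are inverted so the formula applies.
    rr = risk_ratio if risk_ratio >= 1 else 1.0 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1.0))


observed_rr = 2.0
print(f"E-value for RR = {observed_rr}: {e_value(observed_rr):.2f}")
```

For an observed risk ratio of 2 the E-value is $2+\sqrt{2}\approx 3.41$. Applying the same function to the confidence limit closest to the null gives the confounding strength needed to make the result statistically indistinguishable from no effect.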

The Link to Efficiency: Honesty Does Not Mean Surrendering Precision

A common misconception is worth confronting here: that relaxing assumptions must make estimation inefficient. It need not. Once any of these three methods defines an identified estimand (an endpoint of the narrowed set, or a functional of the bridge function), the estimation of that estimand is governed by the same semiparametric efficiency theory.

Every such estimand has an efficient influence function (EIF), whose variance sets the attainable semiparametric efficiency bound (the supremum of the Cramér–Rao bounds over all parametric submodels). The EIF for the ATE is the familiar AIPW form

$\phi(O)=\mu_1(X)-\mu_0(X)-\psi+\frac{A\,(Y-\mu_1(X))}{e(X)}-\frac{(1-A)(Y-\mu_0(X))}{1-e(X)}$

and the parameters of proximal bridges or sensitivity models each carry their own EIF. Using the EIF as an estimating equation, combined with the Neyman orthogonality and cross-fitting of Double-Debiased ML, yields $\sqrt{n}$-consistent, doubly robust inference for the target estimand while the nuisance functions (outcome model, propensity, bridge) are fit with flexible ML, provided those nuisance estimates converge fast enough (typically, the product of their error rates is $o(n^{-1/2})$). In other words, honesty at the identification stage (what is identified, under which assumptions) and efficiency at the estimation stage (how precisely that estimand is estimated) are two separate axes, and the latter is handled by the same machinery wherever the former sits in the hierarchy. Where positivity is weak, propensities hug 0 or 1 and weights explode, so reshaping the target population via overlap weighting or trimming is part of the same honesty.
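A minimal sketch of the cross-fitted AIPW estimator for the ATE follows. To stay self-contained it assumes a single discrete covariate and uses within-cell means as the nuisance estimators; in practice these would be flexible ML models. The names `fit_cell_means` and `crossfit_aipw`, the clipping threshold, and the simulated data are all assumptions for illustration.

```python
import numpy as np


def fit_cell_means(x, target, n_levels, fallback):
    """Mean of `target` within each level of a discrete covariate x."""
    means = np.full(n_levels, fallback, dtype=float)
    for level in range(n_levels):
        mask = x == level
        if mask.any():
            means[level] = target[mask].mean()
    return means


def crossfit_aipw(y, a, x, n_levels, n_folds=5, clip=0.01, seed=0):
    """Cross-fitted AIPW estimate of the ATE and its standard error."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    scores = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        y_tr, a_tr, x_tr = y[train], a[train], x[train]
        # Nuisance functions are fit on the training folds only.
        e_hat = fit_cell_means(x_tr, a_tr, n_levels, a_tr.mean())
        mu1_hat = fit_cell_means(x_tr[a_tr == 1], y_tr[a_tr == 1], n_levels,
                                 y_tr[a_tr == 1].mean())
        mu0_hat = fit_cell_means(x_tr[a_tr == 0], y_tr[a_tr == 0], n_levels,
                                 y_tr[a_tr == 0].mean())
        # Evaluate on the held-out fold; clip propensities away from 0 and 1.
        e = np.clip(e_hat[x[test]], clip, 1.0 - clip)
        mu1, mu0 = mu1_hat[x[test]], mu0_hat[x[test]]
        # Uncentered efficient-influence-function score (the EIF plus psi).
        scores[test] = (mu1 - mu0
                        + a[test] * (y[test] - mu1) / e
                        - (1.0 - a[test]) * (y[test] - mu0) / (1.0 - e))
    psi_hat = scores.mean()
    se_hat = scores.std(ddof=1) / np.sqrt(n)
    return psi_hat, se_hat


# Hypothetical simulation: X drives both treatment and outcome; true ATE is 1.
rng = np.random.default_rng(1)
n = 20_000
x = rng.integers(0, 4, size=n)
propensity = np.array([0.2, 0.4, 0.6, 0.8])[x]
a = (rng.random(n) < propensity).astype(float)
y = 1.0 * a + x + rng.normal(size=n)

naive_diff = y[a == 1].mean() - y[a == 0].mean()
psi_hat, se_hat = crossfit_aipw(y, a, x, n_levels=4)
print(f"naive: {naive_diff:.2f}, AIPW: {psi_hat:.2f} (SE {se_hat:.3f})")
```

Averaging the held-out scores gives the point estimate, and their sample variance gives a plug-in standard error based on the EIF. Clipping the propensity is a crude stand-in for the trimming discussed above and changes the effective target population when it binds.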

Why It Matters Across Domains

This hierarchy is not a niche trick. In clinical decision-making, RCTs are often ethically or financially impossible, and observational EHR data hide unmeasured disease severity or physician preference as confounders. Proximal, partial-ID, and sensitivity methods honestly grade how strongly we can claim “this treatment works” — exactly the language of regulatory and clinical adoption. In industrial targeting, pricing, and recommendation, logs are contaminated by past policy (the chronic ailment of Off-Policy Evaluation), and some user segments have never received a given treatment (a positivity violation). The same tools answer “how robust is this campaign’s uplift estimate to hidden selection.”

The through-line is identical: personalized decision-making ultimately comes down to reliably estimating the difference in potential outcomes across treatments, and real-world data do not hand us that reliability for free. Partial identification, proximal methods, and sensitivity analysis are the methodological honesty of admitting what we do not know while drawing the boundary of what we do know as sharply as possible. And within that boundary, semiparametric efficiency theory characterizes the narrowest asymptotic confidence interval that regular estimators can attain. This is how we make credible causal claims while stating openly which assumptions they rest on, rather than leaning on strong ignorability by default.

Partial Identification — Manski bounds, identified set, sharp bounds, Imbens–Manski CI
Proximal Causal Inference — proxy variables and the outcome confounding bridge to correct for unmeasured confounding
Negative Control Outcome — the canary for unmeasured confounding, detection plus correction
Positivity — the overlap assumption and how its violation affects estimation
Efficient Influence Function — the efficiency bound for an identified estimand
AIPW — the doubly robust estimator matching the ATE’s EIF
Double-Debiased ML — Neyman orthogonality plus cross-fitting for flexible nuisance estimation
