IPW (Inverse Propensity Weighting)

Advantages

Advantage	Description
Simple	Intuitive and easy to implement
Nonparametric	No outcome-model assumptions required
Theoretical justification	Consistent for the ATE under unconfoundedness and overlap, provided the PS model is correctly specified
Flexibility	Applicable to a variety of estimands

Disadvantages

Disadvantage	Description
Dependence on PS estimation	Biased when the PS is misspecified
Sensitivity to extreme PS	Unstable when $e(X) \approx 0$ or $1$
High variance	Especially when overlap is weak
Difficult in high dimensions	Estimating the PS well is hard with many covariates

Extreme PS Problem

Problem

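When $e(X) \to 0$ or $e(X) \to 1$:

Weights explode: $1/e(X) \to \infty$ for treated units as $e(X) \to 0$, and $1/(1 - e(X)) \to \infty$ for control units as $e(X) \to 1$
The estimator becomes unstable: a few units with huge weights dominate it, and its variance blows up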
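A small simulation makes the instability concrete. It is a sketch under assumed conditions, not a result from this note: one covariate, a known logistic propensity $e(X) = \sigma(kX)$ where a larger slope $k$ (the hypothetical parameter `slope`) pushes $e(X)$ toward 0 and 1, a true ATE of 2, and the unnormalized (Horvitz-Thompson) form of IPW.

```python
import numpy as np

def ht_ipw_ate(y, t, e):
    """Horvitz-Thompson IPW estimate of the ATE, given propensities e."""
    return np.mean(t * y / e) - np.mean((1 - t) * y / (1 - e))

def simulate_sd(slope, n=2000, reps=300, seed=0):
    """Sampling SD of the IPW ATE and the median of the largest weight
    when e(X) = sigmoid(slope * X). Larger slope = weaker overlap."""
    rng = np.random.default_rng(seed)
    estimates, max_weights = [], []
    for _ in range(reps):
        x = rng.normal(size=n)
        e = 1.0 / (1.0 + np.exp(-slope * x))
        t = rng.binomial(1, e)
        y = 1.0 + 2.0 * t + x + rng.normal(size=n)  # true ATE = 2
        estimates.append(ht_ipw_ate(y, t, e))
        w = np.where(t == 1, 1.0 / e, 1.0 / (1.0 - e))
        max_weights.append(w.max())
    return np.std(estimates), np.median(max_weights)

sd_good, wmax_good = simulate_sd(slope=0.5)  # good overlap
sd_poor, wmax_poor = simulate_sd(slope=4.0)  # poor overlap
print(f"good overlap: SD={sd_good:.3f}, median max weight={wmax_good:.1f}")
print(f"poor overlap: SD={sd_poor:.3f}, median max weight={wmax_poor:.1f}")
```

Both settings are unbiased in expectation; only the spread differs, which is exactly the "high variance under weak overlap" problem.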

Solutions

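Trimming: drop units whose PS falls outside a range such as $[\alpha, 1 - \alpha]$ (this changes the target population)
Overlap Weighting: weight treated units by $1 - e(X)$ and controls by $e(X)$, so every weight lies in $[0, 1]$ (the estimand becomes the average effect in the overlap population, the ATO)
Weight clipping: cap the weights at an upper bound (accepts a little bias in exchange for lower variance)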
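The three remedies can be sketched in NumPy as follows. The helper names are hypothetical, and the thresholds (`lower=0.1`, `upper=0.9`, `max_weight=10.0`) are illustrative defaults rather than recommendations from this note.

```python
import numpy as np

def trim_mask(e, lower=0.1, upper=0.9):
    """Trimming: True for units whose PS lies in [lower, upper]; drop the rest."""
    return (e >= lower) & (e <= upper)

def clipped_ipw_weights(t, e, max_weight=10.0):
    """Weight clipping: ATE weights 1/e (treated) and 1/(1-e) (control), capped."""
    w = np.where(t == 1, 1.0 / e, 1.0 / (1.0 - e))
    return np.minimum(w, max_weight)

def overlap_weights(t, e):
    """Overlap weighting: 1-e for treated, e for controls; always within [0, 1]."""
    return np.where(t == 1, 1.0 - e, e)

e = np.array([0.01, 0.3, 0.5, 0.7, 0.99])
t = np.array([1, 0, 1, 1, 0])
print(trim_mask(e))               # only the three middle units are kept
print(clipped_ipw_weights(t, e))  # the raw weights of 100 on both ends are capped at 10
print(overlap_weights(t, e))      # the extreme units get the largest, yet bounded, weights
```

Note the trade-off the comments hint at: trimming and overlap weighting change which population the effect refers to, while clipping keeps the ATE target but introduces bias.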

Implementation

Python (EconML)

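```python
from econml.dr import LinearDRLearner
from sklearn.linear_model import LogisticRegression

# Doubly robust, not pure IPW: DRLearner also fits an outcome model (model_regression)
# and combines it with the propensity model.
# Y: outcomes, T: binary treatment, X: covariates
model = LinearDRLearner(model_propensity=LogisticRegression())
model.fit(Y, T, X=X)          # X is keyword-only in fit()
ate = model.effect(X).mean()  # average of the estimated CATEs
```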
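Because the learner above is doubly robust, a plain IPW estimate has to be computed directly. The sketch below assumes a binary treatment, simulated confounded data with a true ATE of 2, and a logistic propensity model fitted by Newton-Raphson in NumPy (standing in for `LogisticRegression`). It returns both the Horvitz-Thompson form and the normalized (Hájek) form; the function names are hypothetical.

```python
import numpy as np

def fit_propensity(X, t, n_iter=25):
    """Unpenalized logistic regression of t on X via Newton-Raphson; returns e_hat."""
    Xd = np.column_stack([np.ones(len(X)), X])    # add intercept
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (t - p)                        # score
        hess = Xd.T @ (Xd * (p * (1 - p))[:, None])  # Fisher information
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-Xd @ beta))

def ipw_ate(y, t, e, normalized=True):
    """IPW ATE: Hajek (weights sum to 1 per arm) or Horvitz-Thompson (divide by n)."""
    w1, w0 = t / e, (1 - t) / (1 - e)
    if normalized:
        return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)
    return np.mean(w1 * y) - np.mean(w0 * y)

# Simulated data: both covariates affect treatment and outcome (confounding)
rng = np.random.default_rng(42)
n = 20_000
X = rng.normal(size=(n, 2))
e_true = 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
t = rng.binomial(1, e_true)
y = 1.0 + 2.0 * t + X[:, 0] + X[:, 1] + rng.normal(size=n)  # true ATE = 2

e_hat = fit_propensity(X, t)
ate_hajek = ipw_ate(y, t, e_hat)
ate_ht = ipw_ate(y, t, e_hat, normalized=False)
print(f"Hajek {ate_hajek:.3f}  Horvitz-Thompson {ate_ht:.3f}")
```

No outcome model appears anywhere: the only fitted model is the propensity score, which is what distinguishes this from the DRLearner call.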

R

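```r
library(WeightIt)
library(sandwich)  # vcovHC()
library(lmtest)    # coeftest()

# Propensity score weights from a logistic model of treat on x1, x2 (ATE weights)
weights <- weightit(treat ~ x1 + x2, data = df, method = "ps", estimand = "ATE")

# Weighted outcome regression: the treat coefficient is the normalized IPW ATE
fit <- lm(y ~ treat, data = df, weights = weights$weights)

# lm()'s default standard errors are not valid for IPW; report robust (HC0) SEs
coeftest(fit, vcov. = vcovHC(fit, type = "HC0"))
```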
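The weighted regression works because least squares of $y$ on an intercept and a binary treatment, with weights $w_i$, returns the difference of weighted arm means, which is the normalized (Hájek) IPW estimate. A quick NumPy check on simulated data, assuming a known propensity and hypothetical variable names:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
x = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-x))                 # propensity, taken as known here
t = rng.binomial(1, e)
y = 2.0 * t + x + rng.normal(size=n)
w = np.where(t == 1, 1.0 / e, 1.0 / (1.0 - e))  # ATE weights

# Weighted least squares of y on [1, t]: scale rows by sqrt(w), then ordinary LS
design = np.column_stack([np.ones(n), t])
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)

# Hajek estimator: difference of weighted arm means
hajek = (np.sum(w * t * y) / np.sum(w * t)
         - np.sum(w * (1 - t) * y) / np.sum(w * (1 - t)))
print(coef[1], hajek)  # identical up to floating-point error
```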

Related Notes

Re-weighting Methods Overview - consolidated overview of reweighting methods
Propensity Score - the core tool
Doubly Robust Estimator - IPW + outcome regression
CBPS - directly optimizing balance
Trimming - handling extreme PS
Overlap Weighting - stable weighting

Application: Correcting RTB Win Selection Bias

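In real-time bidding (RTB), labels such as clicks are observed only on won impressions, so training on them alone introduces win selection bias. IPW corrects it by weighting each won impression by its inverse win probability: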

$$w_i = \frac{1}{p_{\text{win}}(x_i, b_i)}, \qquad p_{\text{win}}(x, b) = P(\text{win} \mid X = x, \text{bid} = b)$$

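The win propensity can be estimated with survival analysis (Kaplan-Meier) or gradient boosting. Survival analysis fits naturally because a lost auction only reveals that the market price was at least the bid, so the price is right-censored. Weight stabilization (clipping, normalization) is essential. For details, see Multi-Task Learning (IPW-ESCM²).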
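A minimal sketch of the pipeline, under stated assumptions: second-price auctions where a win reveals the market price $z$ and a loss only reveals $z \ge b$; Kaplan-Meier on these right-censored prices gives $p_{\text{win}}(b) = 1 - S(b^-)$, which here depends on the bid only (not on $x$). The simulated data, the bidding rule, the helper names, and the `max_weight=20.0` clip are all hypothetical.

```python
import numpy as np

def km_win_prob(won, market_price, bid):
    """Kaplan-Meier estimate of p_win(b) = P(Z < b) from censored auction logs.

    won: bool per auction; market_price: observed Z on wins (ignored on losses);
    bid: our bid. Losses are right-censored at the bid (we only learn Z >= bid).
    Assumes at least one won auction.
    """
    won = np.asarray(won, dtype=bool)
    obs = np.where(won, market_price, bid)            # event time or censoring time
    times, deaths = np.unique(obs[won], return_counts=True)
    at_risk = len(obs) - np.searchsorted(np.sort(obs), times, side="left")
    surv = np.cumprod(1.0 - deaths / at_risk)         # S(t) = P(Z > t) at each event time

    def p_win(b):
        # P(Z >= b) is the product over event times strictly below b
        idx = np.searchsorted(times, b, side="left") - 1
        s_before = np.where(idx >= 0, surv[np.maximum(idx, 0)], 1.0)
        return 1.0 - s_before

    return p_win

def win_ipw_weights(p, max_weight=20.0):
    """IPW weights 1/p_win, clipped at max_weight, then normalized to mean 1."""
    w = np.minimum(1.0 / p, max_weight)
    return w / w.mean()

# Hypothetical simulation: feature x drives the bid; market price is independent of x
rng = np.random.default_rng(7)
n = 20_000
x = rng.uniform(size=n)
bid = 1.0 + 4.0 * x
z = rng.exponential(scale=3.0, size=n)                # highest competing bid
won = bid > z
price_seen = np.where(won, z, np.nan)                 # Z is only logged on wins

p_win = km_win_prob(won, price_seen, bid)
w = win_ipw_weights(p_win(bid[won]))
naive = x[won].mean()                                 # skewed toward high-bid impressions
corrected = np.sum(w * x[won]) / np.sum(w)
print(f"population {x.mean():.3f}  won-only {naive:.3f}  IPW {corrected:.3f}")
```

The won-only average of $x$ overstates the population average because high bids win more often; reweighting by $1/p_{\text{win}}$ pulls it back toward the full-population value.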

References

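Yao, L., et al. (2021). A survey on causal inference. ACM Transactions on Knowledge Discovery from Data. Section 3.1.3
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55.
Horvitz, D. G., & Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260), 663-685.
Zhang, W., et al. (2016). Bid-aware gradient descent for unbiased learning with censored data in display advertising. KDD.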
