Endogeneity
Definition
Endogeneity is the problem that arises when an explanatory variable is correlated with the error term.
In this case the OLS estimator is biased and inconsistent.
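For a simple linear regression the definition can be written compactly. The symbols $y$, $x$, $\alpha$, $\beta$, and $\varepsilon$ are introduced here only for illustration and do not appear in the original note.

$$
y = \alpha + \beta x + \varepsilon, \qquad
\operatorname{Cov}(x, \varepsilon) \neq 0
\;\Longrightarrow\;
\operatorname{plim}\, \hat{\beta}_{\text{OLS}}
= \beta + \frac{\operatorname{Cov}(x, \varepsilon)}{\operatorname{Var}(x)} \neq \beta .
$$

The second term does not shrink as the sample grows, which is why the estimator is inconsistent and not merely noisy.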
Intuitive Understanding
In estimating price elasticity, endogeneity is the most fundamental challenge.
Firms do not set prices at random. They raise prices when they expect demand to be high and lower them when they expect it to be low. This optimizing behavior makes price correlated with demand factors the analyst does not observe, so a naive regression of quantity on price mixes the causal effect of price with the effect of those factors.
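The mechanism can be illustrated with a minimal sketch. It assumes a hypothetical firm that observes a demand shock and shifts its price in proportion to it; the names `demand_shock` and `random_price` and the coefficient 1.5 are illustrative choices, not taken from any real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Demand shock the firm anticipates but the analyst does not observe (hypothetical).
demand_shock = rng.normal(size=n)

# Forecast-based pricing: the firm raises price when it expects strong demand.
# The coefficient 1.5 is an arbitrary assumption for illustration.
price = 10 + 1.5 * demand_shock + rng.normal(size=n)

# Benchmark: prices drawn at random, independent of the shock by construction.
random_price = 10 + rng.normal(scale=price.std(), size=n)

corr_optimized = np.corrcoef(price, demand_shock)[0, 1]
corr_random = np.corrcoef(random_price, demand_shock)[0, 1]

print(f"corr(price, shock), forecast-based pricing: {corr_optimized:.2f}")
print(f"corr(price, shock), random pricing:         {corr_random:.2f}")
```

Under forecast-based pricing the correlation is strongly positive, while under random pricing it is close to zero. That difference is what separates observational price data from experimental price data.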
Key Properties
Sources of Endogeneity
| Source | Description | Example |
|---|---|---|
| Simultaneity | Price and quantity are determined simultaneously | Market equilibrium |
| Omitted variables | Omission of a variable that affects both price and demand | Quality, brand |
| Reverse causality | Demand affects price | Demand-forecast-based pricing |
| Measurement error | Measurement error in the explanatory variable | Missing records of price discounts |
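Of the four sources, measurement error is perhaps the least intuitive, so here is a minimal sketch of it in isolation. It assumes classical measurement error (zero-mean noise independent of the true price), which is a simplification: unrecorded discounts would in practice produce one-sided errors. All variable names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
true_effect = -2.0

# True transaction price, exogenous here to isolate the measurement-error channel.
true_price = 20 + 2 * rng.normal(size=n)
demand = 100 + true_effect * true_price + 5 * rng.normal(size=n)

# Recorded price deviates from the true price; modeled as classical noise (assumption).
recorded_price = true_price + 2 * rng.normal(size=n)


def ols_slope(x, y):
    """Slope of a simple regression of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


slope_true = ols_slope(true_price, demand)
slope_recorded = ols_slope(recorded_price, demand)

# Classical measurement error shrinks the slope by Var(p) / (Var(p) + Var(noise)).
reliability = 2**2 / (2**2 + 2**2)

print(f"Slope with true price:      {slope_true:.2f}")
print(f"Slope with recorded price:  {slope_recorded:.2f}")
print(f"Predicted attenuated slope: {true_effect * reliability:.2f}")
```

With equal variances for the true price and the noise, the slope is attenuated to roughly half its true value, even though no confounder is present.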
Direction of the Bias
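When endogeneity comes from demand-side unobservables such as quality, it typically biases the estimated price coefficient upward (a positive bias), so the magnitude of the elasticity is underestimated.
Higher quality leads to both a higher price and higher demand, so price is positively correlated with the error term.
Result: although the true elasticity is negative, the estimate can approach 0 or even become positive.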
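The sign of the bias follows from the standard omitted-variable formula. In the sketch below, $D$ is demand, $p$ is price, $q$ is unobserved quality, and $u$ is an independent error; these symbols are assumptions introduced for illustration.

$$
D = \alpha + \beta p + \gamma q + u, \qquad \beta < 0, \quad \gamma > 0, \quad \operatorname{Cov}(p, q) > 0
$$

$$
\operatorname{plim}\, \hat{\beta}_{\text{naive}}
= \beta + \gamma \, \frac{\operatorname{Cov}(p, q)}{\operatorname{Var}(p)} > \beta
$$

Both factors in the bias term are positive, so the naive estimate lies above the true $\beta$. If the bias term exceeds $|\beta|$, the estimated sign flips.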
Example
Simulation
import numpy as np
from sklearn.linear_model import LinearRegression
np.random.seed(42)
n = 5000
# Unobserved quality (confounder)
quality = np.random.randn(n)
# Price: depends on quality (endogenous!)
true_price_effect = -2.0
price = 20 + 3 * quality + np.random.randn(n) * 2
# Demand: depends on both price and quality
demand = 100 + true_price_effect * price + 10 * quality + np.random.randn(n) * 5
# Naive OLS (quality uncontrolled) - biased!
naive_model = LinearRegression()
naive_model.fit(price.reshape(-1, 1), demand)
print(f"True price effect: {true_price_effect}")
print(f"Naive OLS estimate: {naive_model.coef_[0]:.3f}")  # biased upward to about +0.3 (wrong sign)
# After controlling for quality - consistent
X_controlled = np.column_stack([price, quality])
controlled_model = LinearRegression()
controlled_model.fit(X_controlled, demand)
print(f"After controlling for quality: {controlled_model.coef_[0]:.3f}") # ~ -2.0
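The size of the bias in the simulation above can be worked out from its data-generating process, without running any regression. The sketch below applies the omitted-variable formula to the simulation's coefficients; the variable names are introduced here for illustration.

```python
# Population values implied by the simulation's data-generating process.
true_price_effect = -2.0
quality_to_price = 3.0    # coefficient on quality in the price equation
quality_to_demand = 10.0  # coefficient on quality in the demand equation
price_noise_sd = 2.0      # std. dev. of the independent noise in price
var_quality = 1.0         # quality is drawn from a standard normal

# Cov(price, quality) and Var(price) follow from price = 20 + 3*quality + noise.
cov_price_quality = quality_to_price * var_quality
var_price = quality_to_price**2 * var_quality + price_noise_sd**2

# Omitted-variable bias: gamma * Cov(p, q) / Var(p).
bias = quality_to_demand * cov_price_quality / var_price
expected_naive = true_price_effect + bias

print(f"Omitted-variable bias: {bias:.3f}")            # 2.308
print(f"Expected naive OLS:    {expected_naive:.3f}")  # 0.308
```

The bias of 30/13 is larger than the true effect in absolute value, so the naive estimate should come out positive, at about 0.3 up to sampling noise.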
Solution Approaches
| Approach | Core idea | Key assumption or limitation |
|---|---|---|
| Experiments | Randomly assign prices | Limited by ethical and cost constraints |
| Instrumental variables | Exploit exogenous price variation | Instrument relevance and exclusion restriction |
| Control strategy | Control for sufficient covariates | Unconfoundedness |
| Structural models | Explicitly model the economic structure | Functional-form assumptions |
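As an illustration of the instrumental-variables row, here is a minimal two-stage least squares sketch built on plain NumPy. It assumes a hypothetical instrument, `cost_shock`, which shifts price but has no direct effect on demand. The instrument, the helper `fit`, and all coefficients are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000
true_price_effect = -2.0

quality = rng.normal(size=n)     # unobserved confounder
cost_shock = rng.normal(size=n)  # hypothetical instrument: moves price, not demand

# Price responds to quality (endogenous part) and to cost (exogenous part).
price = 20 + 3 * quality + 2 * cost_shock + rng.normal(size=n)
demand = 100 + true_price_effect * price + 10 * quality + 5 * rng.normal(size=n)


def fit(x, y):
    """Least-squares coefficients (intercept, slope) of y on a single regressor x."""
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Naive OLS ignores quality, so the price coefficient is biased upward.
naive = fit(price, demand)[1]

# Stage 1: project price on the instrument.
a0, a1 = fit(cost_shock, price)
price_hat = a0 + a1 * cost_shock

# Stage 2: regress demand on the fitted (exogenous) part of price.
iv = fit(price_hat, demand)[1]

print(f"True effect: {true_price_effect:.2f}")
print(f"Naive OLS:   {naive:.2f}")
print(f"2SLS:        {iv:.2f}")
```

The two-stage estimate should land near the true effect, while the naive estimate does not. Running the two stages by hand gives a correct point estimate but not correct standard errors, so a dedicated IV routine would be preferable for inference.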
Related Concepts
- Instrumental Variables - the primary remedy for endogeneity
- Confounder - a variable that induces endogeneity
- A-B Testing - causal estimation free of endogeneity
- Price Elasticity - the target for which endogeneity is a problem
References
- Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and Panel Data (2nd ed.). MIT Press.
- Comprehensive Personalized Pricing Guide, Part I, §3