Instrumental Variables
Definition
Instrumental variables (IV) are exogenous variables used to address the problem of endogeneity.
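Conditions for a valid instrument:
- Relevance: the instrument affects the endogenous variable
- Exclusion restriction: the instrument has no direct effect on the outcome and is uncorrelated with the unobserved determinants of the outcome, so it influences the outcome only through the endogenous variable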
Intuitive Understanding
In the price endogeneity problem, we need to find “price variation that is unrelated to demand.”
An instrumental variable acts like a “natural experiment.” A cost shock (e.g., a rise in raw material prices) affects price but does not directly affect a consumer’s willingness to pay.
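The intuition can be checked on simulated data. The following is a minimal sketch, assuming a hypothetical data-generating process in which an unobserved demand shock raises both price and quantity while a cost shock moves price only; the coefficients, the seed, and the variable names are illustrative assumptions, not estimates from real data. It uses only the standard library.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


random.seed(0)
n = 5000
true_elasticity = -2.0  # assumed "true" price elasticity for the simulation

cost_shock = [random.gauss(0, 1) for _ in range(n)]    # instrument Z
demand_shock = [random.gauss(0, 1) for _ in range(n)]  # unobserved; drives endogeneity
price_noise = [random.gauss(0, 0.5) for _ in range(n)]

# Price responds to the cost shock AND to the demand shock (the source of endogeneity)
log_p = [0.5 * z + 0.5 * u + e
         for z, u, e in zip(cost_shock, demand_shock, price_noise)]
# Quantity depends on price and on the demand shock, but not directly on the cost shock
log_q = [1.0 + true_elasticity * p + u for p, u in zip(log_p, demand_shock)]

# OLS slope: contaminated by the demand shock that price and quantity share
ols_elasticity = cov(log_p, log_q) / cov(log_p, log_p)
# IV estimate: uses only the price variation induced by the cost shock
iv_elasticity = cov(cost_shock, log_q) / cov(cost_shock, log_p)

print(f"True elasticity: {true_elasticity:.2f}")
print(f"OLS estimate:    {ols_elasticity:.2f}")
print(f"IV estimate:     {iv_elasticity:.2f}")
```

Under these assumptions the OLS estimate is pulled toward zero because high-demand periods have both higher prices and higher quantities, while the IV ratio stays close to the assumed elasticity.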
Key Properties
2SLS (Two-Stage Least Squares)
Stage 1: Regress the endogenous variable on the instrument
Stage 2: Regress the outcome on the predicted endogenous variable
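The two stages can be written out by hand for the simplest case of one endogenous variable and one instrument. This is a sketch on hypothetical toy numbers; the helper names `ols_fit` and `two_stage_least_squares` are introduced here for illustration and are not part of any library.

```python
def mean(values):
    return sum(values) / len(values)


def ols_fit(x, y):
    """Return (intercept, slope) from a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope


def two_stage_least_squares(y, x, z):
    """2SLS slope for one endogenous regressor x and one instrument z."""
    # Stage 1: regress the endogenous variable on the instrument
    a1, b1 = ols_fit(z, x)
    x_hat = [a1 + b1 * zi for zi in z]
    # Stage 2: regress the outcome on the predicted endogenous variable
    _, beta = ols_fit(x_hat, y)
    return beta


# Hypothetical toy data: z = cost shock, x = log price, y = log quantity
z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.1, 1.4, 2.2, 2.4, 3.1, 3.3]
y = [5.0, 4.1, 2.9, 2.8, 1.2, 1.1]

beta_2sls = two_stage_least_squares(y, x, z)
print(f"2SLS slope: {beta_2sls:.3f}")
```

With a single instrument the result equals the ratio Cov(Z, Y) / Cov(Z, X). The standard errors of a hand-run second stage are not valid, because they ignore that the regressor was itself estimated; a dedicated IV routine should be used for inference.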
Weak Instrument Problem
When the first-stage F-statistic is low, the weak instrument problem arises:
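- Bias: the 2SLS estimate is pulled toward the OLS estimate in finite samples, and any small violation of the exclusion restriction is amplified, so the bias can be more severe than OLS
- Distorted confidence intervals: conventional standard errors understate the uncertainty, so nominal 95% intervals cover the true value less often than stated
Rule of thumb: a first-stage F-statistic > 10 is conventionally considered safe; Stock and Yogo (2005) provide formal critical values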
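To make the first-stage F-statistic concrete, here is a sketch for the single-instrument case, where F equals R² / (1 − R²) × (n − 2). The simulated first-stage coefficients (1.0 for the strong case, 0.02 for the weak case), the seed, and the function name `first_stage_f` are assumptions chosen for illustration.

```python
import random


def first_stage_f(z, x):
    """F-statistic for the slope in a simple regression of x on a single instrument z."""
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    sxx = sum((xi - mx) ** 2 for xi in x)
    szx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    r_squared = szx ** 2 / (szz * sxx)
    # With one regressor, F = R^2 / (1 - R^2) * (n - 2)
    return r_squared / (1 - r_squared) * (n - 2)


random.seed(1)
n = 500
z = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]

x_strong = [1.0 * zi + vi for zi, vi in zip(z, noise)]   # instrument moves x a lot
x_weak = [0.02 * zi + vi for zi, vi in zip(z, noise)]    # instrument barely moves x

f_strong = first_stage_f(z, x_strong)
f_weak = first_stage_f(z, x_weak)
print(f"Strong instrument F: {f_strong:.1f}")
print(f"Weak instrument F:   {f_weak:.1f}")
```

With several instruments or additional exogenous controls, the relevant statistic is the joint F-test on the excluded instruments in the full first-stage regression, not this single-regressor shortcut.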
Valid Instruments in Pricing
| Instrument Type | Example | Validity |
|---|---|---|
| Cost shifters | Raw material prices, exchange rates, shipping costs | Costs affect price but have no direct effect on consumer WTP |
| Hausman IV | Price of the same product in other markets | Cost shocks are common across markets, while demand shocks are assumed to be local |
| Competitive structure | Number of competitors, BLP IV | Competition affects price (markups) but is assumed not to affect consumer WTP directly |
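As an illustration of the Hausman row, the instrument for a product in one market can be built as the average price of the same product in all other markets. The sketch below assumes one row per product and market; the field names (`product`, `market`, `price`), the sample rows, and the function name `hausman_iv` are hypothetical. It uses only the standard library; the same leave-one-out mean could be computed with a pandas groupby.

```python
from collections import defaultdict


def hausman_iv(rows):
    """Add the leave-one-market-out mean price of the same product to each row."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row['product']] += row['price']
        counts[row['product']] += 1

    result = []
    for row in rows:
        other_markets = counts[row['product']] - 1
        if other_markets > 0:
            # Product total minus own price, averaged over the other markets
            iv = (totals[row['product']] - row['price']) / other_markets
        else:
            iv = None  # product sold in a single market: no instrument available
        result.append({**row, 'hausman_iv': iv})
    return result


rows = [
    {'product': 'A', 'market': 'north', 'price': 10.0},
    {'product': 'A', 'market': 'south', 'price': 12.0},
    {'product': 'A', 'market': 'west', 'price': 14.0},
    {'product': 'B', 'market': 'north', 'price': 20.0},
]

for row in hausman_iv(rows):
    print(row['product'], row['market'], row['hausman_iv'])
```

The instrument is only as good as the assumption behind it: if demand shocks are correlated across markets (for example, a national advertising campaign), prices elsewhere no longer satisfy the exclusion restriction.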
Example
Python Implementation
import numpy as np
from linearmodels.iv import IV2SLS

# Data: `data` is assumed to be a pandas DataFrame with the columns
# quantity, price, const, quality, cost_shock and competitor_price
# Y: log quantity, X: log price (endogenous), Z: cost shock (instrument)
data['log_quantity'] = np.log(data['quantity'])
data['log_price'] = np.log(data['price'])  # named column, so params can be looked up by name

iv_model = IV2SLS(
    dependent=data['log_quantity'],
    exog=data[['const', 'quality']],  # exogenous control variables
    endog=data['log_price'],  # endogenous variable
    instruments=data[['cost_shock', 'competitor_price']],  # instruments
)
iv_results = iv_model.fit(cov_type='robust')

print(f"IV elasticity: {iv_results.params['log_price']:.3f}")
print(f"Standard error: {iv_results.std_errors['log_price']:.3f}")
# first_stage.diagnostics is a table with one row per endogenous variable
first_stage_f = iv_results.first_stage.diagnostics.loc['log_price', 'f.stat']
print(f"First-stage F-statistic: {first_stage_f:.2f}")
BLP Instruments
Instruments proposed by Berry-Levinsohn-Pakes (1995):
import pandas as pd

# Sum of characteristics of other products within the same market
def create_blp_iv(data, characteristics, market_col='market', product_col='product'):
    """Generate BLP-style instruments.

    Assumes one row per product per market. product_col is kept in the
    signature but is not used by this sum-of-rivals version.
    """
    ivs = {}
    for char in characteristics:
        # Market total of the characteristic minus the product's own value
        market_sums = data.groupby(market_col)[char].transform('sum')
        # Prefix the column so it does not clash with the original characteristic
        ivs[f'blp_{char}'] = market_sums - data[char]
    return pd.DataFrame(ivs, index=data.index)

blp_ivs = create_blp_iv(data, ['horsepower', 'weight', 'mpg'])
Instrument Diagnostics
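# Weak instrument test
import numpy as np
import statsmodels.api as sm

instruments = ['cost_shock', 'competitor_price']

# First-stage regression: endogenous variable on exogenous controls plus instruments
first_stage = sm.OLS(data['log_price'], data[['const', 'quality'] + instruments]).fit()
# Joint F-test on the excluded instruments only (not the overall regression F)
f_test = first_stage.f_test(', '.join(f'{z} = 0' for z in instruments))
f_stat = float(np.squeeze(f_test.fvalue))
print(f"First-stage F-statistic: {f_stat:.2f}")
if f_stat < 10:
    print("Warning: possible weak instrument")
else:
    print("Instrument strength adequate")

# Overidentification test (Sargan) - requires more instruments than endogenous variables
# H0: all instruments are valid
sargan_stat = iv_results.sargan.stat
sargan_pval = iv_results.sargan.pval
print(f"Sargan test: stat={sargan_stat:.2f}, p={sargan_pval:.3f}")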
Related Concepts
- Endogeneity - the problem IV addresses
- A-B Testing - the alternative free of endogeneity
- Double-Debiased ML - IV combined with ML (DRIV)
- Price Elasticity - the target estimated via IV
References
- Angrist, J. D., & Pischke, J. S. (2008). Mostly Harmless Econometrics.
- Stock, J. H., & Yogo, M. (2005). “Testing for Weak Instruments in Linear IV Regression.”
- Berry, S., Levinsohn, J., & Pakes, A. (1995). “Automobile Prices in Market Equilibrium.”
- Comprehensive Personalized Pricing Guide, Part II, §6