Double/Debiased Machine Learning (DML)
Definition
A methodology for performing valid statistical inference on a low-dimensional parameter of interest $\theta_0$ in the presence of a high-dimensional nuisance parameter $\eta_0$.
Two Key Ingredients:
- Use of a Neyman-Orthogonal Score
- Application of Cross-fitting (sample splitting)
Intuitive Understanding
The problem: If you estimate the nuisance parameter with a traditional ML method and plug it in directly:
- Regularization bias arises
- Bias arises from overfitting
- The estimator generally fails to achieve $\sqrt{N}$-consistency
DML’s solution:
- Neyman-orthogonal score: construct a moment condition that is less sensitive to nuisance-parameter estimation error
- Cross-fitting: split the data to remove overfitting bias
Traditional: η̂ (ML) → plug-in → θ̂ (biased, not √N-consistent)
↓
DML: Orthogonal score + Cross-fitting → θ̂ (√N-consistent, asymptotically normal)
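To make "less sensitive" concrete, the usual formalisation can be sketched as follows. The notation is assumed for this illustration rather than taken from the text above: $W$ is the observed data, $\psi$ the score, and $\theta_0$, $\eta_0$ the true target and nuisance. The score identifies $\theta_0$ and has a vanishing first-order derivative in every nuisance direction:

$$\mathbb{E}[\psi(W;\theta_0,\eta_0)] = 0, \qquad \partial_r\, \mathbb{E}\big[\psi(W;\theta_0,\eta_0 + r(\eta-\eta_0))\big]\Big|_{r=0} = 0 \ \text{ for all admissible } \eta$$

Because the first-order term is zero, an error $\hat\eta - \eta_0$ enters the moment condition only at second order, which is why nuisance estimators that converge more slowly than $N^{-1/2}$ can still be tolerated.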
Key Properties
- $\sqrt{N}$ convergence rate: achieves the parametric rate $N^{-1/2}$
- Asymptotic normality: $\sqrt{N}(\hat\theta - \theta_0)/\sigma$ converges in distribution to a standard normal
- Valid inference: standard t-tests and confidence intervals can be used
- Method agnostic: a variety of ML methods such as Lasso, Random Forest, and Neural Networks can be used
- High-dimensional: works even when the nuisance function class does not satisfy traditional complexity constraints (the Donsker property)
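As a rough illustration of what "standard t-tests and confidence intervals" amounts to in practice, the sketch below assumes a score that is linear in the target, $\psi = \psi_a\,\theta + \psi_b$, and assumes the per-observation terms have already been evaluated with cross-fitted nuisance estimates. The function name `linear_score_inference` and its arguments are hypothetical, chosen only for this example.

```python
import math
from statistics import NormalDist


def linear_score_inference(psi_a, psi_b, level=0.95):
    """Estimate, standard error, and CI for a score psi = psi_a * theta + psi_b.

    psi_a and psi_b are per-observation lists, assumed to have been
    computed with out-of-fold (cross-fitted) nuisance predictions.
    """
    n = len(psi_a)
    mean_a = sum(psi_a) / n  # empirical Jacobian J
    mean_b = sum(psi_b) / n
    # Solve mean(psi_a) * theta + mean(psi_b) = 0 for theta
    theta = -mean_b / mean_a
    # Sandwich variance: sigma^2 = E[psi^2] / J^2
    psi = [a * theta + b for a, b in zip(psi_a, psi_b)]
    sigma2 = (sum(p * p for p in psi) / n) / mean_a ** 2
    se = math.sqrt(sigma2 / n)
    # Normal quantile for a two-sided interval at the requested level
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return theta, se, (theta - z * se, theta + z * se)
```

For a residual-on-residual regression with out-of-fold outcome residuals `y_res` and treatment residuals `d_res`, one would pass `psi_a = [-d * d for d in d_res]` and `psi_b = [y * d for y, d in zip(y_res, d_res)]`.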
Algorithm
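DML1 (Averaging)
Estimate $\theta$ separately on each fold, then average. With $I_k$ the $k$-th fold and $\hat\eta_k$ the nuisance estimate fitted on the other folds, each $\check\theta_k$ solves $\frac{1}{|I_k|}\sum_{i \in I_k} \psi(W_i; \check\theta_k, \hat\eta_k) = 0$, and
$$\tilde\theta_{\mathrm{DML1}} = \frac{1}{K}\sum_{k=1}^{K} \check\theta_k$$
DML2 (Pooling)
Solve the aggregated estimating equation for a single $\tilde\theta_{\mathrm{DML2}}$:
$$\frac{1}{K}\sum_{k=1}^{K} \frac{1}{|I_k|}\sum_{i \in I_k} \psi(W_i; \tilde\theta_{\mathrm{DML2}}, \hat\eta_k) = 0$$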
DML2 tends to perform better in small samples
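The difference between the two aggregation schemes can be sketched in a few lines. The sketch assumes a score that is linear in the target, $\psi = \psi_a\,\theta + \psi_b$, with the per-observation terms already grouped by fold. The names `dml1`, `dml2`, and `fold_mean` are hypothetical helpers for this illustration, not part of any library.

```python
def fold_mean(values):
    """Plain average of one fold's per-observation terms."""
    return sum(values) / len(values)


def dml1(psi_a_folds, psi_b_folds):
    """DML1: solve the moment equation within each fold, then average."""
    thetas = [-fold_mean(b) / fold_mean(a)
              for a, b in zip(psi_a_folds, psi_b_folds)]
    return sum(thetas) / len(thetas)


def dml2(psi_a_folds, psi_b_folds):
    """DML2: average the fold-level moments first, then solve once."""
    k = len(psi_a_folds)
    mean_a = sum(fold_mean(a) for a in psi_a_folds) / k
    mean_b = sum(fold_mean(b) for b in psi_b_folds) / k
    return -mean_b / mean_a


# Two folds whose denominators (mean of psi_a) differ
psi_a_folds = [[-1.0, -1.0], [-4.0, -4.0]]
psi_b_folds = [[1.0, 3.0], [4.0, 4.0]]
theta_dml1 = dml1(psi_a_folds, psi_b_folds)  # average of 2.0 and 1.0
theta_dml2 = dml2(psi_a_folds, psi_b_folds)  # -3.0 / -2.5
```

The two estimates coincide when every fold has the same average `psi_a`. They diverge when a fold has a small or noisy denominator, which is one plausible reading of why pooling behaves more stably in small samples.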
Example: Partially Linear Regression
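Model:
$$Y = D\theta_0 + g_0(X) + U, \quad \mathbb{E}[U \mid X, D] = 0$$
$$D = m_0(X) + V, \quad \mathbb{E}[V \mid X] = 0$$
Orthogonal Score (partialling-out form), with nuisance $\eta = (\ell, m)$ and $\ell_0(X) = \mathbb{E}[Y \mid X]$:
$$\psi(W; \theta, \eta) = \big(Y - \ell(X) - \theta\,(D - m(X))\big)\,(D - m(X))$$
Algorithm:
- Split data into $K$ folds
- For each fold $k$:
  - Estimate $\hat\ell$ and $\hat m$ on the other folds using ML
  - Compute residuals on fold $k$: $\tilde Y = Y - \hat\ell(X)$, $\tilde D = D - \hat m(X)$
- Estimate $\theta_0$ by regressing $\tilde Y$ on $\tilde D$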
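The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the covariate is scalar, a k-nearest-neighbour average stands in for "ML" (any regressor could be swapped in), and the final regression pools the residuals across folds (DML2-style). The names `knn_predict`, `dml_plr`, and `simulate_plr` are hypothetical.

```python
import math
import random


def knn_predict(x_train, t_train, x_new, k=10):
    """Stand-in ML learner: mean target of the k nearest training points."""
    order = sorted(range(len(x_train)), key=lambda j: abs(x_train[j] - x_new))
    nearest = order[:k]
    return sum(t_train[j] for j in nearest) / len(nearest)


def dml_plr(y, d, x, n_folds=5, seed=0):
    """Cross-fitted residual-on-residual estimate of theta in
    Y = theta * D + g(X) + U."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    y_res, d_res = [0.0] * n, [0.0] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        x_tr = [x[i] for i in train]
        y_tr = [y[i] for i in train]
        d_tr = [d[i] for i in train]
        for i in held_out:
            # Nuisance predictions use only data from the other folds
            y_res[i] = y[i] - knn_predict(x_tr, y_tr, x[i])
            d_res[i] = d[i] - knn_predict(x_tr, d_tr, x[i])
    # Pooled no-intercept regression of outcome residuals on treatment residuals
    theta = sum(a * b for a, b in zip(d_res, y_res)) / sum(b * b for b in d_res)
    return theta, y_res, d_res


def simulate_plr(n, theta0, seed):
    """Toy data (assumed design): m0(X) = X, g0(X) = sin(X), unit-variance noise."""
    rng = random.Random(seed)
    x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    d = [xi + rng.gauss(0.0, 1.0) for xi in x]
    y = [theta0 * di + math.sin(xi) + rng.gauss(0.0, 1.0) for di, xi in zip(d, x)]
    return y, d, x


y, d, x = simulate_plr(n=400, theta0=0.5, seed=1)
theta_hat, y_res, d_res = dml_plr(y, d, x)
```

On simulated data like this, `theta_hat` should land near the true value of 0.5, with sampling noise of roughly $1/\sqrt{N}$ in this design.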
Rate Conditions
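For DML to work, a convergence-rate condition on the nuisance-parameter estimation is required:
$$\|\hat\eta - \eta_0\|_{L^2} = o_P(N^{-1/4})$$
e.g., it suffices that each nuisance estimator converges at a rate faster than $N^{-1/4}$ in $L^2$ norm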
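A short worked calculation shows where the $N^{-1/4}$ threshold comes from. With an orthogonal score, the nuisance error affects the estimator only through products of errors. For the partially linear model with the residual-on-residual score, the sufficient condition in Chernozhukov et al. (2018) is usually stated roughly as follows, where $\hat\ell$ and $\hat m$ denote the estimated regressions of $Y$ and $D$ on $X$ (notation assumed here):

$$\|\hat m - m_0\|_{L^2}\,\big(\|\hat m - m_0\|_{L^2} + \|\hat\ell - \ell_0\|_{L^2}\big) = o_P(N^{-1/2})$$

If each factor is $o_P(N^{-1/4})$, the product is $o_P(N^{-1/4}\cdot N^{-1/4}) = o_P(N^{-1/2})$. For scale, at $N = 10{,}000$ this asks for nuisance errors shrinking faster than $N^{-1/4} = 0.1$, whereas the target itself is estimated to order $N^{-1/2} = 0.01$.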
Related Concepts
- Neyman-Orthogonal Score - the core theoretical tool of DML
- Cross-fitting - sample splitting to remove overfitting bias
- Partially Linear Model - the canonical application of DML
- CATE (conditional average treatment effect) - a treatment effect estimable with DML
- Doubly Robust Estimator - similar robustness properties
Applications
- Treatment effect estimation with high-dimensional controls
- Instrumental variables with many instruments
- Structural parameter estimation in complex models
- Policy evaluation with rich covariate sets
- Personalized pricing with customer features
Advantages vs Limitations
| Advantages | Limitations |
|---|---|
| $\sqrt{N}$-consistent | Requires a rate condition |
| Valid inference | Computationally intensive (multiple splits) |
| Any ML method can be used | Lack of guidance on choosing the ML method |
| Allows high-dimensional nuisance | Variable finite-sample performance |
References
- chernozhukovDoubleDebiasedMachine2018 - Original DML paper
- kennedyOptimalDoublyRobust2023 - Related doubly robust methods