Double/Debiased Machine Learning (DML)

Definition

A methodology for performing valid statistical inference on a low-dimensional parameter of interest $\theta_0$ in the presence of a high-dimensional nuisance parameter $\eta_0$.

Two Key Ingredients:

Use of a Neyman-Orthogonal Score
Application of Cross-fitting (sample splitting)

$\sqrt{N}(\hat{\theta}_{DML} - \theta_0) \xrightarrow{d} N(0, V)$

Intuitive Understanding

The problem: If you estimate the nuisance parameter with a traditional ML method and plug it in directly:

Regularization bias arises
Bias arises from overfitting
The estimator of $\theta$ fails to be $\sqrt{N}$-consistent

DML’s solution:

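Neyman-orthogonal score: construct a moment condition whose derivative with respect to the nuisance parameter vanishes at the true value, so that first-order nuisance-estimation errors do not pass through to $\hat{\theta}$
Cross-fitting: estimate the nuisance parameter on one part of the data and evaluate the score on the held-out part, rotating across folds, to remove overfitting bias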

Traditional:  η̂ (ML) → plug-in → θ̂ (biased, not √N-consistent)
     ↓
DML:  Orthogonal score + Cross-fitting → θ̂ (√N-consistent, asymptotically normal)
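The overfitting problem that cross-fitting targets can be seen in a small sketch. The setup here is an assumption made for illustration: a deliberately interpolating learner (1-nearest-neighbour, written by hand as `fit_one_nn`), simulated data with $m_0(x) = x$, and two folds. Fitted and evaluated on the same observations, the learner memorizes the noise $V$ and the residuals $D - \hat{m}(X)$ collapse to zero; evaluated on held-out observations, the residuals keep their variation.

```python
import random

def fit_one_nn(x_train, t_train):
    """Interpolating learner: predict the target of the nearest training point."""
    pairs = list(zip(x_train, t_train))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mean_sq(values):
    return sum(v * v for v in values) / len(values)

rng = random.Random(0)
n = 200
x = [rng.random() for _ in range(n)]
d = [xi + rng.gauss(0, 1) for xi in x]  # D = m0(X) + V, with m0(x) = x assumed

# In-sample: fit and predict on the same observations.
m_full = fit_one_nn(x, d)
in_sample_res = [di - m_full(xi) for xi, di in zip(x, d)]

# Cross-fitted: each half is predicted by a model trained on the other half.
half = n // 2
folds = [range(0, half), range(half, n)]
cross_res = [0.0] * n
for k, held_out in enumerate(folds):
    train = folds[1 - k]
    m_k = fit_one_nn([x[i] for i in train], [d[i] for i in train])
    for i in held_out:
        cross_res[i] = d[i] - m_k(x[i])

print("in-sample mean squared residual:   ", mean_sq(in_sample_res))  # 0.0
print("cross-fitted mean squared residual:", mean_sq(cross_res))
```

With in-sample residuals identically zero, any score built from $D - \hat{m}(X)$ degenerates, which is an extreme version of the overfitting bias described above.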

Key Properties

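$N^{-1/2}$ convergence rate: $\hat{\theta}$ attains the parametric rate even though the nuisance estimates converge more slowly
Asymptotic normality: $\sqrt{N}(\hat{\theta} - \theta_0)$ converges in distribution to a mean-zero normal $N(0, V)$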
Valid inference: standard t-tests and confidence intervals can be used
Method agnostic: a variety of ML methods such as Lasso, Random Forest, and Neural Networks can be used
High-dimensional: cross-fitting removes the need for classical complexity restrictions (Donsker conditions) on the function class of the nuisance estimator

Algorithm

DML1 (Averaging)

Estimate $\theta$ separately on each fold, then average: $\tilde{\theta}_0 = \frac{1}{K}\sum_{k=1}^K \check{\theta}_{0,k}$

DML2 (Pooling)

Solve the aggregated estimating equation: $\frac{1}{K}\sum_{k=1}^K E_{n,k}[\psi(W; \tilde{\theta}_0, \hat{\eta}_{0,k})] = 0$

DML2 tends to perform better in small samples
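The difference between the two aggregation schemes can be sketched for a score that is linear in $\theta$. The sketch assumes the partially linear score $(\tilde{Y} - D\theta)\tilde{D}$ with residuals already computed per fold; the function names and the tiny numeric folds are made up purely for illustration.

```python
def fold_means(y_res, d_res, d):
    """Fold-level empirical means of the two pieces of the linear score."""
    n = len(d)
    num = sum(dr * yr for dr, yr in zip(d_res, y_res)) / n  # E_nk[D~ * Y~]
    den = sum(dr * di for dr, di in zip(d_res, d)) / n      # E_nk[D~ * D]
    return num, den

def dml1(folds):
    """Solve the score equation on each fold, then average the K solutions."""
    thetas = []
    for fold in folds:
        num, den = fold_means(*fold)
        thetas.append(num / den)
    return sum(thetas) / len(thetas)

def dml2(folds):
    """Average the fold-level score equations first, then solve once."""
    pieces = [fold_means(*fold) for fold in folds]
    return sum(p[0] for p in pieces) / sum(p[1] for p in pieces)

# Hypothetical residuals for two folds: (y_res, d_res, d)
folds = [
    ([1.0, 2.0], [1.0, 1.0], [1.0, 1.0]),
    ([2.0, 6.0], [1.0, 2.0], [1.0, 2.0]),
]
theta_dml1 = dml1(folds)  # average of 1.5 and 2.8
theta_dml2 = dml2(folds)  # (1.5 + 7.0) / (1.0 + 2.5)
print(theta_dml1, theta_dml2)
```

DML1 divides within each fold, so one fold with a small denominator can move the average a lot; DML2 divides once, using the denominator averaged across folds.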

Example: Partially Linear Regression

Model: $Y = D\theta_0 + g_0(X) + U, \quad E[U|X,D] = 0$ and $D = m_0(X) + V, \quad E[V|X] = 0$

Orthogonal Score: $\psi(W; \theta, \eta) = (Y - D\theta - g(X))(D - m(X))$
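Why this score is orthogonal can be checked directly. The perturbation directions $h_g, h_m$ and the path parameter $r$ are notation introduced here for the check, not part of the model. Perturbing both nuisance functions around their true values and differentiating the moment at $r = 0$ gives:

$$
\partial_r \, E\big[(Y - D\theta_0 - g_0(X) - r h_g(X))(D - m_0(X) - r h_m(X))\big]\Big|_{r=0} = -E[h_g(X) V] - E[U h_m(X)] = 0
$$

Both terms vanish because $E[V|X] = 0$ and $E[U|X,D] = 0$. For comparison, the non-orthogonal score $(Y - D\theta - g(X))D$ has derivative

$$
-E[h_g(X) D] = -E[h_g(X) m_0(X)]
$$

which is generally non-zero, so errors in $\hat{g}$ pass through to $\hat{\theta}$ at first order.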

Algorithm:

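Split the data into K folds
For each fold k:
- Estimate $\hat{g}(X)$ and $\hat{m}(X)$ on the other K-1 folds using ML
- Compute residuals on fold k: $\tilde{Y} = Y - \hat{g}(X)$, $\tilde{D} = D - \hat{m}(X)$
Estimate $\theta$ by solving the score equation on the pooled residuals: $\hat{\theta} = \sum_i \tilde{D}_i \tilde{Y}_i / \sum_i \tilde{D}_i D_i$. In the partialling-out variant, where $Y$ is residualized on an estimate of $E[Y|X]$ instead of $g_0(X)$, this step becomes the OLS regression of $\tilde{Y}$ on $\tilde{D}$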
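The steps above can be sketched end to end on simulated data. Several choices here are assumptions made for illustration, not part of the method: the data-generating process ($m_0(x) = \sin(2\pi x)$, $g_0(x) = x^2$, $\theta_0 = 0.5$), the hand-written binned-mean learner standing in for an ML method, and all function names. The sketch uses the partialling-out variant, residualizing $Y$ on an estimate of $E[Y|X]$, so that the final step is an OLS regression of $\tilde{Y}$ on $\tilde{D}$. It also reports a plug-in standard error from the estimated score.

```python
import math
import random

def fit_binned_mean(x_train, t_train, n_bins=20):
    """Stand-in nuisance learner: piecewise-constant regression on [0, 1)."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for xi, ti in zip(x_train, t_train):
        b = min(int(xi * n_bins), n_bins - 1)
        sums[b] += ti
        counts[b] += 1
    overall = sum(t_train) / len(t_train)  # fallback for empty bins
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda xi: means[min(int(xi * n_bins), n_bins - 1)]

def dml_plr(y, d, x, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta and its standard error."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    y_res, d_res = [0.0] * n, [0.0] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Nuisance functions are fitted on the other folds only.
        ell_hat = fit_binned_mean([x[i] for i in train], [y[i] for i in train])
        m_hat = fit_binned_mean([x[i] for i in train], [d[i] for i in train])
        for i in held_out:
            y_res[i] = y[i] - ell_hat(x[i])
            d_res[i] = d[i] - m_hat(x[i])
    # OLS of residualized Y on residualized D (no intercept).
    j = sum(dr * dr for dr in d_res) / n
    theta = sum(dr * yr for dr, yr in zip(d_res, y_res)) / (n * j)
    # Variance estimate: E[psi^2] / J^2, with psi the estimated score.
    psi_sq = sum(((yr - theta * dr) * dr) ** 2
                 for yr, dr in zip(y_res, d_res)) / n
    se = math.sqrt(psi_sq / (j ** 2) / n)
    return theta, se

# Simulated data from the partially linear model (assumed DGP).
rng = random.Random(42)
n, theta0 = 2000, 0.5
x = [rng.random() for _ in range(n)]
d = [math.sin(2 * math.pi * xi) + rng.gauss(0, 1) for xi in x]
y = [theta0 * di + xi ** 2 + rng.gauss(0, 1) for di, xi in zip(d, x)]

theta_hat, se = dml_plr(y, d, x)
ci = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)
print(f"theta_hat = {theta_hat:.3f}, se = {se:.3f}, "
      f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

In practice the binned-mean learner would be replaced by whichever ML method suits the data; the cross-fitting loop and the final regression stay the same.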

Rate Conditions

For DML to work, a convergence-rate condition on the nuisance-parameter estimation is required:

$||\hat{g} - g_0|| \cdot ||\hat{m} - m_0|| = o_P(N^{-1/2})$

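e.g., it suffices that each estimator converges faster than $N^{-1/4}$; because only the product matters, a slower rate for one nuisance function can be offset by a faster rate for the other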
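A short worked example of the product condition, with the exponents $a$ and $b$ introduced here for illustration. Suppose $||\hat{g} - g_0|| = O_P(N^{-a})$ and $||\hat{m} - m_0|| = O_P(N^{-b})$. Then

$$
||\hat{g} - g_0|| \cdot ||\hat{m} - m_0|| = O_P(N^{-(a+b)}), \qquad \text{which is } o_P(N^{-1/2}) \text{ when } a + b > 1/2
$$

For instance, $a = 1/3$ and $b = 1/5$ give $a + b = 8/15 > 1/2$, so the condition holds even though $\hat{m}$ is slower than $N^{-1/4}$. With $a = b = 1/5$, $a + b = 2/5 < 1/2$ and these rates do not guarantee the condition.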

Related Concepts

Neyman-Orthogonal Score - the core theoretical tool of DML
Cross-fitting - sample splitting to remove overfitting bias
Partially Linear Model - the canonical application of DML
CATE (conditional average treatment effect) - a heterogeneous treatment effect that can be estimated with DML
Doubly Robust Estimator - similar robustness properties

Applications

Treatment effect estimation with high-dimensional controls
Instrumental variables with many instruments
Structural parameter estimation in complex models
Policy evaluation with rich covariate sets
Personalized pricing with customer features

Advantages vs Limitations

Advantages	Limitations
$\sqrt{N}$-consistent	Requires a rate condition
Valid inference	Computationally intensive (multiple splits)
Any ML method can be used	Lack of guidance on choosing the ML method
Allows high-dimensional nuisance	Variable finite-sample performance

References

chernozhukovDoubleDebiasedMachine2018 - Original DML paper
kennedyOptimalDoublyRobust2023 - Related doubly robust methods
