Double/Debiased Machine Learning (DML)

Definition

A methodology for conducting valid statistical inference on a low-dimensional parameter of interest $\theta_0$ in the presence of a high-dimensional nuisance parameter $\eta_0$.

Two Key Ingredients: Neyman orthogonality and cross-fitting, which together yield the main result:

$\sqrt{N}(\hat{\theta}_{DML} - \theta_0) \xrightarrow{d} N(0, V)$
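
The limiting distribution above is what makes inference practical: given an estimate of $V$, a Wald-type confidence interval follows directly. A minimal sketch, in which the function name `confidence_interval` and its arguments are hypothetical and the variance estimate `v_hat` is assumed to be supplied by the user:

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(theta_hat, v_hat, n, level=0.95):
    """Wald interval implied by sqrt(N) * (theta_hat - theta_0) -> N(0, V)."""
    # Two-sided standard normal critical value (about 1.96 for level=0.95).
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    # The standard error of theta_hat is sqrt(V / N).
    se = sqrt(v_hat / n)
    return theta_hat - z * se, theta_hat + z * se

# Made-up inputs, for illustration only.
lo, hi = confidence_interval(theta_hat=0.5, v_hat=4.0, n=400)
print(f"95% CI: [{lo:.3f}, {hi:.3f}]")
```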

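Problem: when the nuisance parameter is estimated with a conventional ML method and plugged in directly, regularization and overfitting bias carry over into $\hat{\theta}$, which then generally fails to be $\sqrt{N}$-consistent.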

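DML's solution:

Neyman-orthogonal score: construct a moment condition that is locally (to first order) insensitive to estimation error in the nuisance parameter
Cross-fitting: split the sample so that the nuisance functions are estimated and the score is evaluated on different observations, removing overfitting bias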

Traditional:  η̂ (ML) → plug-in → θ̂ (biased, not √N-consistent)
     ↓
DML:  Orthogonal score + Cross-fitting → θ̂ (√N-consistent, asymptotically normal)

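DML1: estimate $\theta$ separately on each fold, then average: $\tilde{\theta}_0 = \frac{1}{K}\sum_{k=1}^K \check{\theta}_{0,k}$

DML2: solve the aggregated estimating equation: $\frac{1}{K}\sum_{k=1}^K E_{n,k}[\psi(W; \tilde{\theta}_0, \hat{\eta}_{0,k})] = 0$, where $E_{n,k}$ denotes the empirical average over fold $k$.

DML2 tends to perform better in small samples.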
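
The difference between per-fold averaging (DML1) and the pooled estimating equation (DML2) is easiest to see for a score that is linear in $\theta$. The sketch below assumes such a score, written as $\psi = b - a\theta$. The names `a`, `b`, `dml1`, `dml2` and the toy numbers are introduced here for illustration only.

```python
def fold_mean(values):
    """Empirical average over one fold."""
    return sum(values) / len(values)

def dml1(folds):
    """Average of the per-fold solutions theta_k = E_k[b] / E_k[a]."""
    thetas = [fold_mean(b) / fold_mean(a) for a, b in folds]
    return sum(thetas) / len(thetas)

def dml2(folds):
    """Single solution of (1/K) * sum_k E_k[b - a * theta] = 0."""
    num = sum(fold_mean(b) for a, b in folds)
    den = sum(fold_mean(a) for a, b in folds)
    return num / den

# Toy score components for two folds (made-up numbers).
folds = [
    ([1.0, 3.0], [2.0, 2.0]),  # fold 1: a-values, b-values
    ([0.5, 0.5], [1.0, 2.0]),  # fold 2: small a-values give a noisy ratio
]
print(dml1(folds), dml2(folds))
```

In this toy case the second fold has a small denominator, so its per-fold solution pulls DML1 away from the pooled DML2 value. Pooling the denominators is one plausible intuition for the small-sample stability of DML2.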

Model (partially linear regression):
$Y = D\theta_0 + g_0(X) + U, \quad E[U|X,D] = 0$
$D = m_0(X) + V, \quad E[V|X] = 0$

Orthogonal Score: $\psi(W; \theta, \eta) = (Y - D\theta - g(X))(D - m(X))$
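
To see why this score is Neyman-orthogonal, one can check that its expectation does not change, to first order, when the nuisance functions are perturbed around their true values. A short derivation, in which the perturbation directions $\delta_g$, $\delta_m$ and the scalar $r$ are notation introduced here for illustration:

$$
\begin{aligned}
E\big[\psi(W; \theta_0, g_0 + r\delta_g, m_0 + r\delta_m)\big]
  &= E\big[(U - r\,\delta_g(X))(V - r\,\delta_m(X))\big], \\
\frac{\partial}{\partial r} E[\,\cdot\,]\Big|_{r=0}
  &= -E[\delta_g(X)\,V] - E[U\,\delta_m(X)] = 0 .
\end{aligned}
$$

The first term vanishes because $E[V|X] = 0$, and the second because $E[U|X,D] = 0$ implies $E[U|X] = 0$. First-order errors in $\hat{g}$ and $\hat{m}$ therefore do not shift the moment condition.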

Algorithm:

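Split the data into K folds
For each fold k:
- Estimate $\hat{g}(X)$ and $\hat{m}(X)$ on the other K-1 folds using ML
- Compute residuals on fold k: $\tilde{Y} = Y - \hat{g}(X)$ , $\tilde{D} = D - \hat{m}(X)$
Estimate $\theta$ by solving the empirical score equation: $\hat{\theta} = \sum_i \tilde{D}_i \tilde{Y}_i / \sum_i \tilde{D}_i D_i$. If $\hat{g}$ is replaced by an estimate of $E[Y|X]$, the orthogonal estimator is instead the regression of $\tilde{Y}$ on $\tilde{D}$ (partialling out).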
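
The cross-fitting steps can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation. It uses the partialling-out variant, fitting $E[Y|X]$ in place of $g$ and regressing residual on residual. The nuisance learner `fit_bin_means` is a toy histogram regressor standing in for any ML method. The simulated data, the function names and the plug-in standard error are choices made here for illustration.

```python
import math
import random

def fit_bin_means(x_train, y_train, n_bins=10):
    """Toy nuisance learner: mean of y within equal-width bins of x in [0, 1)."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(x_train, y_train):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(y_train) / len(y_train)
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * n_bins), n_bins - 1)]

def dml_plr(xs, ds, ys, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta, pooled over folds."""
    n = len(ys)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    y_res, d_res = [0.0] * n, [0.0] * n
    for k in range(n_folds):
        held_out = set(folds[k])
        train = [i for i in idx if i not in held_out]
        # Nuisance functions are fitted on the other folds only.
        l_hat = fit_bin_means([xs[i] for i in train], [ys[i] for i in train])
        m_hat = fit_bin_means([xs[i] for i in train], [ds[i] for i in train])
        # Residuals are computed on the held-out fold.
        for i in folds[k]:
            y_res[i] = ys[i] - l_hat(xs[i])
            d_res[i] = ds[i] - m_hat(xs[i])
    # Regress the outcome residual on the treatment residual (no intercept).
    den = sum(d * d for d in d_res)
    theta = sum(y * d for y, d in zip(y_res, d_res)) / den
    # Plug-in standard error based on the estimated score.
    psi_sq = sum(((y - theta * d) * d) ** 2 for y, d in zip(y_res, d_res)) / n
    se = math.sqrt(psi_sq / (den / n) ** 2 / n)
    return theta, se

# Simulated data with a known theta_0 = 0.5 (all choices are illustrative).
rng = random.Random(42)
n = 2000
xs = [rng.random() for _ in range(n)]
ds = [math.sin(2 * math.pi * x) + rng.gauss(0, 1) for x in xs]
ys = [0.5 * d + math.cos(2 * math.pi * x) + rng.gauss(0, 1)
      for x, d in zip(xs, ds)]
theta_hat, se = dml_plr(xs, ds, ys)
print(f"theta_hat = {theta_hat:.3f} (se = {se:.3f})")
```

Each observation's residuals come from nuisance fits that never saw that observation, which is the point of cross-fitting. Swapping `fit_bin_means` for a stronger learner changes only the two fitting lines.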

For DML to work, the nuisance estimators must satisfy a convergence-rate condition:

$\|\hat{g} - g_0\| \cdot \|\hat{m} - m_0\| = o_P(N^{-1/2})$

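Example: it suffices that each nuisance estimator converges faster than $N^{-1/4}$, i.e. each estimation error is $o_P(N^{-1/4})$.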
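
Because the condition involves a product, a slower rate for one nuisance estimator can be offset by a faster rate for the other. A worked example, with the exponents $a$ and $b$ introduced here purely for illustration:

$$
\|\hat{g} - g_0\| = O_P(N^{-a}), \quad \|\hat{m} - m_0\| = O_P(N^{-b})
\;\Longrightarrow\;
\|\hat{g} - g_0\| \cdot \|\hat{m} - m_0\| = O_P(N^{-(a+b)}),
$$

which is $o_P(N^{-1/2})$ whenever $a + b > 1/2$. For instance, $a = 1/3$ and $b = 1/4$ give $a + b = 7/12 > 1/2$, so the condition holds. By contrast, $a = b = 1/4$ only gives $O_P(N^{-1/2})$, which is why both rates must be strictly faster than $N^{-1/4}$ in the symmetric case.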

Advantages	Limitations
$\sqrt{N}$-consistent	Requires a rate condition
Valid inference	Computationally intensive (multiple splits)
Works with any ML method	Little guidance on choosing the ML method
Allows high-dimensional nuisance parameters	Finite-sample performance can vary