Wide and Deep

Definition

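Wide & Deep (Cheng et al., 2016) is a CTR prediction model that combines a **linear wide component (memorization)** with a **DNN deep component (generalization)**. It was first applied to app recommendation on Google Play. The prediction is the sigmoid of the summed wide and deep logits, where $\mathbf{a}^{(L)}$ denotes the activations of the last hidden layer: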

$\hat{y} = \sigma\big(\mathbf{w}_{\text{wide}}^T [\mathbf{x}, \boldsymbol{\phi}(\mathbf{x})] + \mathbf{w}_{\text{deep}}^T \mathbf{a}^{(L)} + b\big)$

Wide Component

The wide component is a generalized linear model that takes the raw features $\mathbf{x}$ and the cross-product transformations $\boldsymbol{\phi}(\mathbf{x})$ as input:

$\boldsymbol{\phi}_k(\mathbf{x}) = \prod_{i=1}^{d} x_i^{c_{ki}}, \quad c_{ki} \in \{0, 1\}$

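For binary features, $\boldsymbol{\phi}_k(\mathbf{x})$ equals 1 only when every feature selected by $c_{ki} = 1$ is active. These cross-product features must be designed by hand, and they memorize the co-occurrence of specific feature combinations.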
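The cross-product transformation above can be sketched in a few lines of Python. This is a minimal illustration, assuming binary (one-hot) inputs; the feature names `gender=female`, `language=en`, and `language=ko` are hypothetical and do not come from the paper.

```python
from typing import Sequence


def cross_product(x: Sequence[int], c_k: Sequence[int]) -> int:
    """Compute phi_k(x) = prod_i x_i ** c_ki for binary x_i and c_ki."""
    result = 1
    for x_i, c_ki in zip(x, c_k):
        # x_i ** 0 == 1, so features with c_ki == 0 do not affect the product.
        result *= x_i ** c_ki
    return result


# Hypothetical one-hot features: [gender=female, language=en, language=ko]
x = [1, 1, 0]
and_female_en = [1, 1, 0]  # selects gender=female AND language=en
and_female_ko = [1, 0, 1]  # selects gender=female AND language=ko

print(cross_product(x, and_female_en))  # 1: both selected features are active
print(cross_product(x, and_female_ko))  # 0: language=ko is inactive
```

Each mask `c_k` defines one hand-crafted cross feature, which is why the wide part depends on which masks the designer chooses to include.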

Deep Component

The deep component is a feed-forward neural network. Categorical features are first converted into dense embeddings, which are concatenated and then passed through several hidden layers:

$\mathbf{a}^{(l+1)} = \text{ReLU}(\mathbf{W}^{(l)} \mathbf{a}^{(l)} + \mathbf{b}^{(l)})$

It learns high-order feature interactions implicitly and is responsible for generalizing to unseen feature combinations.
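As a rough sketch of the deep component's forward pass, the snippet below looks up one embedding per categorical feature, concatenates them into $\mathbf{a}^{(0)}$, and applies the ReLU layer recurrence. The helper names (`deep_forward`, `dense`, `relu`), the table sizes, and all weight values are assumptions chosen for illustration, not values from the paper.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Sequence[float]) -> Vector:
    return [max(0.0, z) for z in v]


def dense(W: Matrix, a: Sequence[float], b: Sequence[float]) -> Vector:
    """Compute W a + b for a row-major weight matrix W."""
    return [sum(w * a_j for w, a_j in zip(row, a)) + b_i
            for row, b_i in zip(W, b)]


def deep_forward(category_ids: Sequence[int],
                 embedding_tables: Sequence[Matrix],
                 layers: Sequence[Tuple[Matrix, Vector]]) -> Vector:
    # a^(0): concatenation of one embedding vector per categorical feature.
    a: Vector = []
    for table, idx in zip(embedding_tables, category_ids):
        a.extend(table[idx])
    # a^(l+1) = ReLU(W^(l) a^(l) + b^(l))
    for W, b in layers:
        a = relu(dense(W, a, b))
    return a


# Two hypothetical categorical features, each with 2 categories and 2-dim embeddings.
embedding_tables = [
    [[0.1, -0.2], [0.3, 0.4]],
    [[0.5, 0.5], [-0.1, 0.2]],
]
# One hidden layer mapping the 4-dim concatenation to 2 units.
layers = [([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]], [0.0, 0.1])]

activations = deep_forward([1, 0], embedding_tables, layers)
print(activations)  # approximately [0.7, 0.1]
```

In a trained model the embedding tables and layer weights would be learned; here they are fixed constants so the arithmetic is easy to follow.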

Joint Training

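The wide and deep components are trained jointly: their outputs are combined as a weighted sum of logits, and the sigmoid of that sum is the final prediction. Because both parts feed a single logistic loss, each update sends gradients to the wide and deep parameters at the same time. In the original paper, the wide part was optimized with FTRL plus L1 regularization, and the deep part with AdaGrad.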
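The joint update can be illustrated with a simplified sketch. Note the assumptions: it uses plain SGD for both parts rather than the FTRL and AdaGrad optimizers reported in the paper, and it treats the last-layer activations `deep_a` as fixed inputs, whereas a full model would backpropagate further into the hidden layers and embeddings. The function names and numeric values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(wide_w: Sequence[float], wide_x: Sequence[float],
            deep_w: Sequence[float], deep_a: Sequence[float],
            bias: float) -> float:
    """y_hat = sigmoid(w_wide . [x, phi(x)] + w_deep . a^(L) + b)."""
    logit = (sum(w * x for w, x in zip(wide_w, wide_x))
             + sum(w * a for w, a in zip(deep_w, deep_a))
             + bias)
    return sigmoid(logit)


def joint_sgd_step(wide_w: Sequence[float], wide_x: Sequence[float],
                   deep_w: Sequence[float], deep_a: Sequence[float],
                   bias: float, y: float, lr: float = 0.1
                   ) -> Tuple[List[float], List[float], float, float]:
    y_hat = predict(wide_w, wide_x, deep_w, deep_a, bias)
    # Gradient of the log loss with respect to the shared logit.
    err = y_hat - y
    # The same error signal updates both parts: this is the joint training.
    new_wide = [w - lr * err * x for w, x in zip(wide_w, wide_x)]
    new_deep = [w - lr * err * a for w, a in zip(deep_w, deep_a)]
    new_bias = bias - lr * err
    return new_wide, new_deep, new_bias, y_hat


wide_x = [1.0, 0.0, 1.0]   # raw and cross-product features (hypothetical)
deep_a = [0.7, 0.1]        # last hidden-layer activations (hypothetical)
wide_w, deep_w, bias = [0.0, 0.0, 0.0], [0.0, 0.0], 0.0

wide_w, deep_w, bias, before = joint_sgd_step(wide_w, wide_x, deep_w, deep_a, bias, y=1.0)
after = predict(wide_w, wide_x, deep_w, deep_a, bias)
print(before, after)  # the prediction moves from 0.5 toward the label 1
```

The point of the sketch is the shared `err` term: unlike an ensemble, where each model is fit separately, both components are corrected against the same combined prediction.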

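Intuition

A recommender system needs two abilities at once:

Memorization (Wide): “recommend apps similar to ones the user installed before”, that is, recall direct patterns from historical data.
Generalization (Deep): “given this user's overall taste, they may also like apps from a new category”, that is, generalize to combinations that were never observed.

With only the wide part, the model overfits to past patterns; with only the deep part, it can miss specific co-occurrences. Combining the two captures the strengths of both.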

Pros and Cons

Pros:

A balanced combination of memorization and generalization
Validated at large scale in industry (Google Play, which the original paper describes as having over one billion active users)
The wide and deep parts have clearly separated roles, which aids interpretability

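Cons:

The cross-product features of the wide part must be designed by hand (requires domain expertise)
The wide and deep parts do not share embeddings, so the two components learn from their inputs separately
The choice of cross-product features strongly affects performance; a poor design degrades it
Low-order interactions that were not hand-defined as cross-products are not captured explicitly; the deep part can only pick them up implicitly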

References

Cheng, H., et al. (2016). Wide & deep learning for recommender systems. DLRS 2016. Original paper.
Guo, H., et al. (2017). DeepFM: A factorization-machine based neural network for CTR prediction. IJCAI 2017. Addresses limitations of Wide & Deep.
