Wide and Deep
Definition
Wide & Deep (Cheng et al., 2016) is a recommendation model, widely used for CTR prediction, that jointly trains a linear wide component (memorization) and a deep neural network component (generalization). It was first deployed for app recommendation on Google Play.
Wide Component
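A generalized linear model $y = \mathbf{w}^T \mathbf{x} + b$ whose input $\mathbf{x}$ contains the raw features together with cross-product transformations of them. The $k$-th cross-product transformation is defined as:

$$\phi_k(\mathbf{x}) = \prod_{i=1}^{d} x_i^{c_{ki}}, \quad c_{ki} \in \{0, 1\}$$

where $c_{ki} = 1$ if the $i$-th feature is part of the $k$-th transformation and 0 otherwise. For binary features, $\phi_k(\mathbf{x})$ is 1 only when all of its constituent features are 1.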
These cross-product features must be designed manually, and they memorize the co-occurrence of specific feature combinations.
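As a rough illustration of the wide component, the following sketch computes cross-product features as a logical AND over binary features and feeds them, together with the raw features, into a linear model. The feature names, the `cross_product` and `wide_logit` helpers, and the weight values are all hypothetical and chosen only for illustration; in practice the weights are learned.

```python
def cross_product(x, names):
    """phi_k(x): product of the named binary features, i.e. a logical AND."""
    value = 1
    for name in names:
        value *= x.get(name, 0)  # absent features count as 0
    return value


def wide_logit(x, crosses, weights, bias=0.0):
    """Linear model over the raw features plus the cross-product features."""
    features = dict(x)
    for names in crosses:
        features[" AND ".join(names)] = cross_product(x, names)
    # Features without a weight contribute nothing to the logit.
    return bias + sum(weights.get(k, 0.0) * v for k, v in features.items())


# Hypothetical one-hot features for a single impression.
x = {"installed=netflix": 1, "impression=pandora": 1, "impression=youtube": 0}

# Manually designed crosses (hypothetical choices).
crosses = [
    ("installed=netflix", "impression=pandora"),
    ("installed=netflix", "impression=youtube"),
]
phi = [cross_product(x, c) for c in crosses]

# Hypothetical weights, standing in for learned values.
weights = {
    "installed=netflix": 0.1,
    "installed=netflix AND impression=pandora": 0.8,
}
logit = wide_logit(x, crosses, weights, bias=-0.5)
```

Only the first cross fires here, because the second one includes a feature whose value is 0. The large weight on the firing cross is what "memorization" means in this setting: the model stores a score for that exact co-occurrence.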
Deep Component
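A feed-forward neural network that converts sparse categorical features into dense embeddings, concatenates them, and passes the result through several hidden layers, each computing:

$$a^{(l+1)} = f\left(W^{(l)} a^{(l)} + b^{(l)}\right)$$

where $l$ is the layer index, $f$ is the activation function (ReLU in the original paper), and $a^{(l)}$, $W^{(l)}$, $b^{(l)}$ are the activations, weights, and bias of layer $l$.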
It implicitly learns high-order feature interactions, handling generalization to unseen feature combinations.
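The forward pass of the deep component can be sketched in plain Python as an embedding lookup followed by ReLU layers. This is a minimal sketch under stated assumptions: the field names (`user_country`, `app_category`), vocabulary sizes, embedding dimension, and layer widths are hypothetical, and the randomly initialized parameters stand in for learned ones.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def rand_matrix(rows, cols):
    """Small random matrix standing in for learned parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def dense_relu(a, W, b):
    """One hidden layer, ReLU(W a + b), with one row of W per output unit."""
    return [
        max(0.0, sum(w * v for w, v in zip(row, a)) + bias)
        for row, bias in zip(W, b)
    ]


def deep_forward(cat_ids, embeddings, layers):
    """Look up one embedding per categorical field, concatenate, apply layers."""
    a = []
    for field, idx in cat_ids.items():
        a.extend(embeddings[field][idx])
    for W, b in layers:
        a = dense_relu(a, W, b)
    return a


EMB_DIM = 4  # hypothetical embedding size
embeddings = {
    "user_country": rand_matrix(10, EMB_DIM),  # 10 hypothetical countries
    "app_category": rand_matrix(20, EMB_DIM),  # 20 hypothetical categories
}
layers = [
    (rand_matrix(8, 2 * EMB_DIM), [0.0] * 8),  # input: two concatenated embeddings
    (rand_matrix(4, 8), [0.0] * 4),
]
hidden = deep_forward({"user_country": 3, "app_category": 7}, embeddings, layers)
```

Because every category maps to a dense vector, a pair such as (`user_country=3`, `app_category=7`) still yields a usable activation even if that exact pair never occurred in training, which is the source of the generalization described above.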
Joint Training
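The wide and deep components are trained jointly rather than as an ensemble of separately trained models: their outputs are combined as a weighted sum of log-odds and passed through a sigmoid to produce the final prediction, $P(Y=1 \mid \mathbf{x}) = \sigma\left(\mathbf{w}_{wide}^T [\mathbf{x}, \phi(\mathbf{x})] + \mathbf{w}_{deep}^T a^{(l_f)} + b\right)$, where $\phi(\mathbf{x})$ denotes the cross-product features and $a^{(l_f)}$ the final hidden activations. The gradient of a single logistic loss is backpropagated to both parts at the same time. In the original paper, the wide part was optimized with FTRL and L1 regularization, while the deep part was optimized with AdaGrad.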
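To make the joint update concrete, here is a minimal sketch in which one shared prediction error drives two different optimizers. It assumes a per-coordinate FTRL-Proximal update in the style of McMahan et al. (2013) for the wide weights and a basic AdaGrad update for the deep weights. The class names, hyperparameters, and input values are hypothetical; the bias term is omitted; and only the deep part's output weights are updated, whereas a full implementation would backpropagate the same error through the hidden layers and embeddings.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


class FTRL:
    """Per-coordinate FTRL-Proximal with L1 and L2 regularization."""

    def __init__(self, dim, alpha=0.1, beta=1.0, l1=0.1, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim  # accumulated adjusted gradients
        self.n = [0.0] * dim  # accumulated squared gradients

    def weights(self):
        w = []
        for z, n in zip(self.z, self.n):
            if abs(z) <= self.l1:
                w.append(0.0)  # L1 threshold keeps weak features at exactly zero
            else:
                sign = 1.0 if z > 0 else -1.0
                scale = (self.beta + math.sqrt(n)) / self.alpha + self.l2
                w.append(-(z - sign * self.l1) / scale)
        return w

    def update(self, grads):
        w = self.weights()
        for i, g in enumerate(grads):
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w[i]
            self.n[i] += g * g


class AdaGrad:
    """Gradient descent with a per-coordinate learning rate."""

    def __init__(self, dim, lr=0.1, eps=1e-8):
        self.lr, self.eps = lr, eps
        self.w = [0.0] * dim
        self.g2 = [0.0] * dim  # accumulated squared gradients

    def weights(self):
        return self.w

    def update(self, grads):
        for i, g in enumerate(grads):
            self.g2[i] += g * g
            self.w[i] -= self.lr * g / (math.sqrt(self.g2[i]) + self.eps)


def predict(x_wide, a_deep, wide_opt, deep_opt):
    """Sigmoid of the summed wide and deep logits."""
    logit = sum(w * x for w, x in zip(wide_opt.weights(), x_wide))
    logit += sum(w * a for w, a in zip(deep_opt.weights(), a_deep))
    return sigmoid(logit)


def joint_step(x_wide, a_deep, y, wide_opt, deep_opt):
    """One joint update: both parts receive the same prediction error."""
    err = predict(x_wide, a_deep, wide_opt, deep_opt) - y  # d(log loss)/d(logit)
    wide_opt.update([err * x for x in x_wide])
    deep_opt.update([err * a for a in a_deep])


x_wide = [1.0, 0.0, 1.0]  # hypothetical raw + cross-product features
a_deep = [0.5, 0.2]       # hypothetical final hidden activations
wide_opt, deep_opt = FTRL(len(x_wide)), AdaGrad(len(a_deep))

p_before = predict(x_wide, a_deep, wide_opt, deep_opt)
for _ in range(5):
    joint_step(x_wide, a_deep, 1, wide_opt, deep_opt)
p_after = predict(x_wide, a_deep, wide_opt, deep_opt)
```

After a few steps on this positive example the predicted probability rises above its initial value, and the wide weight for the inactive feature stays at exactly zero. That sparsity is a plausible reason to pair FTRL with L1 for the wide part, whose input space of cross-product features is very large and sparse.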
Intuitive Understanding
A recommender system needs two capabilities at once:
- Memorization (Wide): “Recommend apps similar to those the user has installed in the past” — memorizing direct patterns from historical data
- Generalization (Deep): “Given this user’s overall tastes, they might also like apps from a new category” — generalizing to unseen combinations
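Wide alone cannot generalize to feature combinations that never appeared in the training data, while Deep alone can over-generalize when user-item interactions are sparse, recommending less relevant items and missing specific co-occurrences. Combining the two captures the strengths of both.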
Advantages and Disadvantages
Advantages:
- A balanced combination of memorization and generalization
- Validated by large-scale industrial deployment (Google Play, described in the original paper as having over one billion active users)
- Interpretable, since the roles of the Wide and Deep parts are clearly defined
Disadvantages:
- The cross-product features of the wide part must be designed manually (requires domain expertise)
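- Wide and Deep do not share embeddings: the wide part consumes hand-crafted sparse features while the deep part learns its own representations, so the two are coupled only through the shared loss
- The choice of cross-product features strongly affects performance, so a poor selection hurts accuracy
- Low-order interactions that are not manually defined are not modeled explicitly; capturing them is left to the deep part's implicit learning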
Related Concepts
- DeepFM - Replaces Wide with an FM so no feature engineering is needed; introduces shared embeddings
- Factorization Machine - Automatically learns second-order feature interactions; replaces Wide’s cross-products
- FNN - A DNN initialized with embeddings from a pre-trained FM; unlike Wide & Deep, it has no wide component
- PNN - Captures interactions via a product layer; extends only the Deep part of Wide & Deep
Key Papers
- Cheng, H.-T., et al. (2016). Wide & deep learning for recommender systems. DLRS 2016. Original paper.
- Guo, H., et al. (2017). DeepFM: A factorization-machine based neural network for CTR prediction. IJCAI 2017. DeepFM; improves on the limitations of Wide & Deep.