Tae Hyun Kim (Lowell)

PNN

Definition

PNN (Product-based Neural Network; Qu et al., 2016) is a click-through rate (CTR) prediction model that inserts a product layer between the embedding layer and the DNN hidden layers, so that interactions among feature embeddings are captured explicitly before they are passed to the DNN.

Product Layer

The pairwise interactions among the embedding vectors $\mathbf{e}_i$ are computed with a product operation:

$$\mathbf{l}_p = \{g(\mathbf{e}_i, \mathbf{e}_j)\}_{i,j}, \quad \mathbf{l}_z = [\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n]$$

The output of the product layer is the combination of the linear signal $\mathbf{l}_z$ and the product signal $\mathbf{l}_p$:

$$\mathbf{a}^{(1)} = \text{ReLU}(\mathbf{W}_z \mathbf{l}_z + \mathbf{W}_p \mathbf{l}_p + \mathbf{b})$$
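
The two formulas above can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions, not the implementation from the paper: it uses the inner product as $g$, takes each unordered pair $i < j$ once (the inner product is symmetric), and the names `product_layer`, `dot`, and `matvec` as well as the sizes and random weights are hypothetical.

```python
import random
from itertools import combinations

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def matvec(W, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [dot(row, x) for row in W]

def product_layer(embeddings, W_z, W_p, b):
    """Return a^(1) = ReLU(W_z l_z + W_p l_p + b), with inner product as g."""
    # Linear signal l_z: all field embeddings concatenated into one vector.
    l_z = [x for e in embeddings for x in e]
    # Product signal l_p: one scalar g(e_i, e_j) per unordered pair i < j
    # (an assumption of this sketch; the inner product is symmetric).
    l_p = [dot(e_i, e_j) for e_i, e_j in combinations(embeddings, 2)]
    pre = [z + p + bias
           for z, p, bias in zip(matvec(W_z, l_z), matvec(W_p, l_p), b)]
    return [max(0.0, v) for v in pre]  # ReLU

# Hypothetical sizes: n = 3 fields, embedding dimension k = 2, 4 hidden units.
n, k, hidden = 3, 2, 4
rng = random.Random(0)
embeddings = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n)]
W_z = [[rng.uniform(-1, 1) for _ in range(n * k)] for _ in range(hidden)]
W_p = [[rng.uniform(-1, 1) for _ in range(n * (n - 1) // 2)]
       for _ in range(hidden)]
b = [0.0] * hidden

a1 = product_layer(embeddings, W_z, W_p, b)
print(len(a1))  # one activation per hidden unit
```

The vector `a1` plays the role of $\mathbf{a}^{(1)}$ and would be fed to the remaining hidden layers of the DNN.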

Variants

Depending on the definition of the product operation $g(\cdot, \cdot)$, there are three variants:

| Variant | Product operation | Complexity | Characteristics |
|---|---|---|---|
| IPNN | Inner product: $g(\mathbf{e}_i, \mathbf{e}_j) = \langle \mathbf{e}_i, \mathbf{e}_j \rangle$ | $O(n^2)$ | Scalar output; FM-like interaction |
| OPNN | Outer product: $g(\mathbf{e}_i, \mathbf{e}_j) = \mathbf{e}_i \mathbf{e}_j^T$ | $O(n^2 k^2)$ | Matrix output; richer interaction, higher cost |
| PNN* | Combination of inner + outer product | $O(n^2 k^2)$ | Hybrid; combines the strengths of both approaches |

Here $n$ is the number of fields and $k$ is the embedding dimension.
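
To make the difference in output size concrete, the sketch below builds both product signals for a small example and counts their entries. The embedding values and the helper names `inner` and `outer` are hypothetical, and each unordered pair is counted once, so the counts are $n(n-1)/2$ and $n(n-1)/2 \cdot k^2$, which grow as $O(n^2)$ and $O(n^2 k^2)$ respectively.

```python
from itertools import combinations

def inner(u, v):
    """IPNN: g(e_i, e_j) = <e_i, e_j>, a single scalar."""
    return sum(a * b for a, b in zip(u, v))

def outer(u, v):
    """OPNN: g(e_i, e_j) = e_i e_j^T, a k x k matrix."""
    return [[a * b for b in v] for a in u]

# Hypothetical example: n = 4 fields, embedding dimension k = 3.
embeddings = [
    [1.0, 0.0, 2.0],
    [0.5, 1.0, 0.0],
    [0.0, 3.0, 1.0],
    [2.0, 2.0, 2.0],
]
pairs = list(combinations(embeddings, 2))  # n(n-1)/2 = 6 unordered pairs

ipnn_signal = [inner(u, v) for u, v in pairs]  # 6 scalars
opnn_signal = [outer(u, v) for u, v in pairs]  # 6 matrices of size 3 x 3

ipnn_entries = len(ipnn_signal)
opnn_entries = sum(len(row) for m in opnn_signal for row in m)
print(ipnn_entries, opnn_entries)  # prints: 6 54
```

Even at this toy size the outer-product signal carries $k^2 = 9$ times as many entries as the inner-product signal, which is the cost gap the table summarizes.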

Intuitive Understanding

If embeddings are simply concatenated and fed into a DNN, the network has to learn the feature interactions implicitly. PNN precomputes the products of embedding pairs and provides them to the DNN as “hints.”

By analogy, rather than leaving the chef (the DNN) to judge alone how well two ingredients go together, the ingredient combinations are tasted in advance and a compatibility score is handed over with them (the product layer). The chef can then use this information to make better decisions.

Advantages and Disadvantages

Advantages:

  • No pre-training required — end-to-end training is possible (an advantage over FNN)
  • The product layer captures feature interactions explicitly
  • IPNN combines FM-like interactions with a DNN

Disadvantages:

  • Low-order interactions (orders 1 and 2) are not preserved directly: the product layer’s output passes through the DNN, so the original low-order signal can be distorted. Unlike FM, low-order terms do not reach the output directly
  • OPNN computational cost: The outer product is $O(n^2 k^2)$, which becomes inefficient as the number of features and the embedding dimension grow
  • The pairwise computation in the product layer grows quadratically with the number of fields $n$

Related Concepts

  • DeepFM - Parallel combination of FM (low-order) + DNN (high-order); complements the low-order interactions that PNN misses
  • Factorization Machine - The inner product of IPNN is similar to FM’s second-order interaction
  • Wide and Deep - Parallel Wide (memorization) + Deep (generalization); a different design philosophy from PNN
  • FNN - FM pre-training + DNN; PNN replaces pre-training with a product layer instead
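
The similarity between IPNN and FM noted in the list above can be written out. In its usual form, the second-order FM model is

$$\hat{y}_{\text{FM}}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j$$

If each field embedding is taken to be $\mathbf{e}_i = \mathbf{v}_i x_i$ (an assumption made here for the comparison), then $\langle \mathbf{e}_i, \mathbf{e}_j \rangle = \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j$, which is exactly the pairwise term of FM. The difference is where the term goes: FM sums it straight into the output, whereas IPNN passes it to the hidden layers through $\mathbf{W}_p \mathbf{l}_p$.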

Key Papers

  • Qu, Y., et al. (2016). Product-based neural networks for user response prediction. ICDM 2016. - The original PNN paper
  • Guo, H., et al. (2017). DeepFM: A factorization-machine based neural network for CTR prediction. IJCAI 2017. - DeepFM; addresses PNN’s limitation (absence of low-order terms) by combining with FM
