PNN · Tae Hyun Kim (Lowell)

Definition

PNN (Product-based Neural Network; Qu et al., 2016) is a CTR prediction model that inserts a product layer between the embedding layer and the DNN hidden layers. This layer captures the interactions among feature embeddings explicitly before they are passed to the DNN.

Product Layer

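For the field embeddings $\mathbf{e}_1, \ldots, \mathbf{e}_n$, the product layer builds two signals. The linear signal $\mathbf{l}_z$ concatenates the embeddings. The product signal $\mathbf{l}_p$ collects their pairwise interactions, computed with a product operation $g(\cdot, \cdot)$: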

$\mathbf{l}_p = \{g(\mathbf{e}_i, \mathbf{e}_j)\}_{i,j}, \quad \mathbf{l}_z = [\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n]$
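The following minimal Python sketch shows how the two signals could be assembled for one example. It assumes the inner product as $g$ (the IPNN choice). It also keeps only unordered pairs $i < j$, which is an assumption: because $\langle \mathbf{e}_i, \mathbf{e}_j \rangle$ is symmetric, the remaining pairs would be duplicates or self-products. The function name `product_layer_signals` and the toy embeddings are hypothetical.

```python
from itertools import combinations

def inner(u, v):
    # <u, v>: sum of element-wise products
    return sum(a * b for a, b in zip(u, v))

def product_layer_signals(embeddings):
    """Return (l_z, l_p) for one example (hypothetical helper).

    embeddings: list of n field embeddings, each a list of k floats.
    l_z: concatenation [e_1, ..., e_n], length n*k.
    l_p: inner products <e_i, e_j> over unordered pairs i < j
         (assumption), length n*(n-1)/2.
    """
    l_z = [x for e in embeddings for x in e]
    l_p = [inner(e_i, e_j) for e_i, e_j in combinations(embeddings, 2)]
    return l_z, l_p

# three fields (n = 3), embedding dimension k = 2
embs = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]]
l_z, l_p = product_layer_signals(embs)
print(l_z)  # [1.0, 0.0, 0.5, 2.0, -1.0, 1.0]
print(l_p)  # [0.5, -1.0, 1.5]
```

`l_p` lists the pairs in the order (1, 2), (1, 3), (2, 3). Its length grows as $n(n-1)/2$, while `l_z` grows only as $nk$.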

The output of the product layer combines the linear signal $\mathbf{l}_z$ and the product signal $\mathbf{l}_p$:

$\mathbf{a}^{(1)} = \text{ReLU}(\mathbf{W}_z \mathbf{l}_z + \mathbf{W}_p \mathbf{l}_p + \mathbf{b})$
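The formula above can be sketched as a single dense layer in which each signal is projected by its own weight matrix before the two are summed. Plain nested lists stand in for tensors. `first_hidden_layer` is a hypothetical name, and the weights and bias are illustrative values, not trained parameters.

```python
def relu(x):
    return max(0.0, x)

def matvec(W, x):
    # W is a list of rows (D1 rows, len(x) columns); returns W @ x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def first_hidden_layer(l_z, l_p, W_z, W_p, b):
    """a^(1) = ReLU(W_z l_z + W_p l_p + b), one output per hidden unit."""
    z_part = matvec(W_z, l_z)
    p_part = matvec(W_p, l_p)
    return [relu(z + p + bias) for z, p, bias in zip(z_part, p_part, b)]

# n = 2 fields, k = 2: e_1 = [1, 0], e_2 = [0.5, 2], so <e_1, e_2> = 0.5
l_z = [1.0, 0.0, 0.5, 2.0]
l_p = [0.5]
# D1 = 2 hidden units (illustrative weights)
W_z = [[1.0, 0.0, 0.0, 0.0],
       [0.0, -1.0, 0.0, -1.0]]
W_p = [[2.0],
       [1.0]]
b = [0.0, 0.5]
print(first_hidden_layer(l_z, l_p, W_z, W_p, b))  # [2.0, 0.0]
```

Keeping $\mathbf{W}_z$ and $\mathbf{W}_p$ separate lets the layer weight raw embeddings and pairwise products independently. Multiplying one matrix by the concatenation $[\mathbf{l}_z; \mathbf{l}_p]$ would be mathematically equivalent.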

Variants

Depending on the definition of the product operation $g(\cdot, \cdot)$, there are three variants:

Variant	Product operation	Complexity	Characteristics
IPNN	Inner product: $g(\mathbf{e}_i, \mathbf{e}_j) = \langle \mathbf{e}_i, \mathbf{e}_j \rangle$	$O(n^2 k)$	Scalar output per pair; FM-like interaction
OPNN	Outer product: $g(\mathbf{e}_i, \mathbf{e}_j) = \mathbf{e}_i \mathbf{e}_j^T$	$O(n^2 k^2)$	$k \times k$ matrix output per pair; richer interaction, higher cost
PNN*	Combination of inner + outer product	$O(n^2 k^2)$	Hybrid; combines the strengths of both approaches

Here $n$ is the number of fields and $k$ is the embedding dimension.
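To make the difference between the variants concrete, the sketch below computes both products for a single pair with $k = 3$. The inner product returns one scalar from $k$ multiplications. The outer product returns a $k \times k$ matrix from $k^2$ multiplications, which is why OPNN's cost carries a $k^2$ factor. The PNN* helper concatenates the scalar with the flattened matrix. That flattening order is an assumption for illustration, and all function names are hypothetical.

```python
def inner_product(u, v):
    # IPNN: g(e_i, e_j) = <e_i, e_j>, one scalar from k multiplications
    return sum(a * b for a, b in zip(u, v))

def outer_product(u, v):
    # OPNN: g(e_i, e_j) = e_i e_j^T, a k x k matrix from k^2 multiplications
    return [[a * b for b in v] for a in u]

def pnn_star_pair_features(u, v):
    # PNN* (assumed layout): inner product followed by the row-major outer-product entries
    return [inner_product(u, v)] + [x for row in outer_product(u, v) for x in row]

e_i = [1.0, 2.0, -1.0]
e_j = [3.0, -1.0, 4.0]
print(inner_product(e_i, e_j))   # -3.0
print(outer_product(e_i, e_j))   # [[3.0, -1.0, 4.0], [6.0, -2.0, 8.0], [-3.0, 1.0, -4.0]]
print(len(pnn_star_pair_features(e_i, e_j)))  # 10, i.e. 1 + k^2
```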

Intuitive Understanding

If embeddings are simply concatenated and fed into a DNN, the network has to learn the feature interactions implicitly. PNN precomputes the products of embedding pairs and provides them to the DNN as “hints.”

By analogy, instead of leaving the chef (the DNN) to judge on their own how well two ingredients go together, every pair of ingredients is tasted in advance and a compatibility score is handed to the chef (the product layer). The chef can use these scores to make better decisions.

Advantages and Disadvantages

Advantages:

No pre-training required: the model is trained end-to-end (an advantage over FNN, which needs FM pre-training)
The product layer captures feature interactions explicitly
IPNN combines FM-like interactions with a DNN

Disadvantages:

Ignores low-order interactions (order 1 and 2) at the output: the product layer's signals reach the prediction only after passing through the DNN, which can distort the original low-order signal. Unlike FM, there is no direct path from the low-order terms to the output.
OPNN computational cost: the outer product costs $O(n^2 k^2)$, which becomes inefficient as the number of fields and the embedding dimension grow
The pairwise computation in the product layer grows quadratically with the number of fields $n$
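A rough way to see how these costs scale is to count the multiplications needed to build $\mathbf{l}_p$ for one example. The count below assumes unordered pairs $i < j$ and ignores both the $\mathbf{W}_p$ projection and any efficiency tricks. `product_layer_cost` is a hypothetical helper.

```python
def product_layer_cost(n, k):
    """Rough multiplication counts for the product signal l_p of one example.

    Assumes unordered pairs i < j; ignores the W_p projection that follows.
    """
    pairs = n * (n - 1) // 2
    return {
        "pairs": pairs,
        "ipnn_mults": pairs * k,      # one k-dim inner product per pair
        "opnn_mults": pairs * k * k,  # one k x k outer product per pair
    }

for n, k in [(10, 8), (20, 8), (20, 16)]:
    print(n, k, product_layer_cost(n, k))
```

Under these assumptions, going from 10 to 20 fields raises the pair count from 45 to 190, roughly fourfold. Doubling $k$ from 8 to 16 doubles the IPNN count (1520 to 3040) but quadruples the OPNN count (12160 to 48640).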

Related Concepts

DeepFM - Parallel combination of FM (low-order) + DNN (high-order); complements the low-order interactions that PNN misses
Factorization Machine - The inner product of IPNN is similar to FM’s second-order interaction
Wide and Deep - Parallel Wide (memorization) + Deep (generalization); a different design philosophy from PNN
FNN - FM pre-training + DNN; PNN drops the pre-training step and adds a product layer instead
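The FM connection can be checked numerically. Assume one active feature per field, so each $\mathbf{e}_i$ is the FM latent vector scaled by $x_i = 1$. Under that assumption, FM's second-order term is the sum of all pairwise inner products, and it can be computed in $O(nk)$ with the usual rewriting $\tfrac{1}{2}\sum_f \big[(\sum_i e_{i,f})^2 - \sum_i e_{i,f}^2\big]$. The sketch shows that summing the IPNN-style inner products gives the same number. The difference is that IPNN feeds each inner product separately through $\mathbf{W}_p$ rather than adding them with a fixed weight of 1. Function names and embeddings are hypothetical.

```python
from itertools import combinations

def pairwise_inner_products(embs):
    # IPNN-style signal: one <e_i, e_j> per unordered pair i < j (assumption)
    return [sum(a * b for a, b in zip(u, v)) for u, v in combinations(embs, 2)]

def fm_second_order(embs):
    # FM second-order term via 0.5 * sum_f [(sum_i e_if)^2 - sum_i e_if^2], O(n k)
    k = len(embs[0])
    total = 0.0
    for f in range(k):
        s = sum(e[f] for e in embs)
        sq = sum(e[f] ** 2 for e in embs)
        total += s * s - sq
    return 0.5 * total

embs = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]]
print(pairwise_inner_products(embs))  # [0.5, -1.0, 1.5]
print(fm_second_order(embs))          # 1.0
```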

Key Papers

Qu, Y., et al. (2016). Product-based neural networks for user response prediction. ICDM 2016. The original PNN paper.
Guo, H., et al. (2017). DeepFM: A factorization-machine based neural network for CTR prediction. IJCAI 2017. Addresses PNN's limitation (absence of low-order terms) by combining a DNN with FM.