ESCM² (Entire Space Counterfactual Multi-Task Model)

Definition

A model that integrates counterfactual risk regularizers, based on the Inverse Propensity Score (IPS) and the Doubly Robust (DR) estimator, into ESMM in order to address two theoretical limitations of ESMM: **Inherent Estimation Bias (IEB)** and **Potential Independence Priority (PIP)**.

Final training objective:

$$\mathcal{L}_{\text{ESCM}^2} = \underbrace{\mathcal{L}_{\text{CTR}}}_{\text{Empirical Risk}} + \lambda_c \underbrace{\mathcal{L}_{\text{CVR}}}_{\text{Counterfactual Risk}} + \lambda_g \underbrace{\mathcal{L}_{\text{CTCVR}}}_{\text{Global Risk}}$$
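As a rough illustration of how the three terms combine, the sketch below computes the objective for a small batch. The use of binary cross-entropy for the CTR and CTCVR terms, the CTCVR label `o * r`, the function names, and the default values of `lambda_c` and `lambda_g` are assumptions made for this example; the counterfactual risk is passed in as a precomputed number (`cvr_risk`).

```python
import math

def bce(label, prob, eps=1e-12):
    """Binary cross-entropy for a single example (assumed loss form)."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def escm2_objective(clicks, convs, p_ctr, p_cvr, cvr_risk,
                    lambda_c=0.5, lambda_g=1.0):
    """L_CTR + lambda_c * L_CVR + lambda_g * L_CTCVR over the whole batch."""
    n = len(clicks)
    # Empirical risk: CTR loss over every impression.
    l_ctr = sum(bce(o, p) for o, p in zip(clicks, p_ctr)) / n
    # Global risk: pCTCVR = pCTR * pCVR against the label "clicked and converted".
    l_ctcvr = sum(
        bce(o * r, po * pr)
        for o, r, po, pr in zip(clicks, convs, p_ctr, p_cvr)
    ) / n
    # Counterfactual risk (IPS or DR) is supplied by the caller.
    return l_ctr + lambda_c * cvr_risk + lambda_g * l_ctcvr

loss = escm2_objective(
    clicks=[1, 0], convs=[1, 0],
    p_ctr=[0.8, 0.2], p_cvr=[0.6, 0.3],
    cvr_risk=0.4,
)
```

Keeping the counterfactual risk as a separate input makes it easy to swap the IPS and DR variants without touching the other two terms.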

The CVR estimand is redefined with do-calculus:

$$P(r=1 \mid do(o=1))$$

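→ Removing the $X \to O$ dependency resolves selection bias and PIP at the same time.

Intuition

Two problems with ESMM

IEB (Inherent Estimation Bias): ESMM estimates CVR as the ratio $\hat{R} = \hat{C}/\hat{O}$, where $\hat{C}$ is the predicted CTCVR and $\hat{O}$ is the predicted CTR. By Jensen’s inequality, $E[\hat{C}/\hat{O}] \geq E[\hat{C}]/E[\hat{O}]$ → CVR is always overestimated.

PIP (Potential Independence Priority): ESMM's causal graph omits the click→conversion causal link ( $O \to R$ ) → the CVR tower risks learning $P(r=1)$, a quantity that ignores whether a click occurred.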
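The Jensen step behind IEB can be checked numerically in a simplified setting. The sketch below holds the numerator fixed and only lets the CTR prediction fluctuate, so it illustrates the inequality $E[1/\hat{O}] \geq 1/E[\hat{O}]$ rather than reproducing the paper's proof. The function name, the uniform noise model, and the range 0.01 to 0.09 are assumptions chosen for the example.

```python
import random

def jensen_gap(o_hat):
    """Return (mean of 1/o_hat, 1 / mean of o_hat) for positive samples."""
    n = len(o_hat)
    mean_of_inverse = sum(1.0 / o for o in o_hat) / n
    inverse_of_mean = 1.0 / (sum(o_hat) / n)
    return mean_of_inverse, inverse_of_mean

random.seed(0)
# Hypothetical noisy CTR predictions scattered around 0.05.
samples = [random.uniform(0.01, 0.09) for _ in range(10_000)]
mean_of_inverse, inverse_of_mean = jensen_gap(samples)

# For any positive samples the first value is at least as large as the second,
# so dividing a fixed numerator by a noisy CTR inflates the ratio on average.
print(mean_of_inverse >= inverse_of_mean)
```

The gap widens as the CTR predictions get closer to zero, which is the regime typical of real click data.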

Causal graph comparison

(a) ESMM              (b) Naive            (c) ESCM²
 X                     X                    X
 ↓                    ╱ ╲                     ╲
 O    R               O → R              do(O) → R
  ╲  ╱                     ╲                     ╲
   C                        C                     C

O→R missing!          X→O: selection bias  do() removes X→O,
                                           keeps O→R

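Through do-calculus, ESCM² realizes the structure in Figure 3(c): for samples with click=0, it answers the counterfactual question "what is the probability of conversion had the user clicked?"

Two variants

ESCM²-IPS

Uses the CTR tower output as the propensity score and weights the CVR loss by its inverse: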

$$\mathcal{R}_{\text{IPS}} = \frac{1}{|\mathcal{D}|} \sum_{(u,i) \in \mathcal{D}} \frac{o_{u,i} \cdot \delta(r_{u,i}, \hat{r}_{u,i})}{\hat{o}_{u,i}}$$

If the CTR estimate is accurate ( $\hat{o} = o$ ), unbiased CVR estimation is guaranteed (Theorem 2)
Drawback: high variance (weights blow up when CTR is low)
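The unbiasedness claim can be verified on a toy problem by enumerating every possible click pattern. The sketch below is an illustration under assumed numbers: the per-pair errors, the true click probabilities, and the function names are all hypothetical, and the errors are given for every pair (clicked or not) only so that the ideal risk can be computed for comparison.

```python
from itertools import product

def ips_risk(clicks, errors, propensities):
    """R_IPS = (1/|D|) * sum of o * delta / o_hat over all pairs."""
    n = len(clicks)
    return sum(o * d / p for o, d, p in zip(clicks, errors, propensities)) / n

def expected_ips_risk(errors, true_ctr, propensities):
    """Exact expectation of R_IPS when each click is Bernoulli(true_ctr)."""
    total = 0.0
    for clicks in product([0, 1], repeat=len(errors)):
        prob = 1.0
        for o, q in zip(clicks, true_ctr):
            prob *= q if o == 1 else 1.0 - q
        total += prob * ips_risk(clicks, errors, propensities)
    return total

errors = [0.2, 0.7, 0.4]        # hypothetical CVR errors for all pairs
true_ctr = [0.5, 0.1, 0.25]     # hypothetical true click probabilities
ideal_risk = sum(errors) / len(errors)

# Accurate propensities recover the ideal risk in expectation.
unbiased = expected_ips_risk(errors, true_ctr, true_ctr)
# Inaccurate propensities (a constant 0.5 here) do not.
biased = expected_ips_risk(errors, true_ctr, [0.5, 0.5, 0.5])
```

The second pair has a true CTR of 0.1, so a single click on it carries a weight of 10; this is the source of the high variance noted above.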

ESCM²-DR

Adds an imputation tower to IPS to reduce variance:

$$\mathcal{R}_{\text{DR}} = \frac{1}{|\mathcal{D}|} \sum_{(u,i) \in \mathcal{D}} \left[ \hat{\delta}_{u,i} + \frac{o_{u,i} \cdot (\delta_{u,i} - \hat{\delta}_{u,i})}{\hat{o}_{u,i}} \right]$$

$\hat{\delta}_{u,i}$ : the CVR error predicted by the imputation tower
Double robustness: unbiased if either the imputation or the propensity is accurate
Additional imputation loss: $\mathcal{R}_{\text{DR}}^{\text{imp}}$ trains the imputation tower
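Double robustness can be checked the same way, by taking the exact expectation of the DR estimator over all click patterns. This is a toy sketch: the numbers and function names are hypothetical, and the true errors are supplied for every pair only so the ideal risk is available as a reference (in the estimator itself, errors of unclicked pairs are multiplied by $o = 0$ and never used).

```python
from itertools import product

def dr_risk(clicks, errors, imputed, propensities):
    """R_DR = (1/|D|) * sum of [d_hat + o * (d - d_hat) / o_hat]."""
    n = len(clicks)
    return sum(
        d_hat + o * (d - d_hat) / p
        for o, d, d_hat, p in zip(clicks, errors, imputed, propensities)
    ) / n

def expected_dr_risk(errors, imputed, true_ctr, propensities):
    """Exact expectation of R_DR when each click is Bernoulli(true_ctr)."""
    total = 0.0
    for clicks in product([0, 1], repeat=len(errors)):
        prob = 1.0
        for o, q in zip(clicks, true_ctr):
            prob *= q if o == 1 else 1.0 - q
        total += prob * dr_risk(clicks, errors, imputed, propensities)
    return total

errors = [0.2, 0.7, 0.4]       # hypothetical true CVR errors
true_ctr = [0.5, 0.1, 0.25]    # hypothetical true click probabilities
wrong = [0.5, 0.5, 0.5]        # deliberately inaccurate stand-in values
ideal_risk = sum(errors) / len(errors)

good_propensity = expected_dr_risk(errors, wrong, true_ctr, true_ctr)
good_imputation = expected_dr_risk(errors, errors, true_ctr, wrong)
both_wrong = expected_dr_risk(errors, wrong, true_ctr, wrong)
```

Here `good_propensity` and `good_imputation` both match `ideal_risk`, while `both_wrong` does not, which is the "either one is enough" property stated above.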

Architecture

              Shared Embedding Lookup Table
          ┌──────────────┬──────────────┐
          ↓              ↓              ↓
      CTR Tower      Imp Tower      CVR Tower
          ↓              ↓              ↓
        pCTR          δ̂ (imp)         pCVR ──── × ──── pCTR
          │              │              │       ↓
          │              │              │    pCTCVR
          ↓              ↓              ↓       ↓
        L_CTR             L_CVR (IPS/DR)     L_CTCVR
      ───────────── + ──────────────── + ──────────
                    L_ESCM² (final)
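The data flow in the diagram can be sketched as a single forward pass. Everything concrete here is hypothetical: the embedding values, the single-layer towers, the feature ids, and the sigmoid on the imputation output are assumptions chosen to keep the example small, not details taken from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical shared embedding table: feature id -> 2-d embedding.
EMBEDDINGS = {"user_7": [0.3, -0.1], "item_42": [0.5, 0.8]}

# Hypothetical single-layer towers over the concatenated embeddings.
TOWERS = {
    "ctr": [0.4, -0.2, 0.1, 0.3],
    "cvr": [-0.5, 0.2, 0.6, -0.1],
    "imp": [0.1, 0.1, -0.3, 0.2],
}

def forward(user, item):
    # All three towers read the same shared embeddings.
    x = EMBEDDINGS[user] + EMBEDDINGS[item]
    p_ctr = sigmoid(dot(TOWERS["ctr"], x))
    p_cvr = sigmoid(dot(TOWERS["cvr"], x))
    imp = sigmoid(dot(TOWERS["imp"], x))  # predicted CVR error (DR variant only)
    # pCTCVR is the product of the CTR and CVR tower outputs.
    return {"pCTR": p_ctr, "pCVR": p_cvr, "imp": imp, "pCTCVR": p_ctr * p_cvr}

out = forward("user_7", "item_42")
```

In the IPS variant the imputation tower is not needed, so the `imp` output would simply be dropped.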

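Implementation tips

Propensity clipping: prevents weights from blowing up when $\hat{o}_{u,i}$ is very small → clip $\hat{o}_{u,i}$ from below at a threshold of 0.1
Gradient truncation: block the gradient flowing from $\mathcal{L}_{\text{CVR}}$ into the CTR tower (propensity) → protects CTR training
Setting $\lambda_c$: typically in the range 0.1 to 1.0. Too large a value interferes with CTR training → CTCVR performance degrades
Setting $\lambda_g$: 1.0 or higher is recommended. Benefits both CVR and CTCVR
DR stabilization: alternating training, in which the imputation tower and the CVR tower are trained in turn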
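The propensity clipping tip amounts to putting a floor under the predicted CTR before dividing by it. A minimal sketch, with a hypothetical function name and the 0.1 threshold from the tip as the default:

```python
def clipped_ips_weights(clicks, propensities, floor=0.1):
    """Inverse-propensity weights o / max(o_hat, floor) for each pair."""
    return [o / max(p, floor) for o, p in zip(clicks, propensities)]

# A predicted CTR of 0.001 would give a weight of 1000 without the floor.
weights = clipped_ips_weights(clicks=[1, 1, 0], propensities=[0.001, 0.5, 0.2])
# weights == [10.0, 2.0, 0.0]
```

With a floor of 0.1, no single sample can carry a weight above 10. In an autodiff framework the clipped propensity would additionally be wrapped in a stop-gradient operation (for example `jax.lax.stop_gradient` in JAX or `Tensor.detach()` in PyTorch) to implement the gradient truncation tip; that part is not shown here.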

Theoretical guarantees

Theorem	Statement	Implication
Theorem 1	$\text{Bias}^{\text{ESMM}} > 0$	ESMM structurally overestimates CVR
Theorem 2	$\mathcal{R}_{\text{IPS}} = \mathcal{P}$ (when CTR is accurate)	IPS regularizer → unbiased CVR (resolves IEB)
Theorem 3	$\hat{r}^{\text{IPS}} \to P(r \mid do(o=1))$	IPS → converges to the counterfactual CVR (resolves PIP)
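
The reasoning behind Theorem 2 can be written out in one line. This is a sketch, not the paper's proof: it assumes $\mathcal{P}$ denotes the ideal risk averaged over all pairs in $\mathcal{D}$, introduces $p_{u,i} = P(o_{u,i} = 1)$ as a symbol for the true click probability, and reads "CTR is accurate" as $\hat{o}_{u,i} = p_{u,i}$, with the expectation taken over the click indicators.

$$E_{o}\left[\mathcal{R}_{\text{IPS}}\right] = \frac{1}{|\mathcal{D}|} \sum_{(u,i) \in \mathcal{D}} \frac{E[o_{u,i}] \cdot \delta_{u,i}}{\hat{o}_{u,i}} = \frac{1}{|\mathcal{D}|} \sum_{(u,i) \in \mathcal{D}} \frac{p_{u,i}}{\hat{o}_{u,i}} \, \delta_{u,i} = \frac{1}{|\mathcal{D}|} \sum_{(u,i) \in \mathcal{D}} \delta_{u,i} = \mathcal{P}$$

The last step is where accuracy of the CTR tower is used: any mismatch between $\hat{o}_{u,i}$ and $p_{u,i}$ leaves a factor $p_{u,i}/\hat{o}_{u,i} \neq 1$ on that pair's error.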

References

wangESCM $^2$ EntireSpace2022: the original ESCM² paper (SIGIR 2022)
maEntireSpaceMultiTask2018: the original ESMM paper (SIGIR 2018)