BERT4Rec
Masked Language Modeling for recommendations: learning preferences by filling in the blanks
SASRec uses left-to-right, unidirectional attention. BERT4Rec (Sun et al., 2019) goes bidirectional.
In the sequence [A, B, C, D, E], replace C with [MASK] and predict it from the surrounding context A, B, D, and E, exactly what BERT does with text (the Cloze objective).
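The Cloze-style masking step can be sketched as follows. This is a minimal illustration, not the paper's code; `cloze_mask`, `MASK`, and the mask probability are hypothetical names and values.

```python
import random

MASK = "[MASK]"

def cloze_mask(sequence, mask_prob=0.2, rng=None):
    """Randomly replace items with [MASK] (hypothetical helper).

    Returns the masked sequence plus a map from each masked position to
    its original item, which the model learns to predict using context
    from both the left and the right of the mask.
    """
    rng = rng or random.Random(0)  # fixed seed here for reproducibility
    masked, targets = [], {}
    for i, item in enumerate(sequence):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = item
        else:
            masked.append(item)
    return masked, targets

masked, targets = cloze_mask(["A", "B", "C", "D", "E"], mask_prob=0.4)
```

In practice BERT4Rec also reserves a small chance of keeping or randomly replacing a selected item, as BERT does, but random [MASK] substitution is the core of the objective.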
Unidirectional vs Bidirectional
SASRec: A → B → C → ? (predict from the left side only)
BERT4Rec: A → ? → C → D (use both sides)
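The difference between the two boils down to the attention mask: SASRec uses a causal (lower-triangular) mask, while BERT4Rec lets every position attend to every other. A minimal sketch, with `causal_mask` and `full_mask` as hypothetical helper names (row = query position, column = key position, `True` = "may attend"):

```python
def causal_mask(n):
    """SASRec-style left-to-right mask: position i sees only j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def full_mask(n):
    """BERT4Rec-style bidirectional mask: every position sees every position."""
    return [[True] * n for _ in range(n)]
```

For a length-4 sequence, `causal_mask(4)` forbids position 0 from attending to position 3, while `full_mask(4)` allows it; everything else about the Transformer encoder can stay the same.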
Bidirectional attention seems intuitively stronger, but serving requires next-item prediction, so at inference time you mask the last position and read the prediction there.
Train-serve gap
Training masks random positions, while serving always predicts the final position; this mismatch can hurt performance. The original paper mitigates it by also generating training samples that mask only the last item, and later variants address the discrepancy further.
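The original BERT4Rec paper also mixes in training samples that mask only the final item, so that some training inputs look exactly like the serving input. A minimal sketch of such a sample; `last_item_sample` is a hypothetical helper name:

```python
MASK = "[MASK]"

def last_item_sample(sequence):
    """Training sample that masks only the final item, mirroring the
    serving-time input exactly (the train-serve-gap mitigation)."""
    masked = sequence[:-1] + [MASK]
    targets = {len(sequence) - 1: sequence[-1]}
    return masked, targets
```

Mixing a fraction of these samples into the randomly masked ones narrows the gap without giving up the bidirectional Cloze objective.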
How It Works
1. Replace random positions in the user's behavior sequence with [MASK]
2. Predict the masked items with a bidirectional Transformer
3. Each position's representation thus reflects context from both directions
4. At serving time, mask the last position to predict the next item
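The serving step above can be sketched as follows. This is a hedged illustration: `predict_next` and the `model` interface (a callable returning one score dict per position) are hypothetical, and `max_len` stands in for the model's trained sequence window.

```python
MASK = "[MASK]"

def predict_next(model, history, max_len=8):
    """Serve next-item prediction with a Cloze-trained model: append [MASK],
    truncate to the model's window, and take the top item at that position."""
    seq = (history + [MASK])[-max_len:]   # keep the most recent max_len slots
    scores = model(seq)                   # per-position {item: score} dicts
    return max(scores[-1], key=scores[-1].get)  # top-1 item at the mask
```

Note that the mask always lands in the final slot, which is precisely the position the random-masking objective trains least directly, hence the train-serve gap discussed above.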
Pros
- ✓ Bidirectional context → richer representations than unidirectional
- ✓ Can reuse BERT ecosystem tools and techniques
Cons
- ✗ Objective mismatch between training and serving (train-serve gap)
- ✗ Not always superior to SASRec (depends on data)