RecSys and Search in the LLM Era — What Actually Changed
How YouTube, Spotify, Netflix, and LinkedIn applied LLMs to recommendations and search
Based on Eugene Yan's analysis, here are four axes along which LLMs are changing recommendations and search.
1. LLM/Multimodal-Enhanced Architectures
Traditional RecSys runs on item IDs: user A viewed items 123 and 456, so recommend 789. The problem: the model has no signal for new items (cold start) or rarely interacted-with items (long tail).
LLMs and multimodal models fix this by understanding item text, images, and audio.
YouTube's Semantic IDs — Content-derived IDs instead of hash-based ones. A transformer encodes each video into an embedding, and an RQ-VAE quantizes that embedding into a short sequence of integers, the Semantic ID. Treating these IDs with n-gram/SentencePiece-style tokenization worked particularly well for cold start.
Kuaishou's M3CSR — Merges visual (ResNet), text (Sentence-BERT), audio (VGGish) embeddings, K-means clusters them into learnable IDs. A/B test: clicks +3.4%, likes +3.0%, follows +3.1%.
Google's CALRec — Fine-tuned PaLM-2 XXS for recommendations via text prompts. Two-stage: multi-category pretraining → category-specific fine-tuning.
Meta's EmbSum — Summarizes user interests and candidate items separately using T5-small and Mixtral-8x22B, then matches them.
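The residual quantization at the heart of Semantic IDs can be sketched in a few lines. This is a toy illustration, not YouTube's implementation: the codebooks here are random, whereas an RQ-VAE learns them jointly with the encoder.

```python
import numpy as np

def residual_quantize(embedding, codebooks):
    """Map a continuous embedding to a sequence of integer codes.

    Each stage picks the nearest codebook vector to the current
    residual, then subtracts it; the code indices together form a
    discrete "Semantic ID". Codebooks here are random stand-ins;
    in an RQ-VAE they are learned with the encoder.
    """
    residual = embedding.copy()
    codes = []
    for codebook in codebooks:                  # one codebook per stage
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))             # nearest code word
        codes.append(idx)
        residual = residual - codebook[idx]     # quantize the leftover
    return codes

rng = np.random.default_rng(0)
embedding = rng.normal(size=8)                  # toy "video embedding"
codebooks = [rng.normal(size=(256, 8)) for _ in range(3)]  # 3 stages, 256 codes each
semantic_id = residual_quantize(embedding, codebooks)
print(semantic_id)                              # three integers, one per stage
```

Because similar content yields similar embeddings, nearby videos end up sharing Semantic ID prefixes, which is what gives new items usable collaborative signal.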
2. LLM-Powered Data Generation
Using LLMs to create or improve the data that feeds recommendation systems, rather than serving recommendations directly.
Bing — GPT-4 generates webpage titles/summaries. Fine-tuned Mistral-7B on 2M pages. Clickbait -31%, duplicate content -76%, authoritative content +18%.
Indeed — Fine-tuned GPT-3.5 to filter bad job matches (eBadMatch). Invitation emails -17.68%, unsubscribes -4.97%, applications +4.13%.
Spotify — Introduced exploratory query recommendations. LLM-generated queries ranked with personalized embeddings. Exploratory queries +9%.
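The common shape of these pipelines is a prompt template plus a parser around an LLM call. The sketch below is illustrative, in the spirit of the Bing example; the template, fields, and `parse_label` helper are assumptions, not any company's actual schema, and the LLM call is replaced with a canned response.

```python
# Metadata-enrichment sketch: build a prompt, parse the model's
# line-oriented answer into structured fields for the index.

PROMPT_TEMPLATE = """You are labeling web pages for a search index.
Page text:
{text}

Return one line each:
Title: <concise, non-clickbait title>
Summary: <two-sentence factual summary>
Quality: <high|medium|low>"""

def build_prompt(page_text: str, max_chars: int = 4000) -> str:
    # Truncate long pages so the prompt fits the model's context window.
    return PROMPT_TEMPLATE.format(text=page_text[:max_chars])

def parse_label(response: str) -> dict:
    # Parse "Key: value" lines into a dict; ignore anything else.
    fields = {}
    for line in response.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip().lower()] = value.strip()
    return fields

# Canned response standing in for a real LLM call:
canned = ("Title: Intro to Residual Quantization\n"
          "Summary: Explains codebooks. Covers Semantic IDs.\n"
          "Quality: high")
labels = parse_label(canned)
print(labels["quality"])   # high
```

The same pattern covers quality filtering (keep only `high`) and query generation (ask for candidate queries instead of labels); a large model produces the labels offline, and a small fine-tuned model replays them at scale.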
3. Scaling Laws, Transfer Learning, Distillation, LoRA
Core LLM techniques now applied to RecSys.
Scaling Laws — Decoder-only Transformers scaled from 98.3K to 0.8B params follow LLM-style scaling laws; bigger models need less data to reach the same performance.
YouTube Knowledge Distillation — Teacher model (2-4x larger) knowledge transferred to student. +0.4% improvement (significant in RecSys).
DLLM2Rec — Distills LLM recommendation knowledge to lightweight models. Inference: 3-6 hours → 1.6-1.8 seconds. Average performance +47.97%.
Alibaba MLoRA — Domain-specific LoRA for CTR prediction. CTR +1.49%, conversion +3.37%.
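The core of the distillation setups above is training the student to match the teacher's full score distribution over items, not just its top pick. A minimal sketch of the temperature-softened KL term (real recipes typically add a hard-label loss and a T² scaling factor):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max()                       # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened item scores.

    A higher temperature exposes the teacher's relative preferences
    among non-top items, which is the "dark knowledge" the small
    student model learns from.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = np.array([4.0, 2.0, 1.0, 0.5])    # toy scores over 4 candidate items
perfect = distillation_loss(teacher, teacher)             # identical → 0
worse = distillation_loss(np.array([0.5, 1.0, 2.0, 4.0]), teacher)
print(perfect, worse)                       # 0.0, then a positive value
```

The student keeps its small, servable architecture; only the training signal changes, which is why inference cost can drop by orders of magnitude while retaining most of the teacher's ranking quality.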
4. Unified Search & Recommendation
LinkedIn 360Brew — Single 150B-param model handles 30+ ranking tasks. Prompt engineering instead of feature engineering. Matches or beats specialized models.
Netflix UniCoRn — Unified model for search and recommendation. Recommendations +10%, search +7%.
Etsy Unified Embeddings — Transformer + T5 text + graph embeddings. Graph embeddings contributed most (+15%). Conversion +2.63%.
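The "prompt engineering instead of feature engineering" idea behind unified models can be illustrated with a single template that serves both tasks: search adds a query line, recommendation relies on history alone. The template below is an assumption for illustration, not LinkedIn's or Netflix's actual format.

```python
# One prompt builder for both query-based (search) and
# history-based (recommendation) ranking tasks.

def build_ranking_prompt(history, candidates, query=None):
    lines = ["Instruction: rank the candidate items for this user."]
    if query:                                    # search task only
        lines.append(f"Query: {query}")
    lines.append("Recent activity:")             # shared user signal
    lines += [f"- {item}" for item in history]
    lines.append("Candidates:")
    lines += [f"{i}. {c}" for i, c in enumerate(candidates, 1)]
    return "\n".join(lines)

rec_prompt = build_ranking_prompt(
    history=["ML engineer posts", "RecSys papers"],
    candidates=["LLM distillation talk", "Gardening tips"])
search_prompt = build_ranking_prompt(
    history=["ML engineer posts"],
    candidates=["LLM distillation talk"],
    query="llm recommendations")
print("Query:" in search_prompt, "Query:" in rec_prompt)  # True False
```

Because every task is rendered as text for one model, adding a new ranking surface means writing a new template rather than engineering and backfilling a new feature pipeline.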
What to Take Away
The pattern: rather than using LLMs as recommendation models directly, (1) generate data, (2) distill knowledge to lightweight models, or (3) add multimodal understanding. The most immediately applicable is LLM-powered data generation — metadata enrichment, query generation, and quality filtering work without changing existing pipelines.
How It Works
Solve cold start with LLM/multimodal — Semantic IDs, multimodal embedding + clustering
Generate data with LLMs — metadata enrichment, query generation, quality filtering
Distill LLM knowledge to lightweight models — inference time reduced by orders of magnitude
Domain-specific fine-tuning with LoRA — shared backbone + domain adapters
Unified search and recommendation — single model handles query-based + history-based tasks
Pros
- ✓ Solves cold start/long tail — multimodal understanding enables new item recommendations
- ✓ Data quality improvement — LLMs auto-generate metadata, queries, filters
- ✓ Unified architecture — merging search/recommendations reduces maintenance cost
- ✓ Distillation makes it practical — compress LLM-level performance to servable size
Cons
- ✗ Latency — direct LLM serving is still too heavy for real-time recommendations
- ✗ Unified models don't always beat specialized ones — BM25, SASRec still strong in some areas
- ✗ GPU infrastructure costs — significant compute needed for training and data generation