
What Is a Vector?

A bundle of numbers with direction and magnitude — higher dimensions carry meaning

A vector is an ordered list of numbers.

2D vector — an arrow on a plane

[3, 4] means move 3 in x, 4 in y. Like saying "3km east, 4km north" on a map.

Distance between two vectors tells you "how far apart," angle tells you "how similar in direction." This angle-based similarity is cosine similarity.
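Both measures fit in a few lines of Python (a minimal sketch using only the standard library):

```python
import math

def euclidean_distance(a, b):
    # "How far apart": straight-line distance between the arrow tips.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # "How similar in direction": cosine of the angle between the arrows.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

a = [3, 4]   # "3 east, 4 north"
b = [6, 8]   # same direction, twice as far
print(euclidean_distance(a, b))  # 5.0 -- far apart as points
print(cosine_similarity(a, b))   # 1.0 -- identical direction
```

Note how the two measures disagree: as points, [3, 4] and [6, 8] are 5 units apart, yet their cosine similarity is a perfect 1.0 because they point the same way.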

3D vector — a point in space

[3, 4, 2] is a point in 3D space. Just x, y plus a z-axis (height). Distance and angle calculations work the same as 2D.

Up to this point, humans can see vectors directly: 2D on paper, 3D with tools like Three.js that let you rotate the view.

High-dimensional vectors — the space of meaning

[0.023, -0.15, 0.41, ..., -0.33] — 1536 numbers. You can't see this anymore.

But the math doesn't change. The formula for distance between two points in 2D works identically in 1536D. More dimensions, same principle: close means similar, far means different.
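One function covers every dimension; only the length of the list changes (a sketch, with random stand-ins for real embeddings):

```python
import math
import random

def euclidean_distance(a, b):
    # The 2D formula sqrt((x1-x2)^2 + (y1-y2)^2), summed over
    # however many dimensions the vectors happen to have.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance([3, 4], [0, 0]))  # 5.0 -- the classic 3-4-5 triangle

random.seed(42)
u = [random.random() for _ in range(1536)]  # stand-in for a 1536-d embedding
v = [random.random() for _ in range(1536)]
print(euclidean_distance(u, v))  # same function, 1536 dimensions
```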

Why 1536 dimensions in AI embeddings? Simple — 2-3 dimensions can express "cat and dog are similar" but not nuances like "Persian cats are luxurious while stray cats are rugged."

Visualizing high dimensions

You can't see 1536D directly, but you can "compress" it to 2D/3D.

t-SNE: Projects to 2D while keeping nearby points close; distances between far-apart clusters aren't meaningful. Great for spotting clusters.

UMAP: Faster than t-SNE, better at preserving global structure. Good for large datasets.

PCA: Keeps the 2-3 axes with the most information, discards the rest. Fastest but loses the most information.

If the compressed visualization shows "similar images clustered together," the embedding is working.
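Of the three, PCA is simple enough to sketch directly with NumPy's SVD; t-SNE and UMAP need dedicated libraries (e.g. scikit-learn's `TSNE`, the `umap-learn` package). A minimal sketch, with random data standing in for real embeddings:

```python
import numpy as np

def pca_2d(X):
    """Project rows of X (n_samples, n_dims) onto the 2 directions with the
    most variance -- the 'keep the 2 most informative axes' idea."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:2].T   # shape: (n_samples, 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1536))   # pretend these are 1536-d embeddings
X2 = pca_2d(X)
print(X2.shape)  # (100, 2) -- now plottable on a scatter chart
```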

Why it matters for RecSys

Almost every RecSys technique boils down to "computing distance between user vectors and item vectors." Understand vectors and you'll see that Matrix Factorization, Two-Tower, and embedding-based search all share the same foundation.
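As a toy illustration of that shared foundation (the vectors below are hypothetical, and the dot-product scoring mirrors what Matrix Factorization and Two-Tower models learn):

```python
import numpy as np

# Hypothetical learned vectors: one per user, one per item.
user = np.array([0.9, 0.1, 0.4])
items = {
    "action_movie":  np.array([0.8, 0.0, 0.3]),
    "romance_movie": np.array([0.1, 0.9, 0.2]),
    "documentary":   np.array([0.4, 0.2, 0.9]),
}

# Recommendation = rank items by user·item (higher score = better match).
scores = {name: float(user @ vec) for name, vec in items.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

Swap the dot product for cosine similarity or a nearest-neighbor index and the same skeleton becomes embedding-based search.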

2D vector — interactive

Drag an arrowhead (●) to move a vector. The cosine similarity updates in real time.

A = [3, 4]
B = [4, 1]
์ฝ”์‚ฌ์ธ ์œ ์‚ฌ๋„ 0.00

3D vector — rotate to view

Drag to rotate the space, scroll to zoom. You can examine the relationships among the three vectors in 3D.

๊ณ ์–‘์ด [0.8, 0.6, 0.3]
๊ฐ•์•„์ง€ [0.7, 0.5, 0.4]
์ž๋™์ฐจ [-0.3, 0.9, -0.2]
๊ณ ์–‘์ดโ†”๊ฐ•์•„์ง€: 0.97 | ๊ณ ์–‘์ดโ†”์ž๋™์ฐจ: 0.12

High dimensions → 2D compression (simulation)

Compressing 1536-dimensional vectors to 2D looks roughly like this: words with similar meanings cluster together.

animals
vehicles
food
emotions

How It Works

1. List numbers to get a vector (e.g., [3, 4])

2. Dimension = count of numbers (2D, 3D, 1536D...)

3. Measure similarity via distance/angle (cosine similarity)

4. Compress high dimensions to 2D/3D with t-SNE/UMAP

Pros

  • Math works identically regardless of dimension — understand 2D, same principles apply to 1536D
  • Cosine similarity alone can compare text, images, and user preferences

Cons

  • Higher dimensions lose intuitive understanding — visualization always loses information
  • Curse of dimensionality — in high dimensions all points tend to be similarly distant
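The curse of dimensionality is easy to observe numerically (a sketch with random points, not real embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_spread(dim, n=200):
    """Relative spread of pairwise distances among n random points:
    (max - min) / mean. A small spread means 'everything is about equally far'."""
    pts = rng.random((n, dim))
    sq = (pts ** 2).sum(axis=1)
    # Pairwise squared distances via |a-b|^2 = |a|^2 + |b|^2 - 2 a·b
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * pts @ pts.T, 0.0)
    d = np.sqrt(d2[np.triu_indices(n, k=1)])
    return (d.max() - d.min()) / d.mean()

print(distance_spread(2))     # large spread: near and far neighbors both exist
print(distance_spread(1000))  # small spread: distances bunch together
```

This is why high-dimensional nearest-neighbor search needs care (approximate indexes, good embeddings): the gap between the "closest" and "farthest" point shrinks as dimensions grow.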

Use Cases

2D/3D — physics simulation, games, maps
High-dim — AI embeddings, recommendation, search
Visualization — embedding quality check, cluster analysis