What are embeddings? Vectors, similarity & semantic search

The core idea

An embedding is a vector, a fixed-length list of numbers that represents the meaning of a piece of content. An embedding model reads your text (or image, or audio) and outputs something like 768 or 1,536 numbers. The numbers themselves aren’t meant to be read; what matters is position: content with similar meaning produces vectors that land close together, and unrelated content lands far apart. “Dog” and “puppy” are neighbors; “dog” and “quarterly tax form” are nowhere near each other. Meaning becomes geometry.

Why that’s powerful

Keyword search matches characters: search “car” and you miss “automobile,” “vehicle,” and “sedan.” Embeddings match meaning, so a search for “car” finds all of them. That’s semantic search. The same trick underlies a lot of modern AI plumbing:

Semantic search: find by what you meant, not the exact words.
RAG: retrieve the document chunks most relevant to a question and feed them to the model.
Recommendations: surface items similar to what someone liked.
Clustering & classification: group similar content, or label it.
Deduplication: spot near-identical text even when the wording differs.
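
The contrast between keyword matching and semantic search can be sketched with a toy example. The 2D vectors below are hand-picked for illustration and are not the output of any real embedding model; the function names and the 0.9 threshold are likewise assumptions.

```python
import math

# Hypothetical 2D "embeddings", hand-picked for illustration only.
# A real embedding model would produce these vectors for you.
TOY_VECTORS = {
    "car": [0.9, 0.1],
    "automobile": [0.85, 0.15],
    "sedan": [0.8, 0.25],
    "quarterly tax form": [0.05, 0.95],
}

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def keyword_search(query, docs):
    """Return the docs that contain the query as a literal substring."""
    return [d for d in docs if query in d]

def semantic_search(query, docs, threshold=0.9):
    """Return the docs whose toy vector points nearly the same way as the query's."""
    q = TOY_VECTORS[query]
    return [d for d in docs if cosine(q, TOY_VECTORS[d]) >= threshold]

docs = ["automobile", "sedan", "quarterly tax form"]
keyword_hits = keyword_search("car", docs)    # no doc contains the characters "car"
semantic_hits = semantic_search("car", docs)  # "automobile" and "sedan" pass the threshold
```

With these made-up vectors, the keyword search returns nothing, while the semantic search returns “automobile” and “sedan” and leaves out the tax form.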

How similarity is measured

Once content is vectors, “similar” becomes a number you can compute. The most common measure is cosine similarity: the cosine of the angle between two vectors (1.0 means identical direction, 0 means unrelated, and negative values mean the vectors point in opposing directions). Dot product and Euclidean distance are also used. To find the best matches for a query, you embed the query the same way and look for its nearest neighbors in the vector space.
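
As a rough sketch, the three measures named above can be written in a few lines of plain Python. The vectors `a`, `b`, and `c` are invented to show how the measures react to length versus direction.

```python
import math

def dot(a, b):
    """Dot product: sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the length
c = [0.0, 3.0]  # at a right angle to a

cos_ab = cosine_similarity(a, b)    # direction only: ignores the length difference
dot_ab = dot(a, b)                  # grows with vector length
dist_ab = euclidean_distance(a, b)  # nonzero because the lengths differ
cos_ac = cosine_similarity(a, c)    # right angle
```

Here `a` and `b` get a cosine similarity of 1.0 even though they are a Euclidean distance of 1.0 apart, because cosine ignores length. When all vectors are scaled to length 1, the dot product and cosine similarity give the same number.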

Under the hood: how cosine similarity actually works (optional)

Once two things are vectors, “similar” is just an angle. Cosine similarity is the cosine of the angle between them: 1.0 means they point the exact same way, 0 means they’re unrelated (at a right angle). You get it from the dot product divided by the product of the two lengths:

cos(A, B) = (A · B) / (|A| |B|) where A · B = a₁b₁ + a₂b₂ + … + aₙbₙ

The trick is that it measures direction, not size. A short note and a long article on the same topic still point the same way, so they score as similar, which is exactly what you want for meaning. Picture it in 2D: “dog” at [2, 1] and “puppy” at [3, 1.5] lie along the same line (cosine 1.0), while “tax form” at [-1, 2] sits at a right angle to “dog” (cosine 0). Nearest-neighbor search is just that calculation repeated: compute one score per candidate and keep the highest.
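
The 2D picture above can be checked directly. This is a minimal sketch using the same toy coordinates; the variable names are illustrative.

```python
import math

def cosine(a, b):
    """cos(A, B) = (A . B) / (|A| |B|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [2, 1]  # "dog"
candidates = {"puppy": [3, 1.5], "tax form": [-1, 2]}

# Nearest-neighbor search by brute force: one score per candidate, keep the highest.
scores = {name: cosine(query, vec) for name, vec in candidates.items()}
best = max(scores, key=scores.get)
```

For “puppy” the dot product is 2·3 + 1·1.5 = 7.5 and the lengths multiply to 7.5 as well, so the score is 1.0 up to floating-point rounding. For “tax form” the dot product is 2·(-1) + 1·2 = 0, so the score is 0 and “puppy” wins.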

Where the vectors live

For a handful of items you can compare vectors in memory. At scale (thousands or millions of chunks), you store them in a vector database built for fast approximate nearest-neighbor search. Indexing once and querying many times is exactly the pattern RAG uses.
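
For the small, in-memory case, the index-once, query-many pattern might look like the sketch below. `InMemoryIndex` is a hypothetical class written for this example, not the API of any vector database, and it scans every stored vector exactly rather than using an approximate index. It assumes no vector is all zeros.

```python
import heapq
import math

class InMemoryIndex:
    """Brute-force exact nearest-neighbor index over unit-length vectors."""

    def __init__(self):
        self._items = []  # list of (label, unit_vector)

    @staticmethod
    def _normalize(vec):
        length = math.sqrt(sum(x * x for x in vec))
        return [x / length for x in vec]

    def add(self, label, vec):
        # Normalize once at index time so each query only needs dot products.
        self._items.append((label, self._normalize(vec)))

    def query(self, vec, k=1):
        """Return the k (label, score) pairs with the highest cosine similarity."""
        q = self._normalize(vec)
        scored = (
            (sum(a * b for a, b in zip(q, unit)), label)
            for label, unit in self._items
        )
        return [(label, score) for score, label in heapq.nlargest(k, scored)]

# Index once...
index = InMemoryIndex()
index.add("dog care guide", [2, 1])
index.add("puppy training", [3, 1])
index.add("tax form", [-1, 2])

# ...then query as many times as needed.
top = index.query([2, 1.1], k=2)
```

The linear scan costs one dot product per stored vector on every query, which is fine for a handful of items. A vector database trades a little accuracy for an approximate index that avoids scanning everything.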

Practical things that trip people up

Use the same model for indexing and querying. Vectors from different embedding models aren’t comparable because they live in different spaces.
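Dimensions are a tradeoff. More dimensions can capture more nuance but cost more to store and search: a million 1,536-dimensional vectors stored as 32-bit floats take roughly 6 GB before any index overhead. Pick what your use case needs.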
Chunking matters. Embedding a whole document blurs its meaning into one vector; embedding well-sized passages keeps retrieval sharp.
Embeddings aren’t the model’s “knowledge.” They’re a representation of this content for comparison, not a store of facts the model reasons over.
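
To make the chunking point concrete, here is one simple way to split a document into overlapping passages before embedding them. The function name, the word-based windows, and the default sizes are all assumptions for illustration; real pipelines often split on sentences, paragraphs, or tokens instead.

```python
def chunk_words(text, size=50, overlap=10):
    """Split text into windows of `size` words.

    Each window shares `overlap` words with the previous one, so a sentence
    that straddles a boundary still appears whole in at least one chunk.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks

# Tiny demo: 12 placeholder words, windows of 5 with 2 words of overlap.
text = " ".join(f"w{i}" for i in range(12))
chunks = chunk_words(text, size=5, overlap=2)
```

Each chunk would then be embedded separately, so a query matches the specific passage that answers it rather than a blurred whole-document vector.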

FAQ

What is a “dimension” in an embedding?

Each number in the vector is one dimension. A 1,536-dimensional embedding is a point in 1,536-dimensional space. That is impossible to picture, but the math of “how close are these two points” works exactly like it does in 2D or 3D.

Can I compare embeddings from two different models?

No. Each model defines its own space, so a vector from model A is meaningless next to one from model B. Always embed everything you’ll compare with the same model.
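
A toy illustration of why this fails: suppose a hypothetical “model B” encodes the same meanings as “model A” but happens to lay out its axes in the opposite order. Real models differ in far more complicated ways; the axis swap is only a stand-in.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical "model A" vectors (toy 2D coordinates).
model_a = {"dog": [2, 1], "puppy": [3, 1.5]}

# Hypothetical "model B": same information, axes swapped.
model_b = {word: [y, x] for word, (x, y) in model_a.items()}

within_a = cosine(model_a["dog"], model_a["puppy"])  # compared inside model A
within_b = cosine(model_b["dog"], model_b["puppy"])  # compared inside model B
across = cosine(model_a["dog"], model_b["dog"])      # same word, different models
```

Within either model, “dog” and “puppy” score about 1.0. Across models, “dog” scores only 0.8 against itself, so the cross-model number says nothing about meaning.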

Are embeddings only for text?

No. Images, audio, and code can all be embedded, and multimodal models can put text and images in the same space, so you can search images with a text query.
