What are embeddings?
An embedding turns a piece of content into a list of numbers that captures its meaning, so that content with similar meaning sits close together. That one idea powers search, RAG, and recommendations.
The core idea
An embedding is a vector — a fixed-length list of numbers — that represents the meaning of a piece of content. An embedding model reads your text (or image, or audio) and outputs something like 768 or 1,536 numbers. The numbers themselves aren’t meant to be read; what matters is position: content with similar meaning produces vectors that land close together, and unrelated content lands far apart. “Dog” and “puppy” are neighbors; “dog” and “quarterly tax form” are nowhere near each other. Meaning becomes geometry.
Why that’s powerful
Keyword search matches characters: search “car” and you miss “automobile,” “vehicle,” and “sedan.” Embeddings match meaning, so a search for “car” can surface all of them. That’s semantic search. The same trick underlies a lot of modern AI plumbing:
- Semantic search — find by what you meant, not the exact words.
- RAG — retrieve the document chunks most relevant to a question and feed them to the model.
- Recommendations — surface items similar to what someone liked.
- Clustering & classification — group similar content, or label it.
- Deduplication — spot near-identical text even when the wording differs.
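Deduplication is the easiest of these uses to show end to end. This is a minimal sketch that assumes the vectors have already been produced by some embedding model. The three hand-made 3-D vectors are stand-ins for real model output, and the `near_duplicates` helper and its 0.95 threshold are hypothetical choices, not a standard. It compares every pair, which is fine for a small collection.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b: dot product divided by both lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def near_duplicates(
    vectors: dict[str, list[float]], threshold: float = 0.95
) -> list[tuple[str, str, float]]:
    """Hypothetical helper: return every pair whose similarity is at least `threshold`."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(vectors.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


# Hand-made 3-D "embeddings" standing in for real model output.
docs = {
    "refund-policy": [0.90, 0.10, 0.20],
    "refund-policy-v2": [0.88, 0.12, 0.21],  # reworded copy of the first document
    "shipping-times": [0.10, 0.95, 0.05],
}

for id_a, id_b, score in near_duplicates(docs):
    print(id_a, id_b, round(score, 3))
```

Only the two refund-policy entries are flagged. With real embeddings, the right threshold depends on the model and the data, so it would need tuning against known duplicates.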
How similarity is measured
Once content is vectors, “similar” becomes a distance you can compute. The most common measure is cosine similarity, the cosine of the angle between two vectors: 1.0 means they point in the same direction, 0 means they are perpendicular (unrelated), and -1 means they point in opposite directions. Dot product and Euclidean distance are also used; for vectors normalized to length 1, the dot product equals cosine similarity. To find the best matches for a query, you embed the query with the same model and look for its nearest neighbors in the vector space.
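The three measures are a few lines each. This is a minimal sketch in plain Python with hand-picked toy vectors. Real embeddings come from a model and have hundreds of dimensions, and the function names here are illustrative, not from any particular library. It also shows why normalization matters. `b` points the same way as `a` but is twice as long, so cosine similarity treats them as identical while dot product and Euclidean distance do not.

```python
import math


def dot(a, b):
    """Sum of element-wise products; grows with vector length as well as alignment."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def normalize(a):
    """Scale a vector to length 1, keeping its direction."""
    length = norm(a)
    return [x / length for x in a]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b, in [-1, 1]; ignores length."""
    return dot(a, b) / (norm(a) * norm(b))


def euclidean_distance(a, b):
    """Straight-line distance between the two points; smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy 3-D vectors chosen so the results are easy to check by hand.
a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]   # same direction as a, twice as long
c = [2.0, -1.0, 0.0]  # perpendicular to a

print(cosine_similarity(a, b))   # 1.0: same direction, despite different lengths
print(cosine_similarity(a, c))   # 0.0: perpendicular
print(euclidean_distance(a, b))  # 3.0: the points are still not in the same place
print(dot(a, b))                 # 18.0: depends on length as well as direction
print(math.isclose(dot(normalize(a), normalize(b)), cosine_similarity(a, b)))  # True
```

The last line is the reason some systems normalize vectors up front. Once every vector has length 1, a plain dot product gives the cosine similarity and skips the division.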
Where the vectors live
For a handful of items you can compare vectors in memory. At scale — thousands or millions of chunks — you store them in a vector database built for fast approximate nearest-neighbor search. Indexing once and querying many times is exactly the pattern RAG uses.
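For the in-memory case, here is a minimal sketch of the index-once, query-many pattern. `InMemoryIndex` is a hypothetical class, not any vector database’s API. It does exact brute-force search over normalized vectors, which is exactly the full scan that an approximate nearest-neighbor index is built to avoid at millions of vectors.

```python
import heapq
import math


class InMemoryIndex:
    """Hypothetical brute-force index: exact results, but every query scans every vector."""

    def __init__(self):
        self._ids = []
        self._unit_vectors = []

    @staticmethod
    def _normalize(vector):
        length = math.sqrt(sum(x * x for x in vector))
        return [x / length for x in vector]

    def add(self, item_id, vector):
        """Index time: normalize once so each query only needs dot products."""
        self._ids.append(item_id)
        self._unit_vectors.append(self._normalize(vector))

    def search(self, query_vector, k=3):
        """Query time: return the k (id, cosine similarity) pairs nearest the query."""
        q = self._normalize(query_vector)
        scored = (
            (sum(x * y for x, y in zip(q, v)), item_id)
            for item_id, v in zip(self._ids, self._unit_vectors)
        )
        return [(item_id, score) for score, item_id in heapq.nlargest(k, scored)]


# Index once (toy 3-D vectors standing in for model output)...
index = InMemoryIndex()
index.add("dogs", [0.9, 0.1, 0.0])
index.add("puppies", [0.8, 0.2, 0.1])
index.add("tax-forms", [0.0, 0.1, 0.9])

# ...then query as often as needed.
print([item_id for item_id, _ in index.search([1.0, 0.1, 0.0], k=2)])  # ['dogs', 'puppies']
```

Each query here costs time proportional to the number of stored vectors. That is fine for thousands of items. Approximate nearest-neighbor indexes give up a little accuracy to avoid the full scan.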
Practical things that trip people up
- Use the same model for indexing and querying. Vectors from different embedding models aren’t comparable — they live in different spaces.
- Dimensions are a tradeoff. More dimensions can capture more nuance but cost more to store and search; pick what your use case needs.
- Chunking matters. Embedding a whole document blurs its meaning into one vector; embedding well-sized passages keeps retrieval sharp.
- Embeddings aren’t the model’s “knowledge.” They’re a representation of your content for comparison, not a store of facts the model reasons over.
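To put rough numbers on the dimension tradeoff and the chunking advice, here is a sketch that rests on two stated assumptions. First, vectors are stored as 4-byte float32 values. Second, chunking uses fixed word windows with overlap, which is only one simple strategy; many pipelines chunk by tokens, sentences, or document headings instead. Both helpers (`index_size_bytes`, `chunk_words`) are hypothetical, and the 200-word size with a 40-word overlap is an arbitrary starting point.

```python
BYTES_PER_FLOAT32 = 4  # assumption: one float32 per dimension


def index_size_bytes(num_vectors: int, dimensions: int,
                     bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Hypothetical helper: raw vector storage only; real indexes add overhead on top."""
    return num_vectors * dimensions * bytes_per_value


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Hypothetical helper: windows of `chunk_size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # this window reached the end of the text
            break
    return chunks


# One million chunks at 768 vs 1,536 dimensions.
for dims in (768, 1536):
    gb = index_size_bytes(1_000_000, dims) / 1e9
    print(f"{dims} dims: {gb:.2f} GB")  # 3.07 GB, then 6.14 GB

sample = " ".join(f"w{i}" for i in range(500))
print([len(chunk.split()) for chunk in chunk_words(sample)])  # [200, 200, 180]
```

Raw storage scales linearly with dimension, so going from 768 to 1,536 dimensions doubles it before any index overhead. The overlap makes it less likely that a sentence split across a chunk boundary is lost from both chunks.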
FAQ
What is a “dimension” in an embedding?
Each number in the vector is one dimension. A 1,536-dimensional embedding is a point in 1,536-dimensional space — impossible to picture, but the math of “how close are these two points” works exactly like it does in 2D or 3D.
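Both standard measures for vectors with $n$ dimensions are ordinary sums over coordinates, so nothing about them changes when $n$ grows from 3 to 1,536:

```latex
\operatorname{dist}(\mathbf{a}, \mathbf{b}) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}
\qquad
\cos\theta = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}
```

Here $a_i$ and $b_i$ are the $i$-th numbers in each vector. With $n = 2$, the first formula is the Pythagorean theorem.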
Can I compare embeddings from two different models?
No. Each model defines its own space, so a vector from model A is meaningless next to one from model B. Always embed everything you’ll compare with the same model.
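You can make this rule enforceable instead of leaving it as a convention: store the model name with every vector and refuse to compare across models. This is a sketch under that design assumption. `TaggedEmbedding` and the model names `"model-a"` and `"model-b"` are hypothetical placeholders, not a real library’s types.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedEmbedding:
    """Hypothetical record: a vector plus the name of the model that produced it."""
    model: str
    vector: tuple[float, ...]


def cosine_similarity(a: TaggedEmbedding, b: TaggedEmbedding) -> float:
    """Compare two embeddings, refusing if they come from different models."""
    if a.model != b.model:
        raise ValueError(f"cannot compare {a.model!r} vectors with {b.model!r} vectors")
    if len(a.vector) != len(b.vector):
        raise ValueError("vectors have different dimensions")
    dot = sum(x * y for x, y in zip(a.vector, b.vector))
    norm_a = math.sqrt(sum(x * x for x in a.vector))
    norm_b = math.sqrt(sum(y * y for y in b.vector))
    return dot / (norm_a * norm_b)


x = TaggedEmbedding("model-a", (0.6, 0.8))
y = TaggedEmbedding("model-a", (0.8, 0.6))
z = TaggedEmbedding("model-b", (0.6, 0.8))  # same numbers, different model's space

print(round(cosine_similarity(x, y), 2))  # 0.96
try:
    cosine_similarity(x, z)
except ValueError as err:
    print(err)
```

Two models can produce vectors of the same length, so a dimension check alone cannot catch a mix-up. Recording the model name is what makes the mistake detectable.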
Are embeddings only for text?
No — images, audio, and code can all be embedded, and multimodal models can put text and images in the same space, so you can search images with a text query.
Related
Embeddings power RAG, which retrieves the content that fills the context window. More in AI Explained.