AI Explained · Tool

RAG Chunking Visualizer

Paste a document, set the chunk size and overlap, and see exactly how it splits for retrieval, with the overlap between neighbours highlighted. Runs in your browser.


Why chunking decides whether RAG works

In retrieval-augmented generation, each piece of a document becomes one embedding in a vector database. Retrieval returns whole chunks, so the chunk is the unit of meaning. Too large and a single embedding has to represent several ideas, so matches get fuzzy. Too small and a chunk loses the context around it. Overlap repeats a little text between neighbours so a sentence split across a boundary still shows up intact in at least one chunk.
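To make the size and overlap trade-off concrete, here is a minimal sketch of fixed-size chunking by characters. The function name `chunk_text` and its parameters are illustrative, not the tool's actual code; it assumes a plain sliding window where each chunk starts `size - overlap` characters after the previous one.

```python
def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` characters.

    Each chunk repeats the last `overlap` characters of the previous chunk.
    Hypothetical helper for illustration only.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")

    step = size - overlap  # how far the window advances each time
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", size=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

In the output, "d" and "g" each appear in two neighbouring chunks. That repeated text is the overlap.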

The two common strategies

Token mode uses the real o200k_base tokenizer (the GPT-4o family), so the counts match what an embedding model on that encoding would see. Character mode counts raw characters, so it is tokenizer-agnostic. Everything runs locally, so your document never leaves the browser.
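The same sliding window can run over tokens instead of characters. The sketch below is not the tool's implementation: `chunk_by_tokens` is a hypothetical helper that accepts any `encode`/`decode` pair, and a toy whitespace tokenizer stands in so the example runs without extra dependencies. With a real tokenizer you would pass its own encode and decode functions, for example from an o200k_base encoding loaded through a library such as tiktoken.

```python
from typing import Callable


def chunk_by_tokens(
    text: str,
    size: int,
    overlap: int,
    encode: Callable[[str], list],
    decode: Callable[[list], str],
) -> list[str]:
    """Chunk text into windows of `size` tokens sharing `overlap` tokens.

    Hypothetical helper: the tokenizer is injected via `encode`/`decode`.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")

    tokens = encode(text)
    step = size - overlap
    chunks: list[str] = []
    start = 0
    while start < len(tokens):
        chunks.append(decode(tokens[start:start + size]))
        if start + size >= len(tokens):
            break
        start += step
    return chunks


# Toy stand-in tokenizer (an assumption for this example): one token per word.
def toy_encode(text: str) -> list:
    return text.split()


def toy_decode(tokens: list) -> str:
    return " ".join(tokens)


print(chunk_by_tokens("a b c d e f g", 3, 1, toy_encode, toy_decode))
# ['a b c', 'c d e', 'e f g']
```

One caveat when swapping in a byte-level tokenizer: a window boundary can fall inside a multi-byte character, so decoded chunk edges may need cleanup.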

FAQ

Should I chunk by characters or tokens?

Embedding models have a token limit, so tokens are the honest unit. Characters are a convenient proxy (very roughly four characters per token for English). Use token mode when you need to respect a model’s real limit.
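As a rough illustration of using characters as a proxy, the sketch below converts between the two units with the four-characters-per-token rule of thumb. The helper names and the 0.8 safety margin are assumptions made for this example. The real ratio varies by language and tokenizer, so treat the results as estimates only.

```python
import math

# Rule of thumb from the text above; not exact for any particular tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate the token count of `text` from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def char_budget(token_limit: int, safety: float = 0.8) -> int:
    """Suggest a character chunk size that should stay under `token_limit`.

    `safety` is an assumed margin to absorb text that tokenizes less
    efficiently than the rule of thumb predicts.
    """
    return int(token_limit * CHARS_PER_TOKEN * safety)


print(estimate_tokens("x" * 10))  # 3
print(char_budget(512))           # 1638
```

When the limit really matters, count with the actual tokenizer rather than relying on the estimate.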

How much overlap should I use?

A common starting point is 10–20% of the chunk size. That is enough to keep boundary sentences intact without bloating the index with duplicated text.
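The cost of overlap can be estimated with a little arithmetic. The sketch below assumes a plain sliding window that advances by `size - overlap` units each step; `chunk_count` and `index_growth` are hypothetical helper names.

```python
import math


def chunk_count(length: int, size: int, overlap: int) -> int:
    """Number of chunks a sliding window produces for `length` units."""
    if length <= 0:
        return 0
    if length <= size:
        return 1
    step = size - overlap
    return 1 + math.ceil((length - size) / step)


def index_growth(size: int, overlap: int) -> float:
    """Approximate factor by which stored text grows due to overlap."""
    return size / (size - overlap)


print(chunk_count(10_000, size=500, overlap=0))    # 20
print(chunk_count(10_000, size=500, overlap=100))  # 25
print(index_growth(500, 100))                      # 1.25
```

Under these assumptions, 20% overlap stores about 25% more text than no overlap, and 10% overlap about 11% more.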

Is my document uploaded?

No. Chunking runs entirely in your browser, so the text stays on your device and the tool works offline.

Related

Learn what RAG is, how embeddings and a vector database fit in, or count tokens with the Token Counter.