Paste a document, set the chunk size and overlap, and see exactly how it splits for retrieval, with the overlap between neighbours highlighted. Runs in your browser.
In retrieval-augmented generation, each piece of a document becomes one embedding in a vector database. Retrieval returns whole chunks, so the chunk is the unit of meaning. Too large and a single embedding has to represent several ideas, so matches get fuzzy. Too small and a chunk loses the context around it. Overlap repeats a little text between neighbours so a sentence split across a boundary still shows up intact in at least one chunk.
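The overlap idea above can be sketched as a sliding window. This is a minimal character-based illustration, not the tool's actual implementation; the function name `chunk_text` and its parameters are hypothetical.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character chunks; neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk begins with the last character of its predecessor, which is the repeated text the tool highlights. A real splitter would usually also prefer sentence or paragraph boundaries over fixed offsets.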
Token mode uses the real o200k_base tokenizer (the GPT-4o family), so the counts match what
an embedding model on that encoding would see. Character mode is tokenizer-agnostic. Everything runs
locally; your document never leaves the browser.
Embedding models have a token limit, so tokens are the honest unit. Characters are a convenient proxy (very roughly four characters per token for English). Use token mode when you need to respect a model’s real limit.
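As a rough sketch of the character proxy, the helpers below convert between characters and estimated tokens using the four-characters-per-token rule of thumb. The function names and the 0.8 safety margin are assumptions for illustration, and the estimate can be well off for code or non-English text.

```python
import math

CHARS_PER_TOKEN = 4.0  # very rough English average; an approximation, not a tokenizer


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Estimate the token count of `text` from its character length."""
    return math.ceil(len(text) / chars_per_token)


def max_chars_for_limit(token_limit: int,
                        chars_per_token: float = CHARS_PER_TOKEN,
                        safety: float = 0.8) -> int:
    """Character budget that should stay under `token_limit`, with a safety margin."""
    return int(token_limit * chars_per_token * safety)


print(estimate_tokens("a" * 10))   # 3
print(max_chars_for_limit(512))    # 1638
```

The margin exists because the ratio is only an average; when a chunk must fit a hard limit, counting real tokens is the safer choice.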
A common starting point for overlap is 10–20% of the chunk size: enough to keep boundary sentences intact, but not so much that you bloat the index with duplicated text.
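To make the trade-off concrete, this sketch derives an overlap from a ratio and counts how many fixed-size chunks a document of a given length would produce. The function names are hypothetical, and the arithmetic assumes a plain sliding window with no boundary snapping.

```python
import math


def overlap_for(chunk_size: int, ratio: float = 0.15) -> int:
    """Overlap as a fraction of the chunk size (0.10 to 0.20 is the usual range)."""
    return round(chunk_size * ratio)


def chunk_count(total: int, chunk_size: int, overlap: int) -> int:
    """Number of chunks a sliding window yields over `total` units."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    if total <= 0:
        return 0
    if total <= chunk_size:
        return 1
    step = chunk_size - overlap  # new units covered by each additional chunk
    return 1 + math.ceil((total - chunk_size) / step)


overlap = overlap_for(500, ratio=0.1)
print(overlap)                          # 50
print(chunk_count(10_000, 500, 0))      # 20
print(chunk_count(10_000, 500, overlap))  # 23
```

Here a 10% overlap turns 20 chunks into 23, so the index grows by about 15%. Larger ratios raise that cost quickly because the step between chunks shrinks.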
No, your document is not uploaded. Chunking runs entirely in your browser, so the text stays on your device and the tool works offline.
Learn what RAG is, how embeddings and a vector database fit in, or count tokens with the Token Counter.