Count the tokens in any text with the real OpenAI tokenizers and see exactly how it splits, token by token. Runs entirely in your browser; nothing you paste is uploaded.
Models don’t read characters or words; they read tokens: chunks of text a
tokenizer learned to split on. A common word is often one token; a
rarer or longer word splits into several (“tokenization” → token +
ization). Punctuation, leading spaces, and casing all matter. Token count is what
providers bill on and what fills a model’s context window,
so counting them is the difference between a request that fits and one that gets truncated.
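To make the splitting concrete, here is a minimal sketch of byte-pair-style merging in Python. It is a toy, not the real tokenizer: the `MERGES` list is hand-written for this example (real encodings learn their merges from data and work on bytes rather than characters), and `bpe_split` is a hypothetical helper name.

```python
# Toy byte-pair-encoding sketch. MERGES is a hypothetical, hand-written merge
# list for illustration only; real encodings learn a far larger list from data
# and operate on bytes, not characters.
MERGES = [
    ("t", "o"), ("k", "e"), ("to", "ke"), ("toke", "n"),
    ("i", "z"), ("a", "t"), ("iz", "at"),
    ("i", "o"), ("io", "n"), ("izat", "ion"),
]


def bpe_split(word, merges=MERGES):
    """Split a word by repeatedly applying the earliest-listed merge that fits."""
    ranks = {pair: i for i, pair in enumerate(merges)}
    parts = list(word)  # start from single characters
    while len(parts) > 1:
        pairs = list(zip(parts, parts[1:]))
        # Pick the adjacent pair that appears earliest in the merge list.
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no known merge applies any more
        merged = []
        i = 0
        while i < len(parts):
            if i + 1 < len(parts) and (parts[i], parts[i + 1]) == best:
                merged.append(parts[i] + parts[i + 1])
                i += 2
            else:
                merged.append(parts[i])
                i += 1
        parts = merged
    return parts


print(bpe_split("tokenization"))  # ['token', 'ization']
print(bpe_split("zebra"))         # no merges known, so it stays as single letters
```

With this toy merge list, a word the list covers well collapses into a couple of pieces, while an unfamiliar word stays fragmented. That is the same effect that makes rare words cost more tokens.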
o200k_base, the newest OpenAI encoding.
text-embedding-3 models) use cl100k_base.
The two OpenAI counts are exact: they run the same byte-pair encodings OpenAI uses, right here in your browser. The “other” estimate is a planning ballpark only; verify against your provider for billing.
Yes. This loads the actual o200k_base and cl100k_base tokenizers and encodes your text locally, so the count matches how OpenAI’s API tokenizes that text.
Anthropic and Google don’t publish their tokenizers, so an exact in-browser count isn’t possible without calling their APIs. Rather than invent a number, we show an honest character-based estimate and label it as one.
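A character-based estimate can be as simple as the sketch below. The ratio of about 4 characters per token is an assumption: it is a common rule of thumb for English text, not a figure taken from this tool or from any provider. `estimate_tokens` is a hypothetical helper name.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count. Not a real tokenizer.

    chars_per_token=4.0 is an assumed rule of thumb for English prose;
    code, non-English text, and unusual formatting can differ a lot.
    """
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    # Round up so the estimate errs toward the higher count.
    return math.ceil(len(text) / chars_per_token)


prompt = "Summarize the attached report in three bullet points."
print(estimate_tokens(prompt))
```

Treat the result as a planning number only; the real count can be noticeably higher or lower.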
No. The tokenizer runs entirely in your browser, so whatever you paste stays on your device and the tool works offline.
Most tokens include their leading space, so “token” and “ token” (with a leading space) can be different tokens. Casing and punctuation shift the split too, which is why two similar strings can have different counts.
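The leading-space behavior is easier to see in a small sketch. The pattern below is a simplified stand-in written for this example, not the actual pattern used by o200k_base or cl100k_base, and `pre_split` is a hypothetical helper name. It shows only the first step, cutting text into pieces before any merging happens.

```python
import re

# Simplified, assumed pattern: an optional single leading space plus word
# characters, or a run of punctuation, or any leftover whitespace.
PIECE = re.compile(r" ?\w+|[^\w\s]+|\s+")


def pre_split(text):
    """Cut text into pieces, keeping each word's leading space attached to it."""
    return PIECE.findall(text)


print(pre_split("token token"))    # ['token', ' token']
print(pre_split("Hello, world!"))  # ['Hello', ',', ' world', '!']
```

The two pieces in the first example are different strings, so a tokenizer is free to give them different token IDs. Joining the pieces back together always reproduces the original text, spaces included.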
See what a context window is, plan a prompt with the Context Window Visualizer, or browse AI Explained.