Token Counter & Tokenizer, count GPT tokens online

What a token actually is

Models don’t read characters or words. They read tokens: chunks of text that a tokenizer learned to split on. A common word is often one token; a rarer or longer word splits into several (for example, “tokenization” can split into “token” + “ization”). Punctuation, leading spaces, and casing all affect the split. Providers bill by token count, and tokens are what fill a model’s context window, so counting them tells you whether a request will fit or get truncated.
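As a rough illustration of how one word can end up as several tokens, here is a minimal byte-pair-style merge sketch in Python. The merge table `TOY_MERGES` and the function `toy_bpe` are invented for this example; real encodings use far larger learned tables and operate on bytes rather than characters, so the splits shown here are illustrative only.

```python
# Toy byte-pair-style encoder. TOY_MERGES is a made-up merge table for
# illustration; it is NOT taken from any real OpenAI encoding.
TOY_MERGES = [
    ("t", "o"), ("k", "e"), ("to", "ke"), ("toke", "n"),
    ("i", "z"), ("a", "t"), ("i", "o"), ("io", "n"),
    ("iz", "at"), ("izat", "ion"),
]
# Lower rank means the pair is merged earlier.
RANK = {pair: i for i, pair in enumerate(TOY_MERGES)}


def toy_bpe(word: str) -> list[str]:
    """Split a word into characters, then repeatedly merge the adjacent
    pair with the lowest rank until no known pair remains."""
    parts = list(word)
    while len(parts) > 1:
        pairs = [(parts[i], parts[i + 1]) for i in range(len(parts) - 1)]
        best = min(pairs, key=lambda p: RANK.get(p, float("inf")))
        if best not in RANK:
            break  # nothing left that the table knows how to merge
        merged = []
        i = 0
        while i < len(parts):
            if i + 1 < len(parts) and (parts[i], parts[i + 1]) == best:
                merged.append(parts[i] + parts[i + 1])
                i += 2
            else:
                merged.append(parts[i])
                i += 1
        parts = merged
    return parts


print(toy_bpe("token"))         # a word the toy table fully covers
print(toy_bpe("tokenization"))  # a longer word that stays in two pieces
```

With this toy table, the short word collapses into a single piece while the longer one stops at two, because the table has no entry that joins them.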

Which tokenizer should I pick?

GPT-4o · GPT-4.1 · o-series use o200k_base, the newest OpenAI encoding.
GPT-4 · GPT-3.5 (and the text-embedding-3 models) use cl100k_base.
Claude, Gemini & others use their own tokenizers, which are not published. For those, this tool shows a character-based estimate (about four characters per token), clearly labelled, never a fake exact count.

The two OpenAI counts are exact: they run the same byte-pair encodings OpenAI uses, right here in your browser. The “other” estimate is a planning ballpark only; verify against your provider for billing.
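The choice above can be sketched as a small lookup plus a fallback estimate. This is a minimal sketch, not the tool's actual code: the names `pick_encoding`, `estimate_tokens`, and `ENCODING_BY_PREFIX` are hypothetical, and matching models by name prefix is an assumption made for the example.

```python
import math
from typing import Optional

# Mapping taken from the list above. Matching by name prefix is an
# assumption of this sketch, not an official lookup. More specific
# prefixes come first so "gpt-4o" is not caught by "gpt-4".
ENCODING_BY_PREFIX = [
    ("gpt-4o", "o200k_base"),
    ("gpt-4.1", "o200k_base"),
    ("o1", "o200k_base"),
    ("o3", "o200k_base"),
    ("o4", "o200k_base"),
    ("gpt-4", "cl100k_base"),
    ("gpt-3.5", "cl100k_base"),
    ("text-embedding-3", "cl100k_base"),
]

# Rule of thumb from the text: about four characters per token.
CHARS_PER_TOKEN = 4


def pick_encoding(model: str) -> Optional[str]:
    """Return the encoding name for a model, or None when no exact
    tokenizer is available and only an estimate can be shown."""
    name = model.lower()
    for prefix, encoding in ENCODING_BY_PREFIX:
        if name.startswith(prefix):
            return encoding
    return None


def estimate_tokens(text: str) -> int:
    """Character-based ballpark; an estimate, never an exact count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


for model in ("gpt-4o-mini", "gpt-4-turbo", "claude-3-5-sonnet"):
    print(model, "->", pick_encoding(model) or "estimate only")
```

Returning `None` instead of a default encoding keeps the estimate path explicit, so a caller cannot mistake a ballpark figure for an exact count.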

FAQ

Are the OpenAI token counts exact?

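Yes. This tool loads the actual o200k_base and cl100k_base tokenizers and encodes your text locally, so the count matches how OpenAI’s tokenizer splits that text. Chat requests add a few formatting tokens per message on top, so a billed total can be slightly higher than the count for the raw text alone.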

Why no exact Claude or Gemini count?

Anthropic and Google don’t publish their tokenizers, so an exact in-browser count isn’t possible without calling their APIs. Rather than invent a number, we show an honest character-based estimate and label it as one.

Is my text uploaded?

No. The tokenizer runs entirely in your browser, so whatever you paste stays on your device and the tool works offline.

Why does whitespace change the count?

Most tokens include their leading space, so “token” and “ token” (with a leading space) can be different tokens. Casing and punctuation shift the split too, which is why two similar strings can have different counts.
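One way to see why the leading space travels with the word: GPT-style tokenizers first cut text into chunks with a regular expression, before any merging happens, and that expression attaches a single preceding space to the word that follows. The pattern below is a simplified, ASCII-only approximation written for this example; the real patterns used by cl100k_base and o200k_base are more elaborate, and `pre_tokenize` is a hypothetical helper name.

```python
import re

# Simplified pre-tokenization pattern (an approximation for illustration):
#   " ?[A-Za-z]+"          a word, with one optional leading space
#   " ?[0-9]+"             a run of digits, with one optional leading space
#   " ?[^\sA-Za-z0-9]+"    punctuation, with one optional leading space
#   "\s+"                  any remaining whitespace
PRETOKEN = re.compile(r" ?[A-Za-z]+| ?[0-9]+| ?[^\sA-Za-z0-9]+|\s+")


def pre_tokenize(text: str) -> list[str]:
    """Split text into chunks; each chunk is tokenized on its own later."""
    return PRETOKEN.findall(text)


print(pre_tokenize("token"))
print(pre_tokenize(" token"))
print(pre_tokenize("Hello, world"))
```

Because the space is part of the chunk, the word with and without it goes through the merge step as two different strings, which is how they can end up as different tokens.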

See what a context window is, plan a prompt with the Context Window Visualizer, or browse AI Explained.
