Context Budget Analyzer: what's eating your context window

Why context fills up faster than you think

Every file, doc, and message you send shares one context window, and your app re-sends most of it on every turn. A couple of large source files or a pasted API doc can quietly eat the budget you needed for retrieved context and the answer itself. This analyzer counts each input with the real o200k_base tokenizer (the GPT-4o family), ranks the biggest offenders, and tells you what to drop first. It is the same idea as the Context Window Visualizer, but file by file.
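
To make the re-sending effect concrete, here is a minimal sketch of the arithmetic. It assumes a simple model in which the fixed inputs (system prompt, files, docs) are re-sent on every turn and the conversation history grows by a constant amount per turn. The function name and every number below are hypothetical and chosen only for illustration; none of them come from the analyzer.

```python
def turns_until_full(budget: int, fixed_tokens: int,
                     per_turn_tokens: int, reserved_output: int) -> int:
    """Return how many turns fit before the context window overflows.

    Simplified model (an assumption, not the analyzer's logic):
    context at turn n = fixed_tokens + n * per_turn_tokens + reserved_output
    """
    if per_turn_tokens <= 0:
        raise ValueError("per_turn_tokens must be positive")
    available = budget - fixed_tokens - reserved_output
    if available < 0:
        return 0  # the fixed inputs alone already exceed the budget
    return available // per_turn_tokens


# Hypothetical numbers: a 128,000-token budget, 4,000 tokens reserved for
# the answer, and history growing by about 1,500 tokens per turn.
heavy = turns_until_full(128_000, fixed_tokens=90_000,
                         per_turn_tokens=1_500, reserved_output=4_000)
light = turns_until_full(128_000, fixed_tokens=30_000,
                         per_turn_tokens=1_500, reserved_output=4_000)
print(f"with 90k of fixed inputs: {heavy} turns")
print(f"with 30k of fixed inputs: {light} turns")
```

Under these assumptions, trimming the fixed inputs is what buys extra turns, which is why the analyzer ranks inputs by size.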

How to use it

Pick the budget for your model (a tier, or a custom number).
Add each input (your system prompt, the source files, the pasted docs) with a name so you can spot it.
Read the offenders list: the few inputs doing most of the damage, and how much you’d free by cutting each.

Counts are exact for the o200k_base tokenizer; other model families tokenize a little differently, so treat cross-model numbers as close estimates. Everything runs in your browser; your code and docs never leave the page.
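
If you want to reproduce the ranking step yourself, the sketch below shows one way it could look. It is not the analyzer's own code (which runs in the browser); the names `rank_offenders`, `Offender`, and `rough_count` are hypothetical. The default counter is a rough stand-in of about four characters per token, not the o200k_base count. For exact counts, and assuming the `tiktoken` package is installed, you could pass a counter built on `tiktoken.get_encoding("o200k_base")` instead.

```python
from typing import Callable, Dict, List, NamedTuple


def rough_count(text: str) -> int:
    """Stand-in counter: roughly 4 characters per token.

    This is an approximation, NOT the o200k_base count. With tiktoken
    installed, an exact counter could be:
        enc = tiktoken.get_encoding("o200k_base")
        count = lambda text: len(enc.encode(text))
    """
    return len(text) // 4


class Offender(NamedTuple):
    name: str
    tokens: int
    share_of_budget: float  # fraction of the budget this input uses


def rank_offenders(inputs: Dict[str, str], budget: int,
                   count: Callable[[str], int] = rough_count) -> List[Offender]:
    """Count every named input and sort the largest first."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    counted = []
    for name, text in inputs.items():
        tokens = count(text)
        counted.append(Offender(name, tokens, tokens / budget))
    # Largest inputs first: these free the most tokens when cut.
    return sorted(counted, key=lambda item: item.tokens, reverse=True)


# Hypothetical inputs and budget, for illustration only.
example_inputs = {
    "system_prompt": "You are a helpful assistant. " * 10,
    "api_docs.md": "GET /v1/items returns a list of items. " * 400,
    "main.py": "print('hello')\n" * 100,
}
example_budget = 8_000

ranked = rank_offenders(example_inputs, example_budget)
total = sum(item.tokens for item in ranked)
for item in ranked:
    print(f"{item.name}: {item.tokens} tokens "
          f"({item.share_of_budget:.0%} of budget)")
print(f"total: {total} of {example_budget} tokens")
```

Keeping the counter as a parameter lets the same ranking work with any tokenizer, which matters because counts differ between model families.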

FAQ

Is this different from the Context Window Visualizer?

Same idea, different lens. The visualizer splits context by role (system, history, docs, output). This one is file-oriented: paste the actual files you send and see which individual ones to cut.

Which tokenizer does it use?

The real o200k_base encoding used by the GPT-4o family, run locally. For Claude or Gemini the count is a close estimate, since those tokenizers aren’t public.

Is my code uploaded?

No. Tokenizing and ranking happen entirely in your browser, so nothing you paste is sent anywhere.

Related

Count a single block with the Token Counter, trim retrieval with RAG chunking, or read what a context window is.
