
Context Budget Analyzer

Paste your prompt plus the files and docs you stuff into an LLM, and see what actually fills the context window, ranked by the biggest offenders, with a clear recommendation on what to cut first.


Why context fills up faster than you think

Every file, doc, and message you send shares one context window, and your app re-sends most of it on every turn. A couple of large source files or a pasted API doc can quietly eat the budget you needed for retrieved context and for the answer itself. This analyzer counts each input with the real o200k_base tokenizer (used by the GPT-4o family), ranks the biggest offenders, and tells you what to drop first. It is the same idea as the Context Window Visualizer, applied file by file.
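The rank-then-cut idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analyzer's actual code: `analyze`, `window`, `reserve`, and `keep` are hypothetical names, the greedy "drop the largest input first" rule is one plausible cutting policy, and the demo counts whitespace-separated words as a stand-in for real tokens.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class InputReport:
    name: str
    tokens: int
    share: float  # fraction of all pasted tokens


def analyze(
    inputs: dict[str, str],
    count_tokens: Callable[[str], int],
    window: int,
    reserve: int,
    keep: frozenset[str] | set[str] = frozenset(),
) -> tuple[list[InputReport], list[str], int]:
    """Rank inputs by token count and greedily suggest which to cut.

    Hypothetical parameters (assumptions, not the tool's API):
      window  - total context window size in tokens
      reserve - tokens left free for retrieved context and the answer
      keep    - input names never suggested for cutting (e.g. the prompt)
    Returns the ranked report, the names to cut, and the tokens still used.
    """
    counts = {name: count_tokens(text) for name, text in inputs.items()}
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    report = [
        InputReport(name, tokens, tokens / total if total else 0.0)
        for name, tokens in ranked
    ]

    # Greedy policy: drop the biggest offenders first until the rest fits.
    budget = window - reserve
    cut: list[str] = []
    used = total
    for name, tokens in ranked:
        if used <= budget:
            break
        if name in keep:
            continue
        cut.append(name)
        used -= tokens
    return report, cut, used


if __name__ == "__main__":
    files = {
        "prompt.txt": "Summarize the changes in these files.",
        "api_docs.md": "word " * 5000,
        "main.py": "word " * 1200,
        "utils.py": "word " * 300,
    }
    # Crude stand-in counter: whitespace-separated words, NOT real tokens.
    report, cut, used = analyze(
        files, lambda s: len(s.split()), window=8000, reserve=4000,
        keep={"prompt.txt"},
    )
    for r in report:
        print(f"{r.name:12} {r.tokens:6} {r.share:6.1%}")
    print("cut first:", cut, "| tokens left:", used)
```

With these made-up inputs the API doc alone accounts for most of the budget, so it is the only suggested cut. To approximate the analyzer's o200k_base counts, you could build the counter from `enc = tiktoken.get_encoding("o200k_base")` as `lambda s: len(enc.encode(s, disallowed_special=()))`, assuming the `tiktoken` package is installed; `disallowed_special=()` keeps it from raising on pasted text that happens to contain strings like `<|endoftext|>`.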

How to use it

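Paste your prompt and each file or doc you plan to send; the analyzer tokenizes every input separately, ranks them by token count, and recommends what to cut first. Counts are exact for the o200k_base tokenizer; other model families tokenize a little differently, so treat cross-model numbers as close estimates. Everything runs in your browser, so your code and docs never leave the page.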

FAQ

Is this different from the Context Window Visualizer?

Same idea, different lens. The visualizer splits context by role (system, history, docs, output). This one is file-oriented: paste the actual files you send and see which individual ones to cut.

Which tokenizer does it use?

The real o200k_base encoding used by the GPT-4o family, run locally. For Claude or Gemini the count is a close estimate, since those models use their own tokenizers and the analyzer applies o200k_base to them.

Is my code uploaded?

No. Tokenizing and ranking happen entirely in your browser, so nothing you paste is sent anywhere.

Related

Count a single block with the Token Counter, trim retrieval with RAG chunking, or read what a context window is.