Paste your prompt plus the files and docs you send to an LLM, and see what actually fills the context window: every input ranked from largest to smallest, with a clear recommendation on what to cut first.
Every file, doc, and message you send shares one context window, and
your app re-sends most of it on every turn. A couple of large source files or a pasted API doc can quietly
eat the budget you needed for retrieved context and the answer itself. This analyzer counts each input with
the real o200k_base tokenizer (used by the GPT-4o family), ranks the biggest offenders, and tells you
what to drop first. It is the same idea as the Context Window Visualizer,
but file by file.
Counts are exact for the o200k_base tokenizer. Other model families tokenize a little differently, so treat cross-model numbers as close estimates. Everything runs in your browser, so your code and docs never leave the page.
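The count-rank-recommend flow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the analyzer's actual code: the function names (`approx_count`, `rank_inputs`, `suggest_cuts`), the file names, and the "drop the largest input first" rule are all hypothetical. The default counter is a whitespace word count used only as a stand-in so the sketch runs without dependencies. For real o200k_base counts you could pass in something like `lambda t: len(tiktoken.get_encoding("o200k_base").encode(t))`, assuming the tiktoken package is installed.

```python
from typing import Callable, Dict, List, Tuple


def approx_count(text: str) -> int:
    """Stand-in counter: whitespace-separated words, NOT real tokens."""
    return len(text.split())


def rank_inputs(
    inputs: Dict[str, str],
    count: Callable[[str], int] = approx_count,
) -> List[Tuple[str, int]]:
    """Return (name, count) pairs sorted from largest to smallest."""
    counts = [(name, count(text)) for name, text in inputs.items()]
    return sorted(counts, key=lambda item: item[1], reverse=True)


def suggest_cuts(ranked: List[Tuple[str, int]], budget: int) -> List[str]:
    """Assumed heuristic: drop the largest inputs until the total fits."""
    total = sum(n for _, n in ranked)
    cuts: List[str] = []
    for name, n in ranked:
        if total <= budget:
            break
        cuts.append(name)
        total -= n
    return cuts


# Hypothetical inputs: a short prompt, a pasted API doc, one source file.
inputs = {
    "prompt.txt": "Summarize the attached files.",
    "api_doc.md": "word " * 500,
    "utils.py": "word " * 120,
}

ranked = rank_inputs(inputs)
cuts = suggest_cuts(ranked, budget=200)

for name, n in ranked:
    print(f"{n:>6}  {name}")
print("cut first:", cuts)
```

Passing the counter in as a parameter keeps the ranking logic independent of any one tokenizer, which matters because counts from one model family are only estimates for another.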
Same idea, different lens. The visualizer splits context by role (system, history, docs, output). This one is file-oriented: paste the actual files you send and see which individual ones to cut.
Counts use the real o200k_base encoding used by the GPT-4o family, run locally. For Claude or Gemini the count is a close estimate: those models tokenize differently, and their tokenizers aren’t public.
No, nothing is uploaded. Tokenizing and ranking happen entirely in your browser, so nothing you paste is sent anywhere.
Count a single block with the Token Counter, trim retrieval with RAG chunking, or read what a context window is.