Context Window Visualizer

See how your system prompt, conversation history, retrieved docs, and reserved output fill a model's context window, and how much room is actually left. Runs in your browser.

Where did all my context go?

A model’s context window is shared by everything in a request: your system prompt, the conversation history the app keeps re-sending, any retrieved documents, and the reply itself. This tool estimates the tokens in each part and shows them stacked against a model’s budget, so you can see what’s actually eating the space and how much is left for the answer.

How to read it

  • Pick a context budget (a size tier, or enter a custom number for your exact model).
  • Paste each part of your prompt; the bar fills proportionally and the percentage updates live.
  • If the total turns red, you’re over budget: the app would have to truncate or reject the request. Trim history, retrieve fewer chunks, or reserve less output.
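
The stacked bar and the over-budget check come down to simple arithmetic. Here is a minimal Python sketch of that calculation, not the tool's actual code; the function name `fill_report`, the part names, and the sample numbers are all hypothetical, and the token counts are assumed to be already estimated.

```python
def fill_report(parts: dict[str, int], budget: int) -> dict:
    """Stack per-part token counts against a context budget.

    `parts` maps a label (hypothetical names below) to its token count.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    total = sum(parts.values())
    return {
        "total": total,
        "percent_used": round(100 * total / budget, 1),
        "remaining": budget - total,       # negative when over budget
        "over_budget": total > budget,     # the "turns red" condition
        "shares": {name: round(100 * n / budget, 1) for name, n in parts.items()},
    }


# Illustrative numbers only: an 8,000-token budget that is overfilled.
report = fill_report(
    {"system": 1_200, "history": 5_000, "retrieved": 3_000, "reserved_output": 1_000},
    budget=8_000,
)
print(report["percent_used"], report["over_budget"])
```

In this example the history alone takes 62.5% of the budget, which is why trimming history is usually the first thing to try.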

Token counts are estimates (~4 characters per token). Real tokenization varies by model and content; code and non-English text, for example, tokenize differently from English prose. Size tiers are dated June 2026; verify your model’s exact limit with its provider.

FAQ

Why is the token count only an estimate?

Every model family tokenizes differently, and the exact split depends on the text. The ~4-chars-per-token rule is a solid ballpark for English prose; treat it as a planning estimate, not an exact bill.
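
As a rough illustration, the ~4-characters-per-token rule can be written in a few lines of Python. This is a sketch of the heuristic, not the tool's implementation; the name `estimate_tokens` and the choice to round up are assumptions.

```python
import math

CHARS_PER_TOKEN = 4  # the ballpark from the text; real tokenizers differ


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Estimate a token count from character length alone.

    Rounding up is an assumption made here so that short, non-empty
    strings never count as zero tokens.
    """
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


print(estimate_tokens("Hello, world!"))  # 13 characters
```

Because the estimate looks only at length, it cannot tell prose from code or from non-English text. For an exact count, use the tokenizer published for your model.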

Why reserve space for output?

The reply is generated into the same window as the input. If you fill the budget with input, there’s no room left to answer, so apps reserve a slice for the response.
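
One way to picture the reservation: subtract the output slice from the window first, then fit everything else into what remains. The Python sketch below is illustrative only. The function names and numbers are hypothetical, and dropping the oldest turns first is just one possible trimming policy, not something the tool prescribes.

```python
def max_input_tokens(window: int, reserved_output: int) -> int:
    """Tokens left for input once a slice is set aside for the reply."""
    if not 0 <= reserved_output <= window:
        raise ValueError("reserved_output must be between 0 and window")
    return window - reserved_output


def keep_recent_turns(turn_tokens: list[int], allowance: int) -> list[int]:
    """Keep the newest history turns that fit, preserving their order.

    `turn_tokens` is ordered oldest-first. Stops at the first turn that
    does not fit, so the kept history is always a contiguous recent run.
    """
    kept: list[int] = []
    used = 0
    for n in reversed(turn_tokens):
        if used + n > allowance:
            break
        kept.append(n)
        used += n
    kept.reverse()
    return kept


# Illustrative numbers only.
allowance = max_input_tokens(window=8_000, reserved_output=1_000)
fixed = 1_200 + 3_000  # system prompt + retrieved docs
kept = keep_recent_turns([2_000, 1_500, 1_000, 500], allowance - fixed)
print(allowance, kept)
```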

Is anything uploaded?

No. Estimation runs entirely in your browser, so prompts and documents stay local and it works offline.

Related

Read what a context window is and how RAG keeps it lean, or browse AI Explained.