Models don't remember anything, memory is engineered around them. Step through a conversation and watch a sliding window, a summary buffer, and retrieval keep or drop facts, and see exactly what the model sees each turn.
A model is stateless. It doesn’t remember your last message, or the last thousand, every turn, your app re-sends whatever history fits and the model reads it fresh. So “memory” isn’t something the model has; it’s something you engineer by deciding what to put back into the context window each time. This simulator shows three common strategies and, crucially, what each one forgets.
Pin a message as a fact (the ☆) to make it eligible for retrieval. Real systems match facts by meaning using embeddings; here we approximate with keyword overlap to keep it transparent. The summary block stands for “these turns were compressed”, in production an LLM writes it. Everything runs in your browser.
Correct. Each API call is independent. Any continuity you see is your application re-sending history (or a summary, or retrieved facts) every single turn. Take that away and the model is a blank slate.
Because that message scrolled out of the last N. A pure sliding window has no idea a dropped message mattered. Retrieval fixes exactly this by storing facts and pulling them back when relevant.
No. The simulation is entirely client-side, so your conversation stays on your device.
Read what agent memory is, how RAG and embeddings power retrieval, or size a window with the Context Window Visualizer.