What is agent memory? Why AI remembers or forgets

The thing nobody tells you: the model is stateless

A language model has no memory between calls. Each request is processed from scratch; the model doesn’t carry anything over from the last one. So when an assistant seems to “remember” your name or last week’s conversation, that isn’t the model recalling; it’s the application storing information and feeding it back in. Whether an assistant remembers depends on whether the product has a memory feature, not on the model.
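A minimal sketch of that division of labor. Everything here is hypothetical: `fake_model` is a toy stand-in for a real model call (it only looks for a name in the text it receives), and `App` is an invented wrapper, not any product’s API. The point is only that the “model” function sees nothing except its prompt, while the application holds the state.

```python
import re


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call: it sees only `prompt`."""
    match = re.search(r"The user's name is (\w+)", prompt)
    if match:
        return f"Hello again, {match.group(1)}!"
    return "Hello! I don't know your name."


class App:
    """The application, not the model, holds state between calls."""

    def __init__(self) -> None:
        self.stored_facts: list[str] = []

    def remember(self, fact: str) -> None:
        self.stored_facts.append(fact)

    def ask(self, user_message: str) -> str:
        # Stored facts are fed back in as part of the prompt on every call.
        prompt = "\n".join(self.stored_facts + [user_message])
        return fake_model(prompt)


app = App()
first = app.ask("Hi")                      # nothing stored yet
app.remember("The user's name is Ada.")
second = app.ask("Hi")                     # same message, fact now injected
direct = fake_model("Hi")                  # bypassing the app: no memory at all
```

The same message gets a different answer only because the application changed what it sent; calling the model function directly behaves as if the earlier exchange never happened.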

Three layers people call “memory”

Context (working memory). The context window: everything in the current request. It’s short-term and ephemeral; when the request ends, it’s gone.
Conversation memory. The app re-sends previous turns with each request so the chat feels continuous. This still lives inside the context window, which is why a very long chat eventually “forgets” the start: the oldest turns get trimmed to fit.
Long-term memory. Information saved outside any single request, in a database or vector store, and pulled back in when relevant. This is the part that persists across sessions, and it’s a feature the application implements.
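The trimming behind conversation memory can be sketched as follows. This is one simple policy (drop the oldest turns first) under stated assumptions: `count_tokens` is a crude word count standing in for a real tokenizer, and the budget of 8 is an arbitrary illustrative number. Real products may summarize or trim differently.

```python
def count_tokens(text: str) -> int:
    # Assumption: a whitespace word count stands in for a real tokenizer.
    return len(text.split())


def trim_to_budget(turns: list[str], budget: int) -> list[str]:
    """Keep the most recent turns whose combined size fits within `budget`."""
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):       # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > budget:
            break                      # this turn and everything older is dropped
        kept.append(turn)
        used += cost
    kept.reverse()                     # restore chronological order
    return kept


history = [
    "user: my name is Ada",
    "assistant: nice to meet you Ada",
    "user: what is RAG",
    "assistant: retrieval augmented generation",
]
window = trim_to_budget(history, budget=8)
```

With this budget only the last two turns survive, so the turn containing the user’s name is no longer part of what the model sees.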

How long-term memory actually works

The common pattern is RAG pointed at your own history. The app extracts things worth keeping (“the user prefers metric units”), stores them, often as embeddings in a vector database, and at the start of a later session retrieves the memories relevant to what you’re doing now and adds them to the context. The model then answers as if it “remembered,” when really it was handed the right notes at the right moment.
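A toy version of that store-then-retrieve loop, as a sketch only. The names `embed`, `MemoryStore`, and `build_context` are hypothetical, and the “embedding” is a bag-of-words count compared by cosine similarity, which stands in for a real embedding model and vector database.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Assumption: word counts stand in for a real embedding model.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def save(self, memory: str) -> None:
        self.items.append((memory, embed(memory)))

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_context(store: MemoryStore, user_message: str) -> str:
    # Retrieved memories are placed in the context ahead of the user's message.
    notes = store.retrieve(user_message, k=1)
    lines = ["Known about the user:"] + [f"- {note}" for note in notes]
    return "\n".join(lines) + f"\n\nUser: {user_message}"


store = MemoryStore()
store.save("The user prefers metric units")
store.save("The user works in Miami")
context = build_context(store, "Which units should I use for the recipe?")
```

Here the question about units pulls in the metric-units memory and leaves the Miami one out, so the model is handed only the note that matters for this request.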

Kinds of memory worth distinguishing

Episodic: what happened in past interactions (“last time you asked about X”).
Semantic: durable facts and preferences (“you work in Miami”).
Procedural: how to carry out a task the way you like it done.
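One way an application might tag stored memories with these three kinds, as a sketch. `MemoryKind`, `MemoryRecord`, and `of_kind` are hypothetical names, and the procedural example text is invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryKind(Enum):
    EPISODIC = "episodic"      # what happened in past interactions
    SEMANTIC = "semantic"      # durable facts and preferences
    PROCEDURAL = "procedural"  # how the user likes a task done


@dataclass(frozen=True)
class MemoryRecord:
    kind: MemoryKind
    text: str


records = [
    MemoryRecord(MemoryKind.EPISODIC, "Last session the user asked about X"),
    MemoryRecord(MemoryKind.SEMANTIC, "The user works in Miami"),
    # Invented example of a procedural memory:
    MemoryRecord(MemoryKind.PROCEDURAL, "Summaries should be short bullet points"),
]


def of_kind(items: list[MemoryRecord], kind: MemoryKind) -> list[str]:
    """Return the text of every record with the given kind."""
    return [record.text for record in items if record.kind is kind]


facts = of_kind(records, MemoryKind.SEMANTIC)
```

Tagging the kind at save time lets retrieval treat them differently, for example always including procedural notes for a task while fetching episodic ones only when the topic comes up again.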

Is memory part of the context window?

No, but they meet there. Memory is storage; the context window is the working set. Memory only influences an answer when something retrieves it and places it into the context for that request. Confusing the two is the root of most “why did it forget?” questions: if a fact wasn’t in the context for this call, the model couldn’t use it.

Where it goes wrong

Memory bloat. Saving everything makes retrieval noisy and context expensive, so be selective.
Stale memories. Old facts that are no longer true get retrieved and mislead the model.
Privacy. Long-term memory is a record of the user, so it needs the same care as any stored personal data.
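A sketch of one way to limit bloat and staleness: keep only the newest memory per topic and drop anything past an age limit. The `Memory` fields, the `prune` function, the dates, and the one-year cutoff are all illustrative assumptions, not a recommended policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Memory:
    key: str            # what the fact is about, e.g. "city"
    value: str
    saved_at: datetime


def prune(memories: list[Memory], now: datetime, max_age: timedelta) -> list[Memory]:
    """Keep the newest memory per key, then drop anything older than max_age."""
    newest: dict[str, Memory] = {}
    for memory in memories:
        current = newest.get(memory.key)
        if current is None or memory.saved_at > current.saved_at:
            newest[memory.key] = memory   # a newer fact supersedes the old one
    return [m for m in newest.values() if now - m.saved_at <= max_age]


now = datetime(2024, 6, 1)
memories = [
    Memory("city", "Boston", datetime(2023, 1, 10)),
    Memory("city", "Miami", datetime(2024, 5, 20)),
    Memory("units", "metric", datetime(2021, 3, 5)),
]
kept = prune(memories, now=now, max_age=timedelta(days=365))
```

The superseded city is discarded and only the recent one remains. The age cutoff also drops the old units preference, which shows how blunt a pure age rule is: a durable preference may still be true, so a real system would likely refresh or confirm such facts rather than expire them blindly.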

FAQ

Why does the assistant forget the start of a long chat?

The conversation outgrew the context window, so the app trimmed the oldest turns to fit. The model only sees what was sent in this request.

Does the model learn from my conversations?

Not in real time. “Memory” is the app storing and re-injecting information, not the model updating its weights. Training is a separate, offline process.

Why does one product remember and another doesn’t?

Because memory is a feature the product builds. Two products can run on the same underlying model and still differ in what they store and retrieve.

Memory is retrieved into the context window via RAG.