The conversation model
AI chat works through a message history fed to the model each time — understanding that explains everything from context limits to session memory.
- Explain how a chat-based LLM uses message history to generate responses
- Define "context window" and describe its practical implications
- Distinguish between system prompt, user message, and assistant message
- Explain why AI does not remember you between sessions
When you type a message in an AI chat interface and press send, it probably feels like a conversation — the AI remembers what you said earlier, builds on it, and maintains a thread. That feeling is accurate in one narrow sense and misleading in a more important one. Understanding exactly what is happening mechanically will explain a cluster of behaviours that otherwise seem random: why the AI "forgets" things late in a long conversation, why it doesn't remember you from yesterday, and what the system prompt is and why it matters.
The message history is the memory
An LLM has no persistent memory between calls. Every time you send a message, the chat interface assembles a complete transcript of the entire conversation so far — every message you sent, every response the model gave — and submits the whole thing to the model as a single, long input. The model reads the full transcript and generates the next response.
This is why the model can "remember" what you said three messages ago: because that message is still in the input. It is not stored anywhere inside the model; it is in the text that was just sent to it.
The consequence is that the conversation only "exists" within the interface that is maintaining and sending that transcript. Close the tab, switch to a new conversation, or open a different app — and the model starts with a blank transcript. It has no idea who you are or what you discussed before.
Some AI products are adding "memory" features that summarise past conversations and prepend that summary to new sessions. This is a product feature built on top of the model, not a capability of the model itself. The mechanism is the same — text sent in the input — just automated.
The context window
The transcript submitted to the model cannot be arbitrarily long. Every model has a context window — a maximum number of tokens (recall: roughly three-quarters of a word each) that can be in the input at once. This includes both your messages and the model's previous responses.
Practical implications:
- Very long conversations degrade. As the total token count climbs toward the limit, the interface may start dropping older messages from the transcript. The model appears to "forget" something you established early in the session — because that exchange is no longer in the input.
- Pasting large documents can fill the window fast. If you paste a 50-page document into the chat and then ask questions, you may have used most of your context budget before writing a single question.
- Shorter, focused conversations work better. For a coding task, starting a fresh conversation per task (rather than a single endless thread) often gets better results because the model sees a clean, focused transcript.
Context windows have grown dramatically over recent years — from a few thousand tokens to hundreds of thousands. But the fundamental limit is still there, and the degradation behaviour at the edges is still real.
The three message roles
When the transcript is assembled, messages are tagged with a role. The three roles you will encounter:
- System: Set once at the start, usually by the application (not the user). The system message defines the AI's persona, constraints, and any background context that should apply to the whole conversation. When you use Claude.ai, a pre-written system prompt is already there before you type your first word. When a company deploys a customer-service bot, the system message instructs it to stay on-topic and not discuss competitors.
- User: Your messages.
- Assistant: The model's responses.
The transcript the model receives looks like:
[system] You are a helpful coding assistant. You answer concisely.
[user] How do I reverse a list in Python?
[assistant] Use list.reverse() to reverse in place, or reversed(list) to get
a new reversed iterator.
[user] What about a string?The model reads this in order and generates the next assistant message. It has no privileged access to "intent" or "you the person" — it sees text with roles.
You usually cannot see the system prompt in a consumer interface — it is set by the product. This is worth knowing because it means AI behaviour in one product can differ meaningfully from another, even using the same underlying model.
Check your understanding
Knowledge check
- 1.How does a chat-based LLM "remember" what you said three messages ago?
- 2.By default, an AI chat model will remember your previous conversations when you start a new session.
- 3.Which message role is typically set by the application rather than the user?
Where to go next
You know how the conversation works mechanically. Next: your first prompt — how to actually structure a message that gets you what you need, starting with the simplest form and building from there.