
Tokens and context windows

Understand what tokens are, why context windows matter, and how to work within them efficiently rather than fighting their limits.

Using AI · Intermediate · 10 min read
By the end of this lesson you will be able to:
  • Define tokens and estimate the token count of a piece of text
  • Explain what a context window is and what happens when it fills up
  • Apply practical strategies for staying within context limits
  • Write prompts that respect token budgets without sacrificing clarity

Every AI interaction has a hard limit you cannot negotiate: the context window. Understanding what that means in practice changes how you structure long tasks, what you paste into a conversation, and when to start fresh rather than continuing.

What tokens are

Language models don't process text character by character or word by word. Instead they work with tokens: chunks of text that may be as short as a single character or as long as a whole word (or more), depending on how common the sequence is in the training data.

Common English words are usually a single token. Rarer words, technical terms, and identifiers often split into multiple tokens. Numbers, punctuation, and whitespace each consume tokens. As a rough working estimate: 100 tokens ≈ 75 words, or about a dense paragraph.
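To make the rule of thumb concrete, here is a minimal sketch that turns a word count into a rough token estimate. It assumes only the 100 tokens ≈ 75 words ratio above; `estimate_tokens` is a hypothetical helper, not a real tokenizer, so for exact counts use the tokenizer published for your model.

```python
import math

def estimate_tokens(text: str) -> int:
    """Rough token estimate using the rule of thumb 100 tokens ≈ 75 words.

    This is a heuristic, not a tokenizer: identifiers, numbers and rare
    words usually cost more than it predicts.
    """
    words = len(text.split())
    # Round up so the estimate errs on the side of using more budget.
    return math.ceil(words * 100 / 75)

print(estimate_tokens("word " * 75))                       # 100
print(estimate_tokens("Explain what this function does"))  # 7
```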

Some illustrative examples (exact counts vary by model, because different models use different tokenizers):

  • "function" → 1 token
  • "concatenation" → 3–4 tokens
  • "supercalifragilisticexpialidocious" → 8–10 tokens
  • A 500-line Python file → roughly 2,000–4,000 tokens, depending on complexity

Why does this matter? Because cost and speed scale with token count, and because the total number of tokens in a conversation is bounded by the context window.

Most AI APIs charge per token, counting both the text you send and the text the model generates. A prompt with a 50-line code snippet costs much less than one with a 500-line file, even if only 20 of those 500 lines are relevant to the question. Getting precise about what to include is not just a technical concern; it is also an economic one.
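Because cost is linear in token count, the comparison above is simple arithmetic. The sketch below uses a hypothetical price and an assumed average of 8 tokens per line of code; both are placeholders, not any provider's real figures.

```python
# Illustrative assumptions only: real prices and tokens-per-line vary.
PRICE_PER_MILLION_TOKENS = 3.00  # hypothetical input price in USD
TOKENS_PER_LINE = 8              # assumed average for source code

def input_cost(tokens: int) -> float:
    """Cost in USD of sending `tokens` input tokens at the assumed price."""
    return tokens * PRICE_PER_MILLION_TOKENS / 1_000_000

snippet = 50 * TOKENS_PER_LINE         # 400 tokens
whole_file = 500 * TOKENS_PER_LINE     # 4,000 tokens
wasted = (500 - 20) * TOKENS_PER_LINE  # tokens spent on the 480 irrelevant lines

print(f"snippet:    {snippet:>5} tokens  ${input_cost(snippet):.4f}")
print(f"whole file: {whole_file:>5} tokens  ${input_cost(whole_file):.4f}")
print(f"spent on irrelevant lines: {wasted} tokens")
```

Whatever the actual price, the whole file costs ten times as much as the snippet here, and most of that goes on lines the question never needed.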

The context window

The context window is the total amount of text the model can "see" at once. This includes your entire conversation: every message you've sent, every response the model has given, and any system prompt set by the platform.

Context windows vary by model, from roughly 8,000 tokens to 200,000 tokens or more. That sounds large until you consider:

  • A typical conversation that iterates on a solution across 10 messages might consume 3,000–5,000 tokens.
  • Pasting an entire codebase (even a small one) can consume tens of thousands of tokens.
  • Every back-and-forth message adds more.
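A quick way to see how fast the window fills is to keep a running total. This sketch assumes an 8,000-token window (the small end of the range above) and made-up message sizes; `tokens_remaining` is a hypothetical helper for illustration.

```python
CONTEXT_WINDOW = 8_000  # assumed window size; check your model's documented limit

def tokens_remaining(system_prompt: int, messages: list, window: int = CONTEXT_WINDOW) -> int:
    """Tokens still free after the system prompt and every message so far."""
    return window - system_prompt - sum(messages)

system_prompt = 500        # assumed size of the platform's system prompt
history = [400] * 10       # ten messages averaging 400 tokens: 4,000 in total

print(tokens_remaining(system_prompt, history))                  # 3500

pasted_file = 4_000        # e.g. a 500-line source file
print(tokens_remaining(system_prompt, history + [pasted_file]))  # -500: over the limit
```

In practice the budget is tighter still, because the model's reply must also fit in the same window.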

What happens when the window fills

When a conversation exceeds the context window, something has to give. Different systems handle this differently, but the two most common approaches are:

  1. Truncation: older messages are dropped. The model no longer has access to the beginning of the conversation — including, possibly, your original task description.
  2. Summarisation: a summary of older messages replaces them. The model has a compressed version of what came before, not the original text.
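The truncation case is easy to simulate. In this minimal sketch (a hypothetical `truncate_oldest` helper with invented token counts, not how any particular platform implements it), the oldest messages are dropped until the conversation fits:

```python
def truncate_oldest(messages: list, window: int) -> list:
    """Simulate truncation: drop the oldest messages until the rest fit.

    `messages` is a list of (text, token_count) pairs, oldest first.
    Real systems differ; some always keep the system prompt, for example.
    """
    kept = list(messages)
    while kept and sum(tokens for _, tokens in kept) > window:
        kept.pop(0)  # the oldest message goes first
    return kept

conversation = [
    ("TASK: write a CSV parser using only the standard library", 20),
    ("Draft 1 of the parser", 300),
    ("Feedback: handle quoted fields", 100),
    ("Draft 2 of the parser", 300),
]

for text, _ in truncate_oldest(conversation, window=700):
    print(text)
```

With a 700-token window the 720-token conversation loses only its first message, but that message is the task description, including the standard-library constraint. Nothing left in the context tells the model that constraint ever existed.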

Both degrade quality. The model starts making decisions that contradict earlier instructions it can no longer see, and its responses drift away from the stated requirements.

Practical strategies

Paste targeted excerpts, not whole files. If you're asking about a specific function, paste that function plus the type signatures of anything it calls — not the entire file. The model does not need to read your entire codebase to answer a question about one function.
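For Python code, the standard-library `ast` module can pull a single function out of a larger file, which makes this habit cheap to follow. The sketch below is one way to do it; `extract_function` is a hypothetical helper name, and you would still add the signatures of anything the function calls yourself.

```python
import ast
from typing import Optional

def extract_function(source: str, name: str) -> Optional[str]:
    """Return the source of a function called `name`, or None if absent.

    Uses the standard-library `ast` module (Python 3.8+). Decorators are
    not included in the returned text.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            return ast.get_source_segment(source, node)
    return None

source = '''
import math

def area(r):
    return math.pi * r ** 2

def parse_config(path):
    with open(path) as f:
        return f.read().splitlines()
'''

print(extract_function(source, "area"))
```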

Be concise in your prompts. Clarity and conciseness are not in tension. "Explain what this function does" is as clear as "I was wondering if you could take a look at this function and maybe give me a sense of what it is supposed to do." The second version uses five times as many words, and roughly as many times more tokens, while adding no information.
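You can check the comparison by counting words, which for plain English prose tracks token count closely enough under the 100 tokens ≈ 75 words rule of thumb:

```python
concise = "Explain what this function does"
verbose = ("I was wondering if you could take a look at this function "
           "and maybe give me a sense of what it is supposed to do.")

# Word count is a reasonable proxy for tokens in plain English prose.
concise_words = len(concise.split())
verbose_words = len(verbose.split())
print(concise_words, verbose_words, verbose_words / concise_words)  # 5 25 5.0
```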

Ask for concise output. When you don't need an exhaustive explanation, say so: "Keep your response under 200 words." This saves tokens on the response side and keeps the conversation from inflating unnecessarily.

Start fresh for new tasks. A conversation is not a single continuous session for your whole working day. When one task is complete and you're moving to a different one, start a new conversation. Carrying forward the context of an unrelated task wastes tokens on irrelevant material and can confuse the model about what the current objective is.

Front-load critical information. Research on how models use long contexts suggests that information near the beginning and end of the context window is used more reliably than information buried in the middle. Put your key constraints and requirements near the top of a long prompt, not in the middle of a wall of text.
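If you assemble long prompts programmatically, a small template can enforce this ordering: requirements first, the actual question last, and a one-line restatement of the constraints at the very end. This is a hypothetical `build_prompt` helper showing one reasonable layout, not a format any model requires.

```python
def build_prompt(constraints: list, context: str, question: str) -> str:
    """Assemble a long prompt with the key constraints at the top.

    The question goes at the end, and the constraints are briefly restated
    there, so the important parts sit at the edges of the prompt.
    """
    parts = ["Requirements:"]
    parts += [f"- {c}" for c in constraints]
    parts += ["", "Context:", context, "", "Task:", question,
              "", "Remember: " + "; ".join(constraints) + "."]
    return "\n".join(parts)

prompt = build_prompt(
    constraints=["Python 3.9 only", "no third-party libraries"],
    context="<paste the relevant function here>",
    question="Why does this function return None for empty input?",
)
print(prompt)
```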

A long conversation is not the same as a well-informed model. If you've been iterating for 20 exchanges, the context is full of every detour and false start. A new conversation built on a single, clean, comprehensive prompt that incorporates what you've learned often produces better results than continuing to layer corrections onto a cluttered context.

Token budgeting in practice

Think of the context window as working memory that you and the model share. Your job is to fill it with signal, not noise. Before pasting anything long, ask: is this relevant to what I'm about to ask? If the answer is "mostly yes, but there's a lot I don't need," trim it. Spending 30 seconds extracting the relevant function or section of a document often produces a better response than pasting the whole thing would have.
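The same question can be written down as a simple budgeting procedure. This is a minimal sketch assuming you have already split your material into chunks, estimated each chunk's token count, and scored its relevance yourself; `fill_budget`, the scores, and the numbers are all hypothetical.

```python
def fill_budget(chunks: list, budget: int, min_relevance: int = 5):
    """Pick the most relevant chunks that fit within a token budget.

    `chunks` holds (label, relevance, tokens) tuples. Relevance scores are
    yours to assign; anything below `min_relevance` counts as noise and is
    skipped even if there is room for it.
    """
    selected, used = [], 0
    for label, relevance, tokens in sorted(chunks, key=lambda c: c[1], reverse=True):
        if relevance < min_relevance:
            break  # sorted by relevance, so everything after this is noise too
        if used + tokens <= budget:
            selected.append(label)
            used += tokens
    return selected, used

chunks = [
    ("function under test", 10, 300),
    ("failing test output", 9, 200),
    ("type signatures it calls", 8, 120),
    ("unrelated helper functions", 2, 1_500),
    ("module docstring", 1, 80),
]
selected, used = fill_budget(chunks, budget=700)
print(selected, used)
```

The greedy pass is just the simplest reasonable policy. The more important part is the relevance cutoff, which asks "is this relevant?" before any budget arithmetic happens, so low-value material is left out even when it would fit.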

This is not about being stingy — it is about precision. A prompt that contains exactly what the model needs to answer the question produces better results than one that contains everything you have and hopes the model finds the relevant parts.

Where to go next

You've now completed the Prompt Crafting module. The lab brings all these techniques together in a single structured challenge: you'll start with a deliberately vague prompt and improve it step by step using the four framework components plus role assignment, chain-of-thought, and constraint refinement, then compare your first attempt with your final result.

Knowledge check

  1. A developer is asking an AI to help debug a specific function in a 1,200-line file. What is the most token-efficient approach?
  2. Which of the following are valid consequences of a conversation exceeding its context window? Select all that apply.
  3. True or false: continuing a long conversation is always better than starting a new one, because the model has more context about your project.