Course 1 · AI Foundations · Lesson 04
Tokens, Context & Why AI Forgets
A model reads and writes in chunks called tokens, and it can hold only a limited number at once. When that limit is reached, the oldest tokens fall away. That limit, not the tokens themselves, is why AI “forgets.”
The one mental model
Picture a fixed-size working desk. Everything on it, the model can see. As the conversation grows the desk fills, and the oldest notes slide off the back. The model isn’t ignoring you — that text is simply no longer on the desk.
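The desk picture can be sketched in a few lines of Python. This is a minimal illustration, not how any real model manages its context: the window size of 8 and the whitespace split standing in for tokenization are assumptions made for the example.

```python
from collections import deque

WINDOW_SIZE = 8  # assumed and tiny on purpose; real windows hold far more tokens


def visible_tokens(conversation, window_size=WINDOW_SIZE):
    """Return the tokens still "on the desk" after the whole conversation is read."""
    desk = deque(maxlen=window_size)  # a full deque drops its oldest item on append
    for token in conversation.split():  # whitespace split stands in for real tokenization
        desk.append(token)
    return list(desk)


chat = "my name is Ada please remember it now tell me a long story"
print(visible_tokens(chat))
# ['remember', 'it', 'now', 'tell', 'me', 'a', 'long', 'story']
```

The name "Ada" was never ignored. It was simply pushed off the desk once eight newer tokens arrived.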
Key terms
Token
A chunk of text — sometimes a whole word, sometimes a piece of one, sometimes just a comma. Both your input and the model’s output are counted in tokens.
Tokenization
How text is split into tokens. There’s no universal rule: different models use different tokenizers, so the same sentence can split into different tokens and a different token count.
Context window
The maximum number of tokens a model can hold at once. The “desk.” Fixed size.
Hidden tokens
Tokens you never typed. The system prompt and any uploaded file take up space in the window too, before you type a word. A big file crowds out the conversation.
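The key terms can be made concrete with a small sketch. Both tokenizers below are toy stand-ins invented for illustration (real tokenizers use learned vocabularies of word pieces), and the 50-token window, the system prompt, and the file text are made-up values.

```python
import re

CONTEXT_WINDOW = 50  # assumed and tiny on purpose


def word_tokenizer(text):
    """Toy tokenizer A: every word and every punctuation mark is one token."""
    return re.findall(r"\w+|[^\w\s]", text)


def chunk_tokenizer(text, size=4):
    """Toy tokenizer B: cut the text into fixed-size character pieces."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def remaining_budget(system_prompt, uploaded_file, tokenize=word_tokenizer,
                     window=CONTEXT_WINDOW):
    """Tokens left for the conversation once the hidden tokens are counted."""
    hidden = len(tokenize(system_prompt)) + len(tokenize(uploaded_file))
    return max(window - hidden, 0)


sentence = "The model reads tokens, not words."
# Same sentence, two tokenizers, two different counts.
print(len(word_tokenizer(sentence)), len(chunk_tokenizer(sentence)))  # 8 9

system_prompt = "You are a helpful tutor. Answer briefly."
uploaded_file = "quarterly report " * 15  # hypothetical 30-word file

print(remaining_budget(system_prompt, ""))             # 41
print(remaining_budget(system_prompt, uploaded_file))  # 11
```

In this toy setup the system prompt alone uses 9 of the 50 tokens, and adding the file leaves only 11 for the conversation itself.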
The misconception to drop
✕
“It’s ignoring me / tokens make it forget / it remembers me between chats.”
✓
The context window is a fixed token budget. When the conversation exceeds it, the oldest tokens are pushed out — that’s the forgetting. Each new chat starts with an empty window, and nothing carries over unless a memory feature saves it.
Put it to work
1. Keep what matters most near the end of the conversation, where it’s still in the window.
2. Start a fresh chat when you switch topics instead of dragging a long one along.
3. If something’s important, put it back. Don’t assume it’s still there.
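The first and third habits can be sketched as a small history-trimming routine. It is an illustration under stated assumptions, not how any specific chat product works: the word-count `count_tokens`, the 12-token budget, and the helper names `fit_to_window` and `restate_if_dropped` are all hypothetical.

```python
def count_tokens(text):
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def fit_to_window(messages, budget):
    """Keep the newest messages whose combined token count fits the budget."""
    kept, used = [], 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > budget:
            break  # this message and everything older falls off the desk
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order


def restate_if_dropped(messages, important, budget):
    """If the important note fell out of the window, say it again at the end."""
    if important in fit_to_window(messages, budget):
        return messages
    return messages + [important]


BUDGET = 12  # assumed and tiny on purpose
important = "Always answer in French."
history = [
    important,                            # 4 tokens
    "Tell me about the history of tea.",  # 7 tokens
    "Now compare green and black tea.",   # 6 tokens
    "Which one has more caffeine?",       # 5 tokens
]

print(important in fit_to_window(history, BUDGET))  # False: the instruction was pushed out
history = restate_if_dropped(history, important, BUDGET)
print(important in fit_to_window(history, BUDGET))  # True: it is back near the end
```

Under the same sketch, the second habit amounts to starting over with an empty list of messages, so the new topic gets the whole budget.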
Next lesson
05 — Hallucination & Knowledge Limits