Course 1 · AI Foundations · Lesson 04

Tokens, Context & Why AI Forgets

A model reads and writes in chunks called tokens, and it can hold only a limited number of them at once. When that limit is reached, the oldest tokens fall away. That limit, not the tokens themselves, is why AI “forgets.”

Free · Course 1 · ~5 min video · Type: Misconception

The one mental model

Picture a fixed-size working desk. Everything on it, the model can see. As the conversation grows the desk fills, and the oldest notes slide off the back. The model isn’t ignoring you — that text is simply no longer on the desk.
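
The desk picture can be sketched in a few lines of Python. This is a toy illustration, not how any real model or chat app is implemented: the window size of 6 and the one-token-per-word split are assumptions made up for the example.

```python
from collections import deque

# Hypothetical window size, tiny on purpose; real windows hold far more tokens.
WINDOW_SIZE = 6

# A deque with maxlen drops its oldest item automatically when a new one
# arrives and it is already full: the "desk" with notes sliding off the back.
desk = deque(maxlen=WINDOW_SIZE)

conversation = "my name is Ada please remember that I prefer short answers"
for token in conversation.split():  # toy tokens: one per word
    desk.append(token)

visible = list(desk)
print(visible)
print("Ada" in visible)
```

After the loop, only the six most recent toy tokens are left on the desk, so the name given at the start is no longer visible. Nothing was ignored; it simply fell outside the window.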

Key terms

Token

A chunk of text — sometimes a whole word, sometimes a piece of one, sometimes just a comma. Both your input and the model’s output are counted in tokens.

Tokenization

How text is split into tokens. There’s no universal rule: different models use different tokenizers, so the same sentence can be split in different ways and come out as a different number of tokens.
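
To see how two splitting rules can disagree on the same sentence, here is a minimal sketch. Both rules (`split_on_words` and `split_in_chunks`) are invented for this example; real tokenizers typically use learned subword vocabularies rather than either of these.

```python
import re

sentence = "Tokenizers don't agree."

def split_on_words(text):
    # Toy rule A: runs of word characters and apostrophes,
    # or a single punctuation mark.
    return re.findall(r"[\w']+|[^\w\s]", text)

def split_in_chunks(text, size=4):
    # Toy rule B: fixed-size character chunks, ignoring word boundaries.
    return [text[i:i + size] for i in range(0, len(text), size)]

tokens_a = split_on_words(sentence)
tokens_b = split_in_chunks(sentence)

print(len(tokens_a), tokens_a)
print(len(tokens_b), tokens_b)
```

The same sentence yields a different token count under each rule, which is why a token count quoted for one model does not carry over to another.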

Context window

The maximum number of tokens a model can hold at once: the “desk.” Its size is fixed for a given model.

Hidden tokens

The system prompt and any uploaded file take up the window too — before you type a word. A big file crowds out the conversation.
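
The crowding-out effect is simple arithmetic. The sketch below uses invented numbers (an 8,000-token window, a 500-token system prompt, a 6,000-token file) purely to make the sums easy; actual sizes vary by model and product.

```python
# All numbers below are hypothetical, chosen only to make the arithmetic easy.
CONTEXT_WINDOW = 8_000        # total tokens the model can hold at once
system_prompt_tokens = 500    # instructions the user never sees
uploaded_file_tokens = 6_000  # a large attached document

hidden_tokens = system_prompt_tokens + uploaded_file_tokens
left_for_conversation = CONTEXT_WINDOW - hidden_tokens

print(f"Hidden tokens: {hidden_tokens}")
print(f"Left for the conversation: {left_for_conversation}")
print(f"Share of window already used: {hidden_tokens / CONTEXT_WINDOW:.0%}")
```

With these assumed figures, most of the window is spent before the first message is typed, leaving 1,500 tokens for the whole back-and-forth.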

The misconception to drop

“It’s ignoring me / tokens make it forget / it remembers me between chats.”

The context window is a fixed token budget. When a conversation exceeds it, the oldest tokens are pushed out, and that is the forgetting. Each new chat starts with a window that is empty apart from hidden tokens such as the system prompt, and nothing carries over unless a memory feature saves it.

Put it to work

Keep what matters most near the end of the conversation, where it’s still in the window.

Start a fresh chat when you switch topics instead of dragging a long one along.

If something’s important, put it back — don’t assume it’s still there.
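
The third tip can be sketched as a check-and-restate step. This is a toy model under stated assumptions: `visible_messages` is a hypothetical helper, tokens are counted as one per word, and whole messages are dropped oldest-first. Real chat apps may trim differently.

```python
def visible_messages(messages, budget):
    """Return the most recent messages that fit in a token budget.

    Toy assumptions: one token per whitespace-separated word, and whole
    messages are dropped oldest-first once the budget would be exceeded.
    """
    kept, used = [], 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = len(message.split())
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order

important = "Always answer in British English."
chat = [important] + [f"Filler message number {i}" for i in range(1, 6)]

window = visible_messages(chat, budget=16)
still_there = important in window

if not still_there:
    chat.append(important)  # put it back, near the end of the conversation
    window = visible_messages(chat, budget=16)

print(still_there, important in window)
```

In this sketch the instruction has slid out of the window after five filler messages, and restating it puts it back where the model can see it.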
