Next Token Prediction: How AI Builds Every Answer From Scratch
Your phone's keyboard has been doing it for years. You type "see you", and it suggests "tomorrow." You type "on my", and it offers "way." One word, predicted from the words before it.
ChatGPT does the same thing. Just at a scale that makes the output look like thinking.
The Assumption That Broke First
When I started learning how LLMs generate responses, I had a clear picture in my head: somewhere inside the model, there's a database. You ask a question, it searches for the closest match, pulls the answer, and returns it.
That picture is completely wrong.
There is no database inside an LLM. No stored answers. No lookup table. No retrieval step. The model doesn't find your answer. It builds it, one token at a time (a token is a word or a piece of a word; I'll mostly say "word" here to keep things simple).
Every single word in every response you've ever received from ChatGPT, Claude, or Gemini was predicted. Not retrieved. Predicted.
How the Prediction Actually Works
The process breaks down into a loop that repeats for every word the model generates:
Step 1: Convert words into vectors. Every word — yours and the ones the model has already generated — gets converted into a mathematical representation. A position in a high-dimensional space where similar meanings sit near each other. This is the embedding step from Day 04.
Step 2: Run attention. The transformer architecture from Day 08 takes over. Every word looks back at the words that came before it, all in parallel, and the model calculates which relationships matter. The result is a rich representation of the entire context so far.
Step 3: Predict the next token. Based on everything the model has computed about the sequence, it calculates a probability for every possible next token in its vocabulary. Tens of thousands of candidates, each with a score. The model picks one, usually by sampling from the high-probability candidates rather than always taking the single top score.
Step 4: Repeat. That predicted word gets added to the sequence. The whole process runs again — embeddings, attention, prediction — now with one more word of context. And again. And again.
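The four steps above can be sketched as a toy loop. This is a minimal illustration, not how any production model is built: the seven-word vocabulary, the random `EMBED` and `UNEMBED` tables, and the helper names are all hypothetical, and the attention step uses raw embeddings with no learned projections or stacked layers. Because the weights are random, the output is gibberish. The point is the shape of the loop.

```python
import math
import random

# Hypothetical toy vocabulary; real models use tens of thousands of tokens.
VOCAB = ["see", "you", "tomorrow", "on", "my", "way", "<end>"]
DIM = 4

# Assumption: random numbers stand in for weights a real model would learn.
_init = random.Random(0)
EMBED = [[_init.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]
UNEMBED = [[_init.uniform(-1, 1) for _ in range(DIM)] for _ in VOCAB]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(vectors):
    """Simplified single-head attention: the last position looks at
    every position so far and returns a weighted average of them."""
    query = vectors[-1]
    weights = softmax([dot(query, key) / math.sqrt(DIM) for key in vectors])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(DIM)]


def next_token_probs(token_ids):
    vectors = [EMBED[t] for t in token_ids]           # Step 1: embed
    context = attend(vectors)                         # Step 2: attention
    logits = [dot(context, row) for row in UNEMBED]   # Step 3: score every token
    return softmax(logits)


def generate(prompt_ids, max_new_tokens=5, seed=0):
    sampler = random.Random(seed)
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        probs = next_token_probs(ids)
        next_id = sampler.choices(range(len(VOCAB)), weights=probs, k=1)[0]
        ids.append(next_id)                           # Step 4: append and repeat
        if VOCAB[next_id] == "<end>":
            break
    return ids


prompt = [VOCAB.index("see"), VOCAB.index("you")]
print(" ".join(VOCAB[i] for i in generate(prompt)))
```

Notice that `generate` never looks anything up. Each pass recomputes probabilities from the full sequence so far, then appends one token.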
When ChatGPT writes you a five-paragraph response, it runs this loop hundreds of times. Each word is informed by every word before it. No outline written in advance. Just: what's a likely next word, given everything so far?
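How the model "picks one" is worth a quick sketch. Taking the single most likely word every time is called greedy decoding. Many systems instead sample from the probabilities, often with a temperature setting that flattens or sharpens them. The candidate words and scores below are made up for illustration, and the `softmax` helper is my own, not any library's API.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Convert scores to probabilities; higher temperature flattens them."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for the word after "see you".
candidates = ["tomorrow", "later", "soon", "banana"]
logits = [3.0, 2.5, 2.0, -1.0]

# Greedy decoding: always take the highest-probability candidate.
probs = softmax(logits)
greedy = candidates[probs.index(max(probs))]

# Sampling: draw in proportion to probability, so "later" and "soon"
# also get picked some of the time.
sampler = random.Random(42)
sampled = sampler.choices(candidates, weights=softmax(logits, temperature=1.5), k=5)

print("greedy:", greedy)
print("sampled:", sampled)
```

This is one reason the same prompt can give different answers on different runs: the probabilities are the same, but the draw is not.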
The Part That Surprised Me
What astonished me was the vector space. When I visualised it — words as points in space, similar meanings clustering together, the model navigating through that space to find the next prediction — it stopped feeling like magic and started feeling like mathematics.
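Here is a small sketch of what "similar meanings clustering together" means in numbers. The three-dimensional vectors are hand-made for illustration. Real embeddings are learned and have hundreds or thousands of dimensions. Cosine similarity is one common way to measure how close two vectors point.

```python
import math

# Hand-made toy vectors, purely illustrative (not from any real model).
vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return a value near 1.0 when two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print("cat vs dog:", round(cosine_similarity(vectors["cat"], vectors["dog"]), 2))
print("cat vs car:", round(cosine_similarity(vectors["cat"], vectors["car"]), 2))
```

With these toy numbers, "cat" scores much closer to "dog" than to "car", which is the clustering idea in miniature.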
The model isn't "thinking." It's moving through a landscape of meaning, where each step is a calculation: given where I am and everything I've seen, where should I step next?
Your phone keyboard does this in a tiny space: a small model of common phrases. An LLM does it with billions of learned parameters, trained on a huge slice of the public internet. The core idea is the same. The scale is what creates the illusion of intelligence.
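To make the keyboard comparison concrete, here is a sketch of a tiny next-word suggester built from word-pair counts. It is an assumption on my part that this resembles a simple keyboard predictor; real keyboards differ by vendor. The corpus and the `train_bigrams` and `suggest` names are invented for this example, and it looks only at the single previous word.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus standing in for a user's typing history.
corpus = (
    "see you tomorrow . see you soon . see you tomorrow . "
    "on my way . on my way . on my own"
)


def train_bigrams(text):
    """Count which word follows which."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def suggest(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


counts = train_bigrams(corpus)
print(suggest(counts, "you"))    # tomorrow
print(suggest(counts, "my"))     # way
print(suggest(counts, "zebra"))  # None
```

An LLM replaces the count table with a neural network and the one-word window with the whole context, but the output is the same kind of thing: a guess at what comes next.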
Why Hallucinations Make Sense Now
Once you understand that every word is predicted — not retrieved — hallucinations stop being mysterious.
There's no database to be "correct" against. The model doesn't know facts the way a search engine does. It knows patterns. It knows what words typically follow other words in specific contexts.
When the input is clear and specific, the prediction stays on track. The patterns are strong. The model has seen similar sequences millions of times.
When the input is vague, those patterns weaken. The model still has to predict a next word. Nothing in the loop pauses to check whether there is enough signal. So it follows whatever pattern is strongest in the moment, even if that pattern leads somewhere wrong.
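One way to picture "strong" versus "weak" patterns is as a peaked versus a flat probability distribution. That framing is my own simplification, and the numbers below are invented. Entropy measures how spread out a distribution is. The picking step returns an answer either way.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits: low means confident, high means spread out."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def pick(probs):
    """Greedy pick: index of the highest probability, however small the lead."""
    return probs.index(max(probs))


# Hypothetical next-word probabilities over four candidates.
confident = [0.90, 0.05, 0.03, 0.02]   # clear, specific prompt
uncertain = [0.28, 0.26, 0.24, 0.22]   # vague prompt

for name, probs in [("confident", confident), ("uncertain", uncertain)]:
    print(name, "entropy:", round(entropy_bits(probs), 2), "picked:", pick(probs))
```

In the second case the winner leads by two percentage points, yet it gets written into the answer with the same fluency as the first.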
That's not a bug in the system. It's the system working exactly as designed — prediction without verification.
The Takeaway
AI doesn't retrieve answers. It generates them — one word at a time, billions of times. There is no database. Just a prediction. And when the input is vague, that prediction drifts — that's where hallucinations come from.
If next token prediction is all there is underneath, how do you go from "predict the next word" to a model that can code, translate, and hold a conversation?
Day 10: Foundation Models.
Day 09 of 100 — AI Foundations | Change of Basis — Reframe the familAIr. See the invisible.
