How ChatGPT Works: The LLM Workflow & Next-Word Prediction

Your phone keyboard has been doing this for years.

You type "Happy" and it suggests "birthday." You type "I'll be" and it suggests "there." You ignore it half the time, but it's right often enough to be useful.

That's next-word prediction. A model trained on language, guessing what comes next based on what came before.
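To make that concrete, here is a minimal sketch of keyboard-style suggestion. It assumes the simplest possible "model": counts of which word follows which in a tiny made-up training text. The function names and the corpus are illustrative, not how any real keyboard is built.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count which word follows which in a tiny training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def suggest(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Made-up training text, chosen so "happy" is usually followed by "birthday".
corpus = "happy birthday to you happy birthday dear friend happy new year"
model = train_bigrams(corpus)
print(suggest(model, "Happy"))  # prints: birthday
```

The guess depends only on the single previous word here. An LLM conditions on everything in view, but the question it answers is the same one.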

ChatGPT runs the same play. Not metaphorically — the same fundamental mechanism, applied at a scale your phone keyboard couldn't dream of.

The difference between your keyboard suggestion and a conversation with an AI isn't magic. It's scale, architecture, and a workflow most people never see.

Here's the workflow.

Step One: The Model Doesn't Read Your Words

The first thing an LLM does when you send it a message is not read it.

It converts it.

Your text gets broken down into tokens — and tokens are not words. This is the first thing that surprises most people.

"Hello" might be one token. "Tokenisation" might be two: "token" and "isation." "ChatGPT" might be three. A leading space is often part of the token that follows it. Punctuation usually gets its own token. Short, common words are typically a single token each, while rare or long words get split into several.

The model never sees letters. It never sees words. It sees a sequence of numbers, each one the ID of a token from a fixed vocabulary, typically somewhere between 50,000 and 200,000 pieces depending on the model.

If you assumed each word was one token, you're in good company. Almost everyone does. In reality, tokens are subword pieces: chunks that let the model handle any word it encounters, even ones it has never seen before, by breaking them into familiar parts.
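Here is a rough sketch of the idea, not a real tokenizer. It assumes a tiny hand-written vocabulary and a greedy longest-match rule. Production tokenizers learn their vocabulary from data (byte-pair encoding is a common choice) and split text differently, so treat `VOCAB` and `tokenize` as hypothetical.

```python
# Hypothetical toy vocabulary: piece -> token id. Real vocabularies are
# learned from data and hold tens of thousands of entries.
VOCAB = {"token": 0, "isation": 1, "hello": 2, " ": 3, "!": 4}
# Single letters act as a fallback so any lowercase word can be encoded.
for ch in "abcdefghijklmnopqrstuvwxyz":
    VOCAB.setdefault(ch, len(VOCAB))

def tokenize(text):
    """Greedy longest-match split of `text` into known pieces."""
    text = text.lower()
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink until a piece matches.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                pieces.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no piece covers {text[i]!r}")
    return pieces

pieces = tokenize("Tokenisation")
ids = [VOCAB[p] for p in pieces]
print(pieces, ids)  # prints: ['token', 'isation'] [0, 1]
```

A word the vocabulary has never seen, like "zap", still encodes: it falls back to the single-letter pieces. That fallback is the property that lets subword schemes cover any input.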

The Full Workflow

Once your text is tokens, here's what actually happens:

Tokens → Embeddings. Each token gets converted into a vector — a list of numbers that represents not just the token, but its meaning in relation to every other token. Similar concepts end up close together in this space. This is where meaning enters the system. Day 04 goes deep on this.
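A small sketch of what "close together" means. The three-dimensional vectors below are invented for illustration; real embeddings are learned during training and have hundreds or thousands of dimensions. Cosine similarity is one common way to measure closeness.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-picked for illustration only.
EMBEDDINGS = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.88, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.95],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_king_queen = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"])
sim_king_apple = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["apple"])
print(f"king vs queen: {sim_king_queen:.3f}")
print(f"king vs apple: {sim_king_apple:.3f}")
```

With these made-up numbers, "king" and "queen" score far higher than "king" and "apple", which is the geometric version of "similar concepts end up close together."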

Embeddings → Neural Network. The vectors flow through layers of the model — billions of parameters, each one a weight learned during training. The network processes relationships between tokens, building a picture of what you said and what context you're in.

Neural Network → Probability. The model doesn't pick the next word. It calculates a probability distribution across its entire vocabulary — every possible next token, ranked by how likely it is given everything before it.
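As a sketch of that last step: the network's final layer produces one raw score per vocabulary entry, and a softmax function is the standard way to turn those scores into probabilities. The four tokens and their scores below are invented for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

# Hypothetical scores for the token that follows "Happy".
logits = {"birthday": 4.0, "new": 2.5, "hour": 1.0, "banana": -2.0}
probs = softmax(logits)

# Print candidates from most to least likely.
for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tok:10s} {p:.3f}")
```

Every candidate gets a nonzero probability, even "banana". Nothing is ruled out; unlikely tokens are just unlikely.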

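Probability → Next Token → Response. The model samples one token from that distribution, usually from among the most likely candidates, and appends it to the sequence. Then it does it again. And again. Token by token, until it emits a stop token or hits a length limit.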

That's the entire mechanism. No understanding in the human sense. No knowledge being retrieved from a filing cabinet. Just: what token is most likely to come next, given everything that came before?
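The loop itself fits in a few lines. This sketch swaps the neural network for a hypothetical lookup table that only looks at the last token, a big simplification, since a real model conditions on the whole context. The table, the `<start>` and `<end>` markers, and the `generate` function are all made up for illustration.

```python
import random

# Hypothetical stand-in for the neural network:
# last token -> probability of each possible next token.
NEXT_TOKEN_PROBS = {
    "<start>": {"happy": 1.0},
    "happy": {"birthday": 0.8, "new": 0.2},
    "birthday": {"to": 0.9, "<end>": 0.1},
    "new": {"year": 1.0},
    "year": {"<end>": 1.0},
    "to": {"you": 1.0},
    "you": {"<end>": 1.0},
}

def generate(max_tokens=10, greedy=True, seed=0):
    """Build a response one token at a time until <end> or the length limit."""
    rng = random.Random(seed)
    tokens = ["<start>"]
    for _ in range(max_tokens):
        probs = NEXT_TOKEN_PROBS[tokens[-1]]
        if greedy:
            # Always take the single most likely token.
            nxt = max(probs, key=probs.get)
        else:
            # Sample in proportion to probability, so output can vary.
            nxt = rng.choices(list(probs), weights=list(probs.values()))[0]
        if nxt == "<end>":
            break
        tokens.append(nxt)
    return tokens[1:]  # drop the <start> marker

print(" ".join(generate()))  # prints: happy birthday to you
```

Calling `generate(greedy=False)` with different seeds can produce "happy new year" instead, which is roughly why the same prompt doesn't always give the same answer.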

One More Thing: Context Windows

The model can't see everything at once.

Every LLM has a context window — a limit on how many tokens it can hold in view at one time. Within that window, the model sees everything. Beyond it, the model sees nothing — no memory of what came before the window started.

This is why very long conversations sometimes feel like the model forgot something you said earlier. It didn't forget. It ran out of window.
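A sketch of the effect, assuming the simplest overflow strategy: keep only the most recent tokens and drop the rest. Real applications handle overflow in different ways (truncating, summarising, or rejecting the input), so `visible_tokens` is a hypothetical helper, and the window sizes are far smaller than anything real.

```python
def visible_tokens(conversation, window_size):
    """Keep only the most recent `window_size` tokens."""
    if window_size <= 0:
        return []
    return conversation[-window_size:]

# A toy conversation, already split into tokens.
conversation = ["my", "name", "is", "ada", ".", "what", "is", "my", "name", "?"]

print(visible_tokens(conversation, window_size=5))
# prints: ['what', 'is', 'my', 'name', '?']
print(visible_tokens(conversation, window_size=20) == conversation)
# prints: True
```

With a window of five, the question is still in view but "ada" is not. The model isn't being evasive when it can't answer. The answer simply isn't in what it was given.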

Stop Using AI Like a Magic Box

Most people send a message and wait for the oracle to respond.

Once you see the workflow — tokens, embeddings, probabilities, next-token prediction — the oracle disappears. What's left is a very powerful, very specific tool with a very specific way of operating.

You stop asking "why doesn't it just know this?" and start asking "what does the model actually have in its window right now?"

That shift — from passenger to operator — starts the moment you understand what the model is actually reading.

What Those Vectors Actually Mean

Tokens become numbers. Numbers become vectors. Vectors carry meaning.

But what does it actually mean for meaning to live in a vector? How does the model know that "king" and "queen" belong in the same neighbourhood? And how does that geometry power everything from search to recommendations to the model understanding your question?

That's Day 04.

Day 03 of 100 — AI Foundations | Change of Basis — Reframe the familAIr. See the invisible.
