Why GPT, Claude, and Gemini Give Different Answers

Samsung Galaxy. Google Pixel. OnePlus. Three phones, all running Android, all built on the same Linux kernel. But pick up a Galaxy after using a Pixel and it doesn't feel like the same phone. Even with a similar camera sensor, the photos look completely different. Same notification system, completely different experience.

The hardware didn't make them different. The software layer each company built on top did.

The Shared Foundation

GPT, Claude, and Gemini are all built on the same core architecture: the transformer.

Published in 2017 by Google researchers in the paper "Attention Is All You Need," the transformer is built around a mechanism called self-attention: the ability to weigh every token (roughly, a word or word piece) in a sequence against every other token to work out context and meaning. That single idea became the foundation for virtually every major language model that followed.
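To make self-attention concrete, here is a minimal sketch in plain Python. It assumes toy two-dimensional token vectors with made-up values, and it skips the learned query, key, and value projections and the multiple heads a real transformer uses. Treat it as an illustration of the weighting idea, not as how any production model is implemented.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(queries, keys, values):
    """Scaled dot-product attention over a short sequence of token vectors."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this token against every token, then normalise the scores
        weights = softmax([dot(q, k) / scale for k in keys])
        # The output is a weighted blend of all the value vectors
        mixed = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Three toy tokens (assumed values); each serves as its own query, key, and value
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
```

Each row of `weights` says how much one token attends to every token in the sequence, including itself. In this toy input the first token attends more to the first and third tokens than to the second, because their vectors overlap with its own.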

OpenAI built GPT on it. Anthropic built Claude on it. Google built Gemini on it. The architectural DNA is shared. The same attention mechanism. The same fundamental approach to processing language.

So if they're all transformers — why does the same prompt produce three different answers?

Where They Diverge

The transformer is just the starting point. What makes each model different comes from everything built on top of it: how it was trained, what data it learned from, and how it was taught to behave.

Think of it like this: three people graduate from the same university with the same degree. One joins a startup and learns to move fast. One joins a research lab and learns to be methodical. One joins a company with massive existing infrastructure and learns to leverage everything around them.

Same education. Completely different professionals.

That's what happened with GPT, Claude, and Gemini. The transformer was the shared education. The companies that trained them shaped who they became.

Three Different Training Philosophies

OpenAI and GPT — RLHF (Reinforcement Learning from Human Feedback)

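OpenAI fine-tuned its GPT models using RLHF. The process: the model generates multiple responses to the same prompt, and human raters rank those responses from best to worst. Those rankings train a separate reward model that scores responses the way the raters would, and the language model is then tuned with reinforcement learning to produce responses the reward model scores highly. Over many rounds, it learns what humans consider a good answer.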

OpenAI detailed this approach in their InstructGPT research paper. The model's behaviour is shaped by human judgement, one ranking at a time.
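The ranking step can be sketched in a few lines. The InstructGPT paper describes breaking each ranking into pairs and training a reward model so that the preferred response in each pair scores higher than the rejected one. The function names, example responses, and reward numbers below are illustrative assumptions, not OpenAI's code.

```python
import math
from itertools import combinations

def ranking_to_pairs(ranked_responses):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() keeps input order, so the first item of each pair ranks higher
    return list(combinations(ranked_responses, 2))

def pairwise_ranking_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(r_preferred - r_rejected)), computed in a stable form."""
    margin = reward_preferred - reward_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# A hypothetical ranking from one human rater, best first
ranked = ["clear and correct", "correct but rambling", "off-topic"]
pairs = ranking_to_pairs(ranked)

# Hypothetical reward-model scores: the loss is small when the preferred
# response already scores higher, and large when the order is wrong
loss_when_right = pairwise_ranking_loss(2.0, 0.0)
loss_when_wrong = pairwise_ranking_loss(0.0, 2.0)
```

Minimising this loss over many pairs pushes the reward model to agree with the raters. The reinforcement learning stage that follows then uses that reward model as its scoring signal.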

Anthropic and Claude — Constitutional AI

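Anthropic took a different path. Instead of relying only on human raters to flag harmful output, they wrote a set of principles, a "constitution", and used it during training. In the first phase, the model generates an answer, critiques it against a principle, and revises it; the revised answers become fine-tuning data. In the second phase, an AI model compares pairs of responses against the constitution, and those AI-generated preferences drive the reinforcement learning step. Human feedback is still used, mainly to teach helpfulness.

Anthropic published this as the Constitutional AI paper. The self-critique happens during training, not each time you send a prompt: the model learned from arguing with itself.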
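The critique-and-revise step can be sketched as a simple loop. In the Constitutional AI paper this loop runs during training to produce revised answers for fine-tuning. Everything below is a hypothetical stand-in: the `toy_*` functions are rule-based placeholders for what would be calls to a language model, and the single principle is invented for illustration.

```python
def build_revision_dataset(prompts, principles, generate, critique, revise):
    """Collect (prompt, revised response) pairs for later fine-tuning."""
    dataset = []
    for prompt in prompts:
        response = generate(prompt)
        for principle in principles:
            feedback = critique(response, principle)
            if feedback is not None:  # a problem was found, so revise
                response = revise(response, feedback)
        dataset.append((prompt, response))
    return dataset

# Hypothetical placeholders; a real system would call a language model here
def toy_generate(prompt):
    return f"Answer to '{prompt}', you idiot."

def toy_critique(response, principle):
    if principle == "Be polite." and "idiot" in response:
        return "Remove the insult."
    return None

def toy_revise(response, feedback):
    return response.replace(", you idiot", "")

dataset = build_revision_dataset(
    prompts=["hi"],
    principles=["Be polite."],
    generate=toy_generate,
    critique=toy_critique,
    revise=toy_revise,
)
```

The output is training data, not a reply to a user. The model is later fine-tuned on the revised responses, so the principles end up reflected in its behaviour without running this loop at answer time.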

Google and Gemini — Multimodal from Day One

Google built Gemini differently at an architectural level. While early GPT and Claude models were text-only and gained image input in later versions, Gemini was trained as natively multimodal from the start, processing text, images, audio, and video as a unified system rather than as bolted-on capabilities.

Google detailed this in their Gemini technical report. One architectural detail Google has disclosed: starting with Gemini 1.5, the technical report describes a Mixture of Experts (MoE) architecture, where a learned router sends each token to a small subset of "expert" sub-networks, so only part of the model activates for any given input rather than the entire model activating every time.
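Gemini's routing internals aren't public, so the following is only a generic sketch of top-k expert routing as it is commonly described for MoE models. The experts here are toy scalar functions and the router scores are hard-coded assumptions; in a real model the experts are neural sub-networks and the scores come from a learned gating layer.

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_scores, top_k=2):
    """Pick the top_k highest-scoring experts and renormalise their weights."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_scores[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_layer(token, experts, router_scores, top_k=2):
    """Run only the chosen experts and blend their outputs."""
    routed = route_token(router_scores, top_k)
    return sum(weight * experts[i](token) for i, weight in routed)

# Four toy experts (assumed); only two of them run for this token
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: -x, lambda x: x ** 2]
router_scores = [0.1, 2.0, -1.0, 2.0]  # hypothetical gating scores for one token

routed = route_token(router_scores)
output = moe_layer(3.0, experts, router_scores)
```

With these scores only experts 1 and 3 are evaluated; the other two are skipped entirely. That skipping is the point of the design: the model can hold many parameters while spending compute on only a fraction of them per token.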

Why This Matters

When you choose between GPT, Claude, and Gemini, you're not comparing three versions of the same product. You're comparing three philosophies of how an AI should behave.

RLHF means GPT's personality was shaped by what human raters judged to be a good answer. Constitutional AI means Claude's personality was shaped by self-critique against written principles during training. Multimodal-first means Gemini's strengths span text, vision, and beyond from the ground up.

Picking a model isn't just a feature comparison. It's a philosophy comparison. Same transformer underneath. Same attention mechanism. But the training, the alignment, and the guardrails are where the personality lives.

That's why the same prompt gives you three different answers.

The Takeaway

GPT, Claude, and Gemini share the same architectural DNA — the transformer. What separates them is how each company trained, aligned, and shaped the model after that shared starting point. The architecture is the blueprint. The training is the upbringing.

If all three start from the same architecture — what exactly happens during that initial training phase that gives a model its raw capabilities, before alignment even begins?

Day 12: Pretraining.

Day 11 of 100 — AI Foundations | Change of Basis — Reframe the familAIr. See the invisible.
