Foundation Models: Why AI Stopped Building From Scratch

Through the 2010s, almost every app that needed a map, from Uber and Ola to Swiggy and Zomato, used the same one. Google Maps.

Nobody built their own. Google had spent billions building that foundation — Street View cars driving down every street, satellite imagery, years of data collection and refinement. Everyone else just built on top of it.

That's exactly what's happening with AI right now.

What Came Before

Before foundation models, every AI task needed its own model built from scratch.

Spam detection? Collect spam emails, label them, train a model that only recognises spam. Sentiment analysis? Different dataset, different model, different training pipeline. Medical imaging? Start from zero again — new data, new architecture, new validation.

Each model was purpose-built. Narrow. Excellent at exactly one thing and completely useless at everything else. If you wanted AI to handle ten tasks, you built ten separate models. Ten datasets. Ten training runs. Ten deployments.

That worked — but it didn't scale. Every new task meant starting over. The cost and time multiplied with every problem you tried to solve.

The Shift

Foundation models flipped the entire approach.

Instead of training a narrow model for one specific task, the idea was radical: train a single massive model on broad data (text, code, conversations, books, websites, documentation) at enormous scale. Don't tell it what task to perform. Give it one simple objective, predicting the next word, and let it learn language itself.
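For GPT-style models, "learn language itself" has a concrete meaning: the training objective is to predict the next token of raw text, so the text supplies its own answers and nobody has to label anything. The sketch below shrinks that idea to a toy bigram counter in Python. The function names and the tiny corpus are hypothetical, and a real foundation model replaces the counting with a large neural network that looks at far longer contexts.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def train_bigram(text: str) -> Dict[str, Counter]:
    """Count which word follows which. No labels: the raw text is its own training signal."""
    words = text.lower().split()
    follows: Dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(model: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the word seen most often after `word`, or None if `word` never appeared."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy corpus; a foundation model sees trillions of tokens instead.
corpus = "the cat sat on the mat . the cat ate . the dog sat ."
model = train_bigram(corpus)

print(predict_next(model, "the"))    # cat  (it follows "the" twice)
print(predict_next(model, "zebra"))  # None (never seen)
```

Nothing in the code mentions spam, sentiment, or any other task; whatever the model "knows" comes from patterns in the text. That task-agnostic training is what lets one trained model be reused for many purposes later.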

The result isn't a specialist. It's a generalist. A model that understands how language works — structure, meaning, context, pattern — without being trained for any single application.

That's the "foundation" part, and the name fits. You're not building a finished product. You're laying the base that many different things get built on.

GPT is a foundation model. OpenAI trained it on massive amounts of text data to understand and generate language. ChatGPT is an application built on that foundation — fine-tuned for conversation, wrapped in a user interface, given safety guardrails. But it's just one of many:

  • ChatGPT → conversational assistant

  • GitHub Copilot → code completion

  • Custom GPTs → domain-specific tools

  • API integrations → thousands of products you've never heard of

One model. Many applications. No one rebuilt the foundation for each one.
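In software terms, "one model, many applications" means the foundation is a shared dependency and each product adds only a thin layer on top. The Python sketch below shows that shape. `toy_foundation` is a hypothetical stub standing in for a call to a real hosted model, and the apps adapt it with instructions (prompting) rather than fine-tuning, which is only one of several ways to adapt a foundation model.

```python
from dataclasses import dataclass
from typing import Callable

# Any function that maps a prompt to generated text can play the "foundation" role here.
FoundationModel = Callable[[str], str]


def toy_foundation(prompt: str) -> str:
    """Hypothetical stub for a hosted model call: reports which instructions it received."""
    first_line = prompt.splitlines()[0]
    return f"[model output for: {first_line}]"


@dataclass
class Application:
    name: str
    instructions: str       # the app-specific layer: what makes this product different
    model: FoundationModel  # the shared foundation: reused as-is, never retrained here

    def run(self, user_input: str) -> str:
        # Every app sends its own instructions plus the user's input to the same model.
        return self.model(f"{self.instructions}\n\n{user_input}")


# Three hypothetical products built on one foundation.
apps = [
    Application("assistant", "Answer conversationally and politely.", toy_foundation),
    Application("code helper", "Complete the following Python function.", toy_foundation),
    Application("contract explainer", "Explain this clause in plain English.", toy_foundation),
]

for app in apps:
    print(f"{app.name}: {app.run('def add(a, b):')}")
```

Swapping `toy_foundation` for a different function changes the raw capabilities of every app at once, while each app's `instructions` stay the same. That mirrors the split between the foundation and the applications built on it.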

Who Can Actually Build One

This is where it gets real. Nobody builds a foundation model in their garage.

Training a model like GPT-4 reportedly required tens of thousands of GPUs running for months, with a compute bill reportedly well above $100 million. You need massive curated datasets, specialised infrastructure, and teams of researchers who've spent years understanding how to make training stable at that scale.
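The scale is easier to feel as arithmetic: GPU count times hours times price per GPU-hour. The inputs below are hypothetical round numbers picked to fit "tens of thousands of GPUs running for months"; they are not OpenAI's actual figures, and the result covers raw compute only (no failed runs, experiments, data, or salaries).

```python
from typing import Tuple


def training_compute_cost(gpus: int, days: int, usd_per_gpu_hour: float) -> Tuple[int, float]:
    """Back-of-envelope compute bill: total GPU-hours and their cost in US dollars."""
    gpu_hours = gpus * days * 24  # assumes every GPU runs around the clock
    return gpu_hours, gpu_hours * usd_per_gpu_hour


# Hypothetical inputs, not reported figures.
hours, cost = training_compute_cost(gpus=25_000, days=90, usd_per_gpu_hour=2.0)
print(f"{hours:,} GPU-hours, about ${cost:,.0f} in compute")
# 54,000,000 GPU-hours, about $108,000,000 in compute
```

Even with these made-up rates, a single training run lands around the $100 million mark before counting anything else, which is why so few organisations can afford to build a foundation model.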

That's why only a handful of organisations have done it:

  • OpenAI built GPT

  • Anthropic built Claude

  • Google built Gemini

  • Meta built LLaMA

  • Sarvam AI built Indus — India's sovereign foundation model, trained from scratch on Indian languages

Each one is a different foundation. Different training data, different architectural decisions, different strengths. But they all follow the same principle: train once at massive scale on broad data, then adapt for specific uses.

Everyone else — every startup, every enterprise, every developer building AI features — builds on top of these foundations. They fine-tune them for specific domains. They plug them into applications. They shape them for particular use cases. But the foundation itself? That took billions of dollars and years of research to lay.

Just like Google Maps. In the early days, every app that needed location used the same map. As they scaled, some — like Ola with Ola Maps — invested in building their own. But most still build on the foundations that already exist.

Why This Distinction Matters

Most people think ChatGPT is the model. It isn't.

ChatGPT is one application running on the GPT foundation. Understanding that distinction changes how you think about every AI product you encounter.

When someone says "we're using AI in our product," the real question becomes: which foundation? Are they building on GPT? Claude? Gemini? LLaMA? The foundation determines the raw capabilities. The application determines what you experience.

The Takeaway

A foundation model is AI trained once at massive scale on broad data — then adapted for many downstream tasks. GPT is the foundation. ChatGPT is just one house built on it. And there's more than one foundation.

If multiple foundations exist — and each has different strengths — how do you choose between them? What actually separates GPT from Claude from Gemini?

Day 11: GPT vs Claude vs Gemini.

Day 10 of 100 — AI Foundations | Change of Basis — Reframe the familAIr. See the invisible.