Understand the Models
What foundation models are, how they're priced, and how to pick the right one for your job.
What Go Deeper is about
Everything above this point is about getting a product live and getting people to use it. You can do all of that without understanding how the models work under the hood. This stage is for when your product demands more — when you need better outputs, lower costs, or capabilities your current setup can't deliver.
- Understand the Models — what foundation models are, how they're priced, and how to pick the right one for your job.
- Prompt Engineering — the craft of getting reliable, high-quality outputs. System prompts, few-shot examples, structured output.
- Working with APIs — connecting your product to AI models programmatically. Authentication, streaming, error handling.
- RAG & Knowledge — giving your AI access to your own data. Retrieval-augmented generation, embeddings, vector search.
- Agents & Tools — letting your AI take actions, not just generate text. Tool use, chains, multi-agent patterns.
The most important thing here: do not get lost in this section before you have users. Technical depth feels like progress, but it's only useful in service of a product someone is paying for. Come here when you need to, not because it's interesting.
What foundation models actually are
Foundation models are pretrained general-purpose AI systems that you adapt to your specific task through prompting, fine-tuning, or retrieval. You don't train them. You steer them. Think of them like an operating system: the layer everything else builds on top of.
Three things make a model a “foundation model”:
- Scale. Trained on enormous amounts of data (trillions of tokens from the web, books, code, and more).
- Generality. One model handles many different tasks without being retrained for each one.
- Adaptability. You can steer it to your specific use case with prompts, examples, or fine-tuning.
The practical implication: you don't need to build a model. You need to learn how to use one effectively. That's what the rest of this guide is about.
What you're actually paying for: tokens
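Models don't see text the way you do. They break everything into tokens: chunks of text that average roughly 3-4 characters in English. A short, common word like “Hello” is typically one token, while a longer word like “Tokenization” is split into two or more. Exact counts depend on the model's tokenizer. Code and non-English text tend to use more tokens per word.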
This matters because you pay per token. Both input (what you send) and output (what the model generates) cost money.
| What to know | Why it matters |
|---|---|
| Cost is per-token | Longer prompts and longer responses cost more. A system prompt you send with every request adds up fast. |
| Context window is in tokens | That “128K context window” is tokens, not characters. A 100-page document might be 50K tokens. |
| Non-English text costs more | “Hello” is 1 token, but the Japanese equivalent might be 3+ tokens. |
| Numbers are unpredictable | “1000” might be one token. “1001” might be two. This is one reason models are sometimes bad at math. |
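For quick budgeting, the rule of thumb above can be turned into a rough estimator. This is a minimal sketch that assumes about 4 characters per token for English text. `estimate_tokens` and `CHARS_PER_TOKEN` are illustrative names, not part of any provider's API, and a real count requires the tokenizer for the specific model you use.

```python
import math

# Assumption: ~4 characters per token, a rough average for English text.
# Code and non-English text will usually come out higher than this estimate.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Rough token estimate from character count; not a real tokenizer."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


# A system prompt that is sent with every request (580 characters).
system_prompt = "You are a helpful assistant. " * 20
print(estimate_tokens(system_prompt))  # 145
```

Because the system prompt rides along on every call, multiply its estimate by your daily call volume to see how quickly it adds up.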
Choosing a model
This is not about picking the “best” model. It's about finding the right tradeoff between capability, cost, latency, and your specific use case.
| If you need | Consider | Typical cost (USD per 1M tokens, approximate) |
|---|---|---|
| Best reasoning, complex tasks | Claude Opus, GPT-4o, Gemini 1.5 Pro | $10-30 input, $30-60 output |
| Good quality, reasonable cost | Claude Sonnet, GPT-4o-mini, Gemini Flash | $0.50-3 input, $1.50-10 output |
| Speed and low cost | Claude Haiku, Gemini Flash 8B | $0.03-0.25 input, $0.10-1 output |
| Full control, data privacy | Llama, Mistral, Qwen (self-hosted) | Infrastructure costs only |
Start cheap, upgrade when you have evidence. Build your prototype with the cheapest model that works. Most builders overestimate how capable their model needs to be. If the cheap model fails, you'll know exactly where it fails, and that tells you exactly what you're paying extra for.
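One way to make "start cheap, upgrade when you have evidence" concrete is to run the same set of test cases against each candidate and keep the cheapest one that clears your quality bar. The sketch below assumes you already have a way to measure a pass rate per model. The model names, prices, and pass rates are hypothetical placeholders, and `cheapest_passing_model` is an illustrative helper.

```python
from typing import Callable, Optional, Sequence, Tuple


def cheapest_passing_model(
    models: Sequence[Tuple[str, float]],
    pass_rate: Callable[[str], float],
    threshold: float = 0.9,
) -> Optional[str]:
    """Return the cheapest model whose pass rate meets the threshold.

    models: (name, cost per 1M tokens) pairs.
    pass_rate: returns the fraction of your test cases a model gets right.
    """
    # Try models from cheapest to most expensive and stop at the first pass.
    for name, _cost in sorted(models, key=lambda m: m[1]):
        if pass_rate(name) >= threshold:
            return name
    return None


# Hypothetical candidates and measured pass rates, for illustration only.
models = [("large-model", 15.0), ("small-model", 0.25), ("mid-model", 3.0)]
measured = {"small-model": 0.72, "mid-model": 0.93, "large-model": 0.97}

choice = cheapest_passing_model(models, lambda name: measured[name])
print(choice)  # mid-model
```

The cases the small model fails are the useful part: they show what the extra spend on the next tier buys you.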
The cost math you should do right now
Before you pick a model, run this calculation:
- How many users do you expect in the first 3 months? (Be honest, not optimistic.)
- How many AI calls will each user make per day?
- How many tokens per call? (A typical prompt + response is 1,000-3,000 tokens.)
- Multiply to get your monthly cost: users × calls/day × tokens/call × 30 days × cost per token.
If the number scares you, use a cheaper model. If it's negligible, use whatever you want. The point is to know the number before it surprises you.
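The calculation above can be sketched in a few lines. The inputs here are made-up numbers, and the single price is a simplifying assumption: providers usually charge different rates for input and output tokens, so treat it as a blended rate or run the function once for each.

```python
def monthly_cost_usd(
    users: int,
    calls_per_day: float,
    tokens_per_call: int,
    price_per_million_tokens: float,
    days: int = 30,
) -> float:
    """users x calls/day x tokens/call x days x cost per token."""
    tokens_per_month = users * calls_per_day * tokens_per_call * days
    # Prices are quoted per 1M tokens, so convert to a per-token rate.
    return tokens_per_month * price_per_million_tokens / 1_000_000


# Hypothetical: 500 users, 10 calls/day, 2,000 tokens/call, $3 per 1M tokens.
print(monthly_cost_usd(500, 10, 2000, 3.0))  # 900.0
```

Rerun it with a cheaper model's price to see how much the bill moves before you commit to one.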
Context windows: how much the model can see
The context window is how much information you can feed the model in a single call. It has exploded from 4K tokens to over 1M tokens in just a few years.
- At 4K tokens, you can fit a short conversation and a brief prompt.
- At 128K tokens, you can fit a full-length book or a small-to-medium codebase.
- At 1M tokens, you can fit several books or a large codebase in a single call.
But longer context is not free. More tokens in means higher cost and higher latency. Just because you can send a 100-page document doesn't mean you should if the answer is on page 3. This is where retrieval (RAG) comes in.
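To put a number on that tradeoff, compare sending the whole document with sending only the relevant page. This sketch reuses the earlier estimate of about 50K tokens for a 100-page document (so about 500 tokens per page). The price is a hypothetical figure, not a quote from any provider.

```python
def input_cost_usd(input_tokens: int, price_per_million: float) -> float:
    """Cost of the input side of one call."""
    return input_tokens * price_per_million / 1_000_000


PRICE = 3.0  # hypothetical USD per 1M input tokens

full_document = input_cost_usd(50_000, PRICE)  # all 100 pages
one_page = input_cost_usd(500, PRICE)          # only the page with the answer

print(f"full document: ${full_document:.4f} per call")
print(f"one page:      ${one_page:.4f} per call")
```

Under these assumptions the full document costs about 100 times more per call, before counting the added latency.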
How models actually generate text
At each step, the model calculates a probability for every token in its vocabulary and then samples from that distribution to pick the next token.
- Temperature controls randomness. At 0, the model always picks the most likely token. At 1, it samples from the model's distribution as-is. Values in between sharpen the distribution toward the likeliest tokens. For most production use cases, 0 to 0.3 is the sweet spot.
- Top-P (nucleus sampling) limits the pool of tokens the model can pick from. A top-p of 0.9 means “only consider the smallest set of most likely tokens whose combined probability reaches 90%.”
- Max tokens caps the response length. Set this to prevent runaway responses that eat your budget.
The practical takeaway: Use low temperature (0-0.3) for factual, consistent tasks like data extraction or classification. Use moderate temperature (0.5-0.8) for creative tasks like writing or brainstorming. Set max tokens to something reasonable so a single runaway response doesn't blow your budget.
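To see how temperature and top-p interact, here is a toy version of the sampling step in plain Python. It is a sketch of the general technique, not any provider's implementation. The three-token vocabulary and its scores are invented for illustration, and `sample_next_token` is a hypothetical helper.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Pick one token from a {token: score} dict using temperature and top-p."""
    if temperature <= 0:
        # Temperature 0: always return the most likely token.
        return max(logits, key=logits.get)

    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Top-p: keep the smallest set of most likely tokens whose
    # cumulative probability reaches top_p.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Sample from the surviving tokens, weighted by probability.
    tokens, weights = zip(*kept)
    return rng.choices(tokens, weights=weights, k=1)[0]


# Invented scores for a three-token vocabulary.
logits = {"cat": 2.0, "dog": 1.0, "fish": 0.1}
print(sample_next_token(logits, temperature=0))  # cat
print(sample_next_token(logits, temperature=0.7, top_p=0.9))
```

With these scores, a temperature of 0.7 and a top-p of 0.9 drop “fish” from the pool entirely, so the second call only ever returns “cat” or “dog”.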
AI Cost Calculator
Makes you do the unit economics math before you get surprised by a bill.
---
description: Calculate your AI product's unit economics. Know your cost per user before you set your price.
---
You are a financial analyst who understands AI API pricing. The user is building an AI product and needs to understand their cost structure before they price it or scale it.
Walk through this calculation step by step. Do not skip steps.
**Step 1: Identify every AI call in your product.**
Ask: "List every feature in your product that calls an AI API. For each one, what does it do?"
**Step 2: Measure token usage per call.**
For each feature, estimate input tokens and output tokens.
**Step 3: Estimate usage patterns.**
Ask: "For a typical user, how many times per day/week/month would they use each feature?"
**Step 4: Calculate cost per user per month.**
**Step 5: Stress test at 100, 1,000, and 10,000 users.**
**Step 6: The 10x rule.** Price should be at least 10x AI cost per user.
Reference: https://builderspath.dev/playbook/#understand-the-models