Go Deeper

Level up the craft

Understand the Models

What foundation models are, how they're priced, and how to pick the right one for your job.

What Go Deeper is about

Everything above this point is about getting a product live and getting people to use it. You can do all of that without understanding how the models work under the hood. This stage is for when your product demands more — when you need better outputs, lower costs, or capabilities your current setup can't deliver.

Understand the Models — what foundation models are, how they're priced, and how to pick the right one for your job.
Prompt Engineering — the craft of getting reliable, high-quality outputs. System prompts, few-shot examples, structured output.
Working with APIs — connecting your product to AI models programmatically. Authentication, streaming, error handling.
RAG & Knowledge — giving your AI access to your own data. Retrieval-augmented generation, embeddings, vector search.
Agents & Tools — letting your AI take actions, not just generate text. Tool use, chains, multi-agent patterns.

The most important thing here: do not get lost in this section before you have users. Technical depth feels like progress, but it's only useful in service of a product someone is paying for. Come here when you need to, not because it's interesting.

Builder’s check

You do not need to understand everything in this section. You need to understand enough to make one decision: which model, at what cost, for which job. Run the math before you fall in love with the most capable option. A hundred users making ten calls a day at a penny a call is ten dollars a day, three hundred a month, before you've made a dime. Pick the cheapest model that does the job. You can upgrade when someone's paying you to.

What foundation models actually are

Foundation models are pretrained general-purpose AI systems that you adapt to your specific task through prompting, fine-tuning, or retrieval. You don't train them. You steer them. Think of them like an operating system: the layer everything else builds on top of.

Three things make a model a “foundation model”:

Scale. Trained on enormous amounts of data (trillions of tokens from the web, books, code, and more).
Generality. One model handles many different tasks without being retrained for each one.
Adaptability. You can steer it to your specific use case with prompts, examples, or fine-tuning.

The practical implication: you don't need to build a model. You need to learn how to use one effectively. That's what the rest of this guide is about.

What you're actually paying for: tokens

Models don't see text the way you do. They break everything into tokens: chunks of text that average roughly 3-4 characters each in English. The exact split depends on the model's tokenizer, but “Hello” is typically one token and “Tokenization” two. Code and non-English text tend to use more tokens per word.

This matters because you pay per token. Both input (what you send) and output (what the model generates) cost money.

What to know	Why it matters
Cost is per-token	Longer prompts and longer responses cost more. A system prompt you send with every request adds up fast.
Context window is in tokens	That “128K context window” is tokens, not characters. A 100-page document might be 50K tokens.
Non-English text costs more	“Hello” is 1 token, but the Japanese equivalent might be 3+ tokens.
Numbers are unpredictable	“1000” might be one token. “1001” might be two. This is one reason models are sometimes bad at arithmetic.
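
As a rough illustration of how a system prompt sent with every request adds up, the sketch below estimates tokens from character count using the 4-characters-per-token rule of thumb. Everything in it is an assumption for illustration: the function names, the 4.0 ratio, and the $3 per 1M input price are hypothetical, and a real tokenizer will give different counts.

```python
import math

# Assumption: about 4 characters per token for English text (a rule of thumb,
# not a real tokenizer). Code and non-English text usually need more tokens.
CHARS_PER_TOKEN = 4.0


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Rough token estimate from character count."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


def system_prompt_overhead(
    prompt_text: str,
    calls_per_day: int,
    input_price_per_million: float,  # hypothetical price in dollars per 1M input tokens
    days: int = 30,
) -> float:
    """Monthly dollar cost of resending the same system prompt on every call."""
    tokens_per_call = estimate_tokens(prompt_text)
    total_tokens = tokens_per_call * calls_per_day * days
    return total_tokens * input_price_per_million / 1_000_000


if __name__ == "__main__":
    prompt = "x" * 2000  # stand-in for a 2,000-character system prompt
    print(estimate_tokens(prompt))
    print(system_prompt_overhead(prompt, calls_per_day=1000, input_price_per_million=3.0))
```

With these assumed numbers, a 2,000-character system prompt is about 500 tokens, and sending it 1,000 times a day costs about $45 a month before any user input or output is counted.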

Choosing a model

This is not about picking the “best” model. It's about finding the right tradeoff between capability, cost, latency, and your specific use case.

If you need	Consider	Typical cost (per 1M tokens)
Best reasoning, complex tasks	Claude Opus, GPT-4o, Gemini 1.5 Pro	$10-30 input, $30-60 output
Good quality, reasonable cost	Claude Sonnet, GPT-4o-mini, Gemini Flash	$0.50-3 input, $1.50-10 output
Speed and low cost	Claude Haiku, Gemini Flash 8B	$0.03-0.25 input, $0.10-1 output
Full control, data privacy	Llama, Mistral, Qwen (self-hosted)	Infrastructure costs only

Start cheap, upgrade when you have evidence. Build your prototype with the cheapest model that works. Most builders overestimate how capable a model they need. If the cheap model fails, you'll know exactly where it fails, and that tells you exactly what you're paying extra for.
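
One way to make "start cheap, upgrade when you have evidence" concrete is to run the same test cases against each candidate model, cheapest first, and stop at the first one that passes. In this sketch the model names, prices, threshold, and the `pass_rate` callback are all hypothetical placeholders; `pass_rate` stands in for whatever evaluation you run on your own examples.

```python
from typing import Callable, Optional, Sequence, Tuple


def cheapest_passing_model(
    models: Sequence[Tuple[str, float]],  # (model name, price per 1M tokens), any order
    pass_rate: Callable[[str], float],    # hypothetical: fraction of your test cases a model passes
    threshold: float = 0.9,               # assumed quality bar; pick your own
) -> Optional[str]:
    """Return the cheapest model whose pass rate meets the threshold, or None."""
    for name, _price in sorted(models, key=lambda m: m[1]):
        if pass_rate(name) >= threshold:
            return name
    return None


if __name__ == "__main__":
    # Placeholder names and prices, not real quotes.
    candidates = [("large", 15.0), ("small", 0.25), ("medium", 3.0)]
    # Pretend evaluation results; in practice this would call each model.
    fake_results = {"small": 0.72, "medium": 0.94, "large": 0.97}
    print(cheapest_passing_model(candidates, fake_results.__getitem__))
```

The failures of the cheaper model are worth keeping: they are the list of things the upgrade has to fix to justify its price.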

The cost math you should do right now

Before you pick a model, run this calculation:

How many users do you expect in the first 3 months? (Be honest, not optimistic.)
How many AI calls will each user make per day?
How many tokens per call? (A typical prompt + response is 1,000-3,000 tokens.)
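Multiply: users × calls/day × tokens/call × 30 days × cost per token. That is your monthly bill. Prices are usually quoted per 1M tokens, so divide the quoted price by 1,000,000 to get the cost per token.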

If the number scares you, use a cheaper model. If it's negligible, use whatever you want. The point is to know the number before it surprises you.
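
The calculation above fits in a few lines. This is a minimal sketch, assuming a single blended price per 1M tokens (real pricing separates input and output); the function name and the three price tiers are hypothetical.

```python
def monthly_cost(
    users: int,
    calls_per_day: float,
    tokens_per_call: int,
    price_per_million: float,  # assumed blended price in dollars per 1M tokens
    days: int = 30,
) -> float:
    """users x calls/day x tokens/call x days x cost per token."""
    total_tokens = users * calls_per_day * tokens_per_call * days
    return total_tokens * price_per_million / 1_000_000


if __name__ == "__main__":
    # Hypothetical tiers for comparison, not quotes from any provider.
    for label, price in [("cheap", 0.25), ("mid", 3.0), ("premium", 15.0)]:
        cost = monthly_cost(users=100, calls_per_day=10, tokens_per_call=2000,
                            price_per_million=price)
        print(f"{label}: ${cost:,.2f}/month")
```

For 100 users making 10 calls a day at 2,000 tokens per call, that is 60 million tokens a month: $15 at the assumed cheap tier, $180 at the mid tier, and $900 at the premium tier.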

Context windows: how much the model can see

The context window is how much information you can feed the model in a single call. It has exploded from 4K tokens to over 1M tokens in just a few years.

At 4K tokens, you can fit a short conversation and a brief prompt.
At 128K tokens, you can fit an entire book or a small codebase.
At 1M tokens, you can fit almost anything.

But longer context is not free. More tokens in means higher cost and higher latency. Just because you can send a 100-page document doesn't mean you should if the answer is on page 3. This is where retrieval (RAG) comes in.
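
To put a number on that tradeoff, the sketch below checks whether a request fits in the window and compares the input cost of sending a whole document against sending one page. It assumes input and output tokens share the same window, that a 100-page document is 50K tokens (so one page is about 500), and a hypothetical $3 per 1M input price; the function names are illustrative.

```python
def fits_in_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """True if the prompt plus the reserved response length fits in the window.

    Assumption: input and output tokens count against the same window.
    """
    return input_tokens + max_output_tokens <= context_window


def input_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of sending this many input tokens once."""
    return tokens * price_per_million / 1_000_000


if __name__ == "__main__":
    window = 128_000
    whole_doc, one_page = 50_000, 500   # assumed token counts
    price = 3.0                         # hypothetical dollars per 1M input tokens

    print(fits_in_context(whole_doc, 1_000, window))
    print(f"whole document: ${input_cost(whole_doc, price):.4f} per call")
    print(f"one page:       ${input_cost(one_page, price):.4f} per call")
```

Both requests fit, but under these assumptions the whole document costs 100 times more per call than the single page that holds the answer.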

How models actually generate text

At each step, the model calculates a probability for every token in its vocabulary and then samples from that distribution to pick the next token.

Temperature controls randomness. At 0, the model always picks the most likely token. At 1, it samples from the model's unmodified distribution, and higher values make unlikely tokens more common. For most production use cases, 0 to 0.3 is the sweet spot.
Top-P (nucleus sampling) limits the pool of tokens the model can pick from. A top-p of 0.9 means “only consider the most likely tokens whose probabilities add up to 90%.”
Max tokens caps the response length. Set this to prevent runaway responses that eat your budget.

The practical takeaway: Use low temperature (0-0.3) for factual, consistent tasks like data extraction or classification. Use moderate temperature (0.5-0.8) for creative tasks like writing or brainstorming. Set max tokens to something reasonable so a single runaway response doesn't blow your budget.
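
For intuition, here is a toy version of temperature and top-p sampling over a three-token vocabulary. It is a minimal sketch of the general technique, not any provider's implementation; the function name, the scores, and the vocabulary are made up for illustration.

```python
import math
import random
from typing import Dict, Optional


def sample_token(
    logits: Dict[str, float],           # raw model scores per token (made-up values here)
    temperature: float = 1.0,
    top_p: float = 1.0,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick one token using temperature scaling followed by nucleus (top-p) filtering."""
    rng = rng or random.Random()

    # Temperature 0 means greedy decoding: always the highest-scoring token.
    if temperature <= 0:
        return max(logits, key=logits.get)

    # Softmax over temperature-scaled scores (subtract the max for numerical stability).
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda pair: pair[1], reverse=True)

    # Nucleus: keep the smallest set of top tokens whose probabilities reach top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    tokens, weights = zip(*kept)
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    scores = {"the": 5.0, "a": 1.0, "zebra": -2.0}
    print(sample_token(scores, temperature=0))             # greedy pick
    print(sample_token(scores, temperature=0.8, top_p=0.9))
```

Lowering the temperature sharpens the distribution toward the top token, while lowering top-p removes the long tail entirely, which is why both settings push outputs toward consistency.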

Go deeper: The Math Never Works (why AI product economics are harder than you think). LLM Cost Calculator (compare pricing across 18 models).

/cost-calc

AI Cost Calculator

Makes you do the unit economics math before you get surprised by a bill.

skill

---
description: Calculate your AI product's unit economics. Know your cost per user before you set your price.
---

You are a financial analyst who understands AI API pricing. The user is building an AI product and needs to understand their cost structure before they price it or scale it.

Walk through this calculation step by step. Do not skip steps.

**Step 1: Identify every AI call in your product.**
Ask: "List every feature in your product that calls an AI API. For each one, what does it do?"

**Step 2: Measure token usage per call.**
For each feature, estimate input tokens and output tokens.

**Step 3: Estimate usage patterns.**
Ask: "For a typical user, how many times per day/week/month would they use each feature?"

**Step 4: Calculate cost per user per month.**

**Step 5: Stress test at 100, 1,000, and 10,000 users.**

**Step 6: The 10x rule.** Price should be at least 10x AI cost per user.

Reference: https://builderspath.dev/playbook/#understand-the-models
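
Steps 4 through 6 of the skill reduce to arithmetic you can check by hand. The sketch below is one possible reading of them, not part of the skill itself: the `Feature` type, the function names, the example feature, and the $1 input / $5 output prices per 1M tokens are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence


@dataclass
class Feature:
    name: str
    input_tokens: int        # estimated input tokens per call (Step 2)
    output_tokens: int       # estimated output tokens per call (Step 2)
    calls_per_month: float   # typical usage per user (Step 3)


def cost_per_user(features: Iterable[Feature],
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Step 4: monthly AI cost for one typical user, summed over features."""
    total = 0.0
    for f in features:
        per_call = (f.input_tokens * input_price_per_million
                    + f.output_tokens * output_price_per_million) / 1_000_000
        total += per_call * f.calls_per_month
    return total


def stress_test(per_user: float,
                user_counts: Sequence[int] = (100, 1_000, 10_000)) -> Dict[int, float]:
    """Step 5: total monthly AI cost at each user count."""
    return {n: per_user * n for n in user_counts}


def minimum_price(per_user: float, multiple: float = 10.0) -> float:
    """Step 6: the 10x rule, price at least 10x the AI cost per user."""
    return per_user * multiple


if __name__ == "__main__":
    features = [Feature("summarize", input_tokens=1500, output_tokens=500, calls_per_month=60)]
    per_user = cost_per_user(features, input_price_per_million=1.0, output_price_per_million=5.0)
    print(f"cost per user: ${per_user:.2f}/month")
    print(stress_test(per_user))
    print(f"minimum price: ${minimum_price(per_user):.2f}/month")
```

With these assumed inputs the AI cost is about $0.24 per user per month, so the 10x rule puts the floor for pricing at about $2.40 per user per month.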
