Playbook/Stage 04

Go Deeper

Level up the craft

Working with APIs

Connecting your product to AI models programmatically. Authentication, streaming, error handling.

Builder’s checkPut cost tracking in on day one, not the day your bill scares you. Log every call, tag it by feature, and look at it weekly. If you can't see the exposure, you can't manage it.

Every major LLM provider follows the same basic pattern: you send messages, you get a response, you pay per token.

The messages array

The core abstraction is the messages array with roles: system (your instructions), user (human input), and assistant (previous model responses).

Key parameters

ParameterWhat it controlsWhat to set it to
modelWhich model handles the requestStart cheap, upgrade when you have evidence
temperatureRandomness. 0 = deterministic, 1 = creative0-0.3 for factual, 0.5-0.8 for creative
max_tokensResponse length capSet this. Prevent runaway responses.
stopSequences that end generationUseful for structured output parsing

Streaming: making slow feel fast

Without streaming, users stare at a spinner for 2-10 seconds. With streaming, they see tokens appear in ~200ms. For any user-facing feature, turn on streaming.

Error handling

ErrorWhat happenedWhat to do
Rate limit (429)Too many requestsExponential backoff: 1s, 2s, 4s
Context lengthInput too longTruncate or summarize history
Server error (5xx)Provider issueRetry with backoff, fall back to another provider
TimeoutSlow responseSet timeouts. Use streaming.

Set spending limits today. Both OpenAI and Anthropic let you set hard caps on your account. A bug, a bot, or a single user running a thousand queries should not result in a surprise bill.