Go Deeper

Level up the craft

Working with APIs

Connecting your product to AI models programmatically. Authentication, streaming, error handling.

Builder’s checkPut cost tracking in on day one, not the day your bill scares you. Log every call, tag it by feature, and look at it weekly. If you can't see the exposure, you can't manage it.

Every major LLM provider follows the same basic pattern: you send messages, you get a response, you pay per token.

The messages array

The core abstraction is the messages array with roles: system (your instructions), user (human input), and assistant (previous model responses).

Key parameters

Parameter	What it controls	What to set it to
model	Which model handles the request	Start cheap, upgrade when you have evidence
temperature	Randomness. 0 = deterministic, 1 = creative	0-0.3 for factual, 0.5-0.8 for creative
max_tokens	Response length cap	Set this. Prevent runaway responses.
stop	Sequences that end generation	Useful for structured output parsing

Streaming: making slow feel fast

Without streaming, users stare at a spinner for 2-10 seconds. With streaming, they see tokens appear in ~200ms. For any user-facing feature, turn on streaming.

Error handling

Error	What happened	What to do
Rate limit (429)	Too many requests	Exponential backoff: 1s, 2s, 4s
Context length	Input too long	Truncate or summarize history
Server error (5xx)	Provider issue	Retry with backoff, fall back to another provider
Timeout	Slow response	Set timeouts. Use streaming.

Set spending limits today. Both OpenAI and Anthropic let you set hard caps on your account. A bug, a bot, or a single user running a thousand queries should not result in a surprise bill.

Prompt Engineering

RAG & Knowledge