Working with APIs
Connecting your product to AI models programmatically. Authentication, streaming, error handling.
Builder’s checkPut cost tracking in on day one, not the day your bill scares you. Log every call, tag it by feature, and look at it weekly. If you can't see the exposure, you can't manage it.
Every major LLM provider follows the same basic pattern: you send messages, you get a response, you pay per token.
The messages array
The core abstraction is the messages array with roles: system (your instructions), user (human input), and assistant (previous model responses).
Key parameters
| Parameter | What it controls | What to set it to |
|---|---|---|
| model | Which model handles the request | Start cheap, upgrade when you have evidence |
| temperature | Randomness. 0 = deterministic, 1 = creative | 0-0.3 for factual, 0.5-0.8 for creative |
| max_tokens | Response length cap | Set this. Prevent runaway responses. |
| stop | Sequences that end generation | Useful for structured output parsing |
Streaming: making slow feel fast
Without streaming, users stare at a spinner for 2-10 seconds. With streaming, they see tokens appear in ~200ms. For any user-facing feature, turn on streaming.
Error handling
| Error | What happened | What to do |
|---|---|---|
| Rate limit (429) | Too many requests | Exponential backoff: 1s, 2s, 4s |
| Context length | Input too long | Truncate or summarize history |
| Server error (5xx) | Provider issue | Retry with backoff, fall back to another provider |
| Timeout | Slow response | Set timeouts. Use streaming. |
Set spending limits today. Both OpenAI and Anthropic let you set hard caps on your account. A bug, a bot, or a single user running a thousand queries should not result in a surprise bill.