
The Math Never Works

Every AI project estimate I've made or seen has been wrong the same way. The build takes longer, the integration is harder, the users are slower to adopt, and the costs you forgot to count are bigger than the ones you did.

I've estimated a lot of AI projects. For myself, for teams I worked with, for founders I've talked to. They all look roughly the same going in. A plausible plan. A reasonable timeline. Some version of: I'll build the AI feature, it'll save users X hours, I'll charge Y, and I'll be profitable in six months.

The math always works on the napkin. The math almost never works in production.

Not because the estimates are sloppy. Usually they're done by sharp people thinking carefully about what they know. The problem is what they don't know. And with AI, the things you don't know are bigger, more expensive, and more numerous than with any other kind of software you've built.

The four assumptions that always break

Every AI project estimate I've seen depends on at least four assumptions that feel like facts but are actually open questions. They look solid on paper. They dissolve on contact with reality.

1. "I know how long the build will take"

You probably do know how long the software part takes. The frontend, the API, the database, the auth flow. You've built those before. You can estimate them.

You almost certainly don't know how long the AI part takes. The prompt engineering that works in the playground but falls apart on real user inputs. The data cleaning you didn't know you'd need until you tried to use real data. The evaluation loop where you discover the model handles 80% of cases well and the remaining 20% require a completely different approach. The hallucination edge case that only surfaces when a user types something you never tested.

I talked to a founder who estimated four weeks for an AI feature. The software scaffolding took one week, just as planned. The prompt engineering and evaluation took eleven more. Not because it was hard in a way he could have anticipated. Because the gap between "works in testing" and "works reliably on messy real-world input" was larger than he'd ever experienced with deterministic software.

With regular software, reality usually overshoots your estimate by 50-100%. With AI, it often comes in at 3-5x the estimate. Not because you're bad at estimating. Because the system isn't deterministic, and your experience estimating deterministic systems doesn't transfer.

2. "Integration will be straightforward"

You budgeted a week for integration. It will take three to six. Here's why.

The AI part of your product doesn't live in isolation. It connects to data sources, user workflows, third-party APIs, and existing systems. Every one of those connections has assumptions baked in. The data source has fields that are sometimes empty. The API rate-limits you in ways the documentation didn't mention. The user workflow has steps that make sense for human decision-making but don't map to automated outputs.

I watched this play out in my own work. A data pipeline I expected to wire up in a day took two weeks because the upstream system's schema had undocumented edge cases that only surfaced with production data. That wasn't a failure of planning. It was the nature of connecting systems that were designed by different people at different times for different purposes.

For solo builders, integration often means connecting to APIs you don't control. OpenAI changes its response format. Stripe webhooks behave differently in test vs. production. Your vector database handles 100 documents fine but crawls at 10,000. Each of these is solvable. None of them was in your estimate.

3. "Users will adopt it"

This is the assumption that breaks last and costs the most. You ship the feature. It works. Users don't use it. Or they use it once, don't trust the output, and go back to doing things manually.

Adoption isn't deployment. Deployment is technical. Adoption is behavioral. People have to change how they work, and people don't change how they work just because a better tool exists. They change when the new tool fits naturally into what they're already doing, when they trust it, and when the switching cost feels worth it.

I talked to a founder who built an AI writing assistant for sales teams. Beautiful product. Genuinely useful outputs. The sales reps used it for the first week, then stopped. Why? Because editing AI-generated emails took about as long as writing their own, and they trusted their own instincts more. The tool was good. The behavior change was too expensive for the perceived benefit.

His estimate assumed 80% adoption within a month. Actual adoption at month three was 15%, and most of that usage came from one enthusiastic early adopter.

If your business model depends on users changing their behavior, your timeline is wrong. Not might be wrong. Is wrong. Behavior change is measured in months, not the week after launch.

4. "I know what this will cost to run"

You estimated your API costs based on your test usage. Your test usage is nothing like production usage.

In testing, you send clean, predictable inputs and get efficient responses. In production, users send long, messy, ambiguous inputs that consume more tokens. They retry when they don't like the answer. They use features you expected them to use once in ways that generate five API calls instead of one. Your cost-per-user estimate was based on an average that doesn't exist in the real world.
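To make that gap concrete, here is a minimal Python sketch of the arithmetic. Everything in it is an assumption for illustration: the prices are placeholders rather than any provider's real rates, and the usage numbers are invented to show how longer inputs, retries, and extra calls compound.

```python
from dataclasses import dataclass

# Placeholder prices in dollars per 1,000 tokens. These are assumptions,
# not real rates from any provider.
INPUT_PRICE_PER_1K = 0.01
OUTPUT_PRICE_PER_1K = 0.03


@dataclass
class UsageProfile:
    uses_per_user: float    # times one user triggers the feature per month
    calls_per_use: float    # API calls behind a single use
    retry_rate: float       # extra calls from retries, as a fraction (0.5 = +50%)
    input_tokens: float     # average input tokens per call
    output_tokens: float    # average output tokens per call


def monthly_cost_per_user(profile: UsageProfile,
                          input_price: float = INPUT_PRICE_PER_1K,
                          output_price: float = OUTPUT_PRICE_PER_1K) -> float:
    """Monthly API cost for one user under the given usage profile."""
    calls = profile.uses_per_user * profile.calls_per_use * (1 + profile.retry_rate)
    cost_per_call = (profile.input_tokens / 1000) * input_price \
        + (profile.output_tokens / 1000) * output_price
    return calls * cost_per_call


# Hypothetical test usage: clean inputs, one call per use, no retries.
test_profile = UsageProfile(uses_per_user=20, calls_per_use=1, retry_rate=0.0,
                            input_tokens=500, output_tokens=300)

# Hypothetical production usage: longer inputs, five calls per use, 50% retries.
production_profile = UsageProfile(uses_per_user=20, calls_per_use=5, retry_rate=0.5,
                                  input_tokens=2000, output_tokens=300)

test_cost = monthly_cost_per_user(test_profile)
production_cost = monthly_cost_per_user(production_profile)
print(f"test: ${test_cost:.2f} per user, production: ${production_cost:.2f} per user")
```

With these made-up numbers the production figure comes out at roughly fifteen times the test figure, even though no single input changed by more than 5x. The multipliers stack, which is why an average built from test usage misleads.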

Beyond API costs, there are costs you probably haven't budgeted for at all:

Monitoring. The model works today. How do you know it works next month? Someone has to check. If that someone is you, it's your time. If it's a tool, it's a subscription.
Retraining and prompt iteration. Output quality degrades as user behavior and data distributions shift. Maintaining quality is ongoing work, not a one-time investment.
Support. Users who encounter wrong AI outputs don't file a bug report. They send you an email that says "your product doesn't work." Diagnosing whether it's a model issue, a data issue, or a user-expectation issue takes time. More time than diagnosing a software bug, because the answer isn't in a stack trace.
The long tail of edge cases. The first 80% of use cases work fine. The remaining 20% each need individual attention. They don't stop arriving after launch. They accelerate, because more users means more ways to surface inputs you never tested.

What an honest estimate looks like

I'm not going to tell you to pad your estimates by 3x, even though that's usually closer to reality. Padding is just a fudge factor that makes you feel better without actually understanding the risk.

Instead, try this: for every AI project estimate, write down the four assumptions above and answer them honestly.

Build time: How much of this is deterministic software (estimable) vs. AI behavior (not estimable)? For the AI portion, have I built something like this before with production data? If not, double the estimate and accept it might still be wrong.
Integration: How many systems am I connecting to that I don't control? For each one, have I tested with real data at real scale? If not, add a week per system.
Adoption: Does this require users to change their behavior? If yes, what's my evidence that they will? "It's better" is not evidence. A conversation with five potential users where they described the pain and said they'd switch is evidence.
Running costs: What's my cost per user at 10x my test usage? What's my plan for when API pricing changes? What's the ongoing maintenance commitment?
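The build-time and integration rules above are simple enough to write down as code. This is a rough sketch, not a formula I'd defend: the field names are hypothetical, the "double it" and "add a week per system" adjustments come straight from the questions above, and treating five users as an adoption threshold is my loose reading of the adoption question.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Estimate:
    software_weeks: float                      # deterministic part: frontend, API, auth
    ai_weeks: float                            # napkin guess for the AI portion
    integration_weeks: float                   # napkin guess for integration
    built_similar_with_production_data: bool   # have I done this before, for real?
    untested_external_systems: int             # systems not yet tested with real data at scale
    requires_behavior_change: bool
    users_who_said_they_would_switch: int


def napkin_weeks(e: Estimate) -> float:
    """The number that goes on the napkin: the raw guesses added up."""
    return e.software_weeks + e.ai_weeks + e.integration_weeks


def honest_weeks(e: Estimate) -> float:
    """Apply the adjustments from the checklist above."""
    # Double the AI portion if I've never built this with production data.
    ai = e.ai_weeks if e.built_similar_with_production_data else 2 * e.ai_weeks
    # Add one week for every external system not tested at real scale.
    integration = e.integration_weeks + e.untested_external_systems
    return e.software_weeks + ai + integration


def open_questions(e: Estimate) -> List[str]:
    """Assumptions the numbers can't fix and that still need evidence."""
    questions = []
    if not e.built_similar_with_production_data:
        questions.append("AI build time is doubled and might still be wrong")
    if e.requires_behavior_change and e.users_who_said_they_would_switch < 5:
        questions.append("fewer than five users have said they would switch")
    return questions


# Hypothetical project: five napkin weeks, three external systems, no user evidence.
plan = Estimate(software_weeks=1, ai_weeks=3, integration_weeks=1,
                built_similar_with_production_data=False,
                untested_external_systems=3,
                requires_behavior_change=True,
                users_who_said_they_would_switch=0)

print(f"napkin: {napkin_weeks(plan)} weeks, honest: {honest_weeks(plan)} weeks")
for question in open_questions(plan):
    print("open:", question)
```

The point of the sketch is the second function more than the first. The adjusted number is still a guess, and the open questions are the part that tells you what to go find out before you start.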

And add one more thing that most estimates skip: kill criteria. Under what conditions would you stop? What would the data have to show? If you can't answer that before you start, you're not making an estimate. You're making a commitment, and commitments are hard to reverse even when the evidence says you should.
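One way to make kill criteria hard to argue with later is to write them down as data before the project starts. The sketch below is an illustration under assumptions: the metric names and thresholds are hypothetical, and the right ones depend on your own project.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class KillCriterion:
    name: str
    metric: str             # key to look up in the observed metrics
    threshold: float
    higher_is_better: bool  # True: stop if below threshold; False: stop if above

    def triggered(self, observed: Dict[str, float]) -> bool:
        value = observed[self.metric]
        if self.higher_is_better:
            return value < self.threshold
        return value > self.threshold


# Hypothetical criteria, written down before any code is written.
CRITERIA = [
    KillCriterion("adoption too low at month three", "adoption_rate", 0.30, True),
    KillCriterion("cost per user too high", "monthly_cost_per_user", 3.00, False),
    KillCriterion("build overran the budget", "weeks_spent", 12, False),
]


def triggered_criteria(observed: Dict[str, float]) -> List[KillCriterion]:
    """Return every criterion that the observed data has tripped."""
    return [c for c in CRITERIA if c.triggered(observed)]


# Hypothetical data three months after launch.
observed = {"adoption_rate": 0.15, "monthly_cost_per_user": 4.35, "weeks_spent": 10}

tripped = triggered_criteria(observed)
for criterion in tripped:
    print("stop signal:", criterion.name)
```

The criteria are frozen on purpose. If you find yourself wanting to edit a threshold after the data comes in, that is worth noticing.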

The estimate that gets you excited is the one that tells you what you want to hear. The estimate that saves you is the one that tells you what you need to hear. They're almost never the same spreadsheet.

Why I'm writing this

Because I've made every one of these mistakes. I've estimated four weeks and spent sixteen. I've assumed integration was a weekend and spent a month. I've built features that worked perfectly and watched nobody use them. I've been surprised by API bills and maintenance costs I didn't see coming.

Every builder I talk to makes these same mistakes, and it's not because they're careless. It's because AI feels like software, and our instinct is to estimate it like software. But the math from software doesn't transfer to AI, for the same reason the testing doesn't transfer and the shipping doesn't transfer: the system is probabilistic, the environment changes, and the users are unpredictable.

The napkin math is always clean. The production math never is. The builders who survive aren't the ones with the best estimates. They're the ones who build in the assumption that their estimates are wrong, and structure their projects so they can learn and adjust before the money runs out.

Related: Building AI Is Not Building Software — the deeper mental model shift behind why these estimates fail. And The Build Trap Got Cheaper — why AI's speed makes it easier to invest heavily before you've validated anything.

Builder's Path is a public lab from Sellhausen AI Systems focused on AI-native building, validation, and product judgment.

Built by Frank Sellhausen