Token subsidies aren't going to last. I have receipts.
I burned more than $400 on AI inference in an afternoon last month — on direct API rates, while everyone else was on subsidized subscriptions. This week the industry caught up to where my bill already was.
I wanted to stand up the start of a personal AI stack using an open-source framework (pi-mono, in this case). I had been doing the same coding and writing through a Claude Code subscription, so I wasn't ready when that workload, running on Opus 4.5 at direct API rates, ripped through $200 at Anthropic, $200 at Google, and almost $100 at OpenAI in less than three hours.
Hence: cost metadata as a core primitive in the stack.
The past few days have seen the zeitgeist reporting what should have been obvious: the model labs aren’t going to keep subsidizing inference.
GitHub announced Copilot is moving to usage-based billing. Anthropic and OpenAI are publicly grappling with token economics, the problem of making revenue per user climb without provoking churn. The era of subsidized AI model usage is ending.
That’s not a tweet I want to scroll past, because the day I burned more than $400 wasn’t an accident. It was a preview.
I wasn’t on a subsidized plan
Most people running AI tooling today are on a subscription: Copilot, ChatGPT Plus, Claude Pro, Cursor, take your pick. The number on the invoice is fixed, the inference budget is implicitly someone else's problem, and what you pay bears almost no relation to what the inference actually costs, unless you're running with Haiku all the time.
I wasn’t on any of those tiers when I lit the fuse. I was running models directly through their APIs, paying per token. The work I was doing — long sessions, big context windows, an agent that’d happily call itself in a loop if I let it — racked up a bill the way a metered utility racks up a bill. I underestimated the curve.
The reason I’m telling this story now is that I’ve spent the weeks since rebuilding how I think about AI tooling, and the news this week confirms what the bill suggested: the gap between the price most users see and the price the inference actually costs is closing. The subscriptions hid the gap; the usage-based-pricing announcements are starting to expose it. If you’re using AI tools at all, the math you’re doing today probably isn’t the math you’ll be doing in six months.
I just got there earlier, the expensive way.
I built a budget tracker. That wasn’t enough.
The first thing I did after the bill was write a budget tracker — a class that records token usage per call, the tier of the model that produced it, and the dollar cost, and emits a summary event when the session ends. It works.
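The post doesn't show the class itself, so here's a minimal sketch of what a ledger like that might look like; every name (`BudgetTracker`, `CallRecord`, the tier numbering) is mine, not the actual code:

```typescript
// Illustrative sketch of a per-call cost ledger, not the author's actual class.
type Tier = 0 | 1 | 2; // assumed convention: 0 = local/free, higher = paid

interface CallRecord {
  model: string;
  tier: Tier;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

class BudgetTracker {
  private calls: CallRecord[] = [];

  // Record one model call: tokens, tier, and dollar cost.
  record(call: CallRecord): void {
    this.calls.push(call);
  }

  sessionTotalUsd(): number {
    return this.calls.reduce((sum, c) => sum + c.costUsd, 0);
  }

  // Emitted when the session ends. This is the ledger half of the problem:
  // accurate, queryable, and invisible unless you go look.
  summary(): { calls: number; totalUsd: number } {
    return { calls: this.calls.length, totalUsd: this.sessionTotalUsd() };
  }
}
```

Note what's missing from a class like this: nothing prints, nothing interrupts. It records perfectly and changes nothing.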
It also did almost nothing to change my behavior. The numbers were recorded. They weren’t visible.
There’s a difference between a ledger and a surface. A ledger is what your accountant checks at the end of the month. A surface is what changes how you act in the moment. I had a perfect ledger. I had no surface. So I’d start a session, work intently, and find out the cost only by querying the database afterward — which, of course, I rarely did.
This is the gap most cost dashboards have. They report. They don’t intervene.
Cost visibility as a Layer 0 stance
I run my AI stack out of a project I call DYFJ. (The acronym is intentional. RTFM has cousins.) It’s modular, vendor-loose, local-first by default. I keep a short list of foundational stances the project won’t compromise on, and one of them — promoted there after the $400 day — is this:
Cost visibility as a default, not an add-on. Token spend, model selection, and budget posture are surfaced before the work runs and tracked while it runs. Cost is a design concern, not a billing concern.
That’s the principle. The interesting part is what it produces when you take it seriously.
The four-piece surface
I ended up sketching a cost-visibility surface — the user-facing behavior that turns the ledger into a tool — with four pieces.
The pre-flight escalation banner. Whenever the system is about to call a paid-tier model, I see a single banner: which model, why it was selected, the estimated cost as a range (not a misleading point estimate), what I've spent in this session and today, and a prompt to allow it. No surprises. No retroactive math.
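Why a range and not a point estimate? Input tokens are known before the call; output tokens aren't. A sketch of one way to bracket it, with assumed bracketing factors (0.5x to 2x of an expected output length), since the post doesn't specify how its estimate works:

```typescript
// Hypothetical pre-flight cost estimate. The input side is exact; the
// output side is bracketed, because pretending to know the output length
// would be exactly the false precision the banner is meant to avoid.
interface ModelPricing {
  inputPerMTok: number;  // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
}

function estimateCostRange(
  pricing: ModelPricing,
  inputTokens: number,         // known: the prompt we're about to send
  expectedOutputTokens: number // a guess: bracket it instead of trusting it
): [number, number] {
  const inputCost = (inputTokens / 1e6) * pricing.inputPerMTok;
  const low = inputCost + ((expectedOutputTokens * 0.5) / 1e6) * pricing.outputPerMTok;
  const high = inputCost + ((expectedOutputTokens * 2) / 1e6) * pricing.outputPerMTok;
  return [low, high];
}
```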
The per-turn tally. After every paid model response, one line: cost of this turn · cost of the session so far · percent of budget consumed. Not a dashboard. Just a single line. Spend stops being something I check; it becomes something I notice.
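The commit itself isn't shown, but a tally like that genuinely fits in about fourteen lines. A sketch under my own naming, including the Tier-0 suppression described later in the post:

```typescript
// One line after every paid turn: turn cost · session cost · % of budget.
// Returns null for free (tier 0) calls: no spend, no signal.
function tallyLine(
  turnCostUsd: number,
  sessionCostUsd: number,
  budgetUsd: number,
  tier: number
): string | null {
  if (tier === 0) return null; // local model, suppress the tally
  const pct = (sessionCostUsd / budgetUsd) * 100;
  return (
    `$${turnCostUsd.toFixed(4)} this turn · ` +
    `$${sessionCostUsd.toFixed(2)} this session · ` +
    `${pct.toFixed(0)}% of budget`
  );
}
```

The design choice worth noticing: it returns a string or nothing, and the caller just prints it. No state, no dashboard, no report to open.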
Soft warnings, not hard stops. When I cross 80% of session or daily budget, I get a warning that does not block. It tells me I’m at 82%, suggests how to extend if I want to, and lets me continue. The hard stop sits at 100%.
Overrun rescue. If a call would push me past the limit, I get one chance to extend the budget interactively before the call aborts. With an audit event written either way, so I can later see how often I extended and by how much.
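The last two pieces share one gate. A sketch of that logic, using the 80% and 100% thresholds from the text; the `askToExtend` callback and the return shape are assumptions of mine, and the audit-event write is elided:

```typescript
// Budget gate: soft warning past 80%, hard stop at 100%, with one
// interactive chance to extend before a call aborts. (Audit logging
// from the post is omitted here for brevity.)
type GateResult =
  | { action: "proceed"; warning?: string }
  | { action: "abort" };

async function budgetGate(
  projectedUsd: number, // session spend if this call runs
  budgetUsd: number,
  askToExtend: () => Promise<number | null> // new budget, or null to decline
): Promise<GateResult> {
  if (projectedUsd <= budgetUsd * 0.8) {
    return { action: "proceed" }; // comfortably under: silent
  }
  if (projectedUsd <= budgetUsd) {
    // Soft warning: say where you are, don't block.
    const pct = Math.round((projectedUsd / budgetUsd) * 100);
    return { action: "proceed", warning: `at ${pct}% of budget` };
  }
  // Overrun rescue: one chance to extend before the call aborts.
  const extended = await askToExtend();
  if (extended !== null && projectedUsd <= extended) {
    return { action: "proceed", warning: "budget extended" };
  }
  return { action: "abort" };
}
```

The point of returning a warning instead of throwing: the caller decides how loud to be, and the work continues by default.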
None of this is technically clever. The cleverness, if there is any, is in the insistence: refusing to treat cost as a downstream billing concern, and insisting the surface be visible before the spend, during the spend, and across sessions.
I shipped the second piece tonight. One commit, fourteen lines of TypeScript. It’s running. The other three are tractable — a few hours of work each, each independently useful.
What changed today
I ran a prompt through my stack half an hour ago. The model was a local one — Tier 0, free — so the tally suppressed itself. No spend, no signal. That’s the design.
When I run a paid prompt next, the tally will print after each response, and the spend will go from invisible to obvious. No new dashboards. No new reports. Just a line of text that makes the cost a property of the work, not a property of the invoice.
That’s the whole game. Take a number that already exists, surface it where the decision happens, and the decision changes.
The news cycle is everyone arriving
The pattern is clear. The subsidies were holding. They’re not going to keep holding. The cost of inference is real, the providers can’t keep absorbing it indefinitely, and the bill is going to start showing up where users see it.
When that happens, every AI-using tool is going to need what I just spent the last month building. Not because anyone planned for this. Because the math is converging.
The principle
I’m not claiming I predicted anything. I paid the bill that’s coming for everyone, ran the numbers, and decided I didn’t want to live in a world where the cost of my own tooling was invisible to me until the next monthly statement.
What I’d argue, with some confidence after this week, is this:
Cost visibility isn’t an add-on, isn’t a billing concern, and isn’t going to be optional much longer. It’s a design property of any AI tool that’s going to be used over time, by anyone whose budget isn’t infinite — which is to say, anyone. The era of treating it as a feature you’ll get to later is ending. Designing for it now isn’t being prudent. It’s being on time.
If you’re building anything in this space, the question to ask isn’t “can I add cost tracking later?” It’s “what does my tool do, today, in the moment of decision, to make the cost visible to the person spending it?”