From usage to outcomes: how AI agents are changing what you pay for
Usage-based pricing has become the default expectation in B2B SaaS. If you shipped a product in the last five years and you’re still charging a flat per-seat fee regardless of what customers actually do with it, you’re probably leaving money on the table — or losing deals to competitors who are willing to align cost with consumption.
But something new is emerging on top of that baseline. Not as a replacement for usage-based pricing, but as an evolution of it. And it’s being driven almost entirely by the rise of AI agents.
Usage-based pricing, where it stands
The shift happened gradually and then all at once. Twilio built a business on charging per SMS and per call minute. AWS turned infrastructure into a utility bill. Snowflake charged by the compute credit. OpenAI made token-based pricing a household term (at least in certain households). Today, the majority of SaaS companies incorporate some form of consumption pricing alongside — or instead of — flat subscriptions.
The mechanics are well understood at this point. You pick a metric that reflects value delivered, you meter it, and you invoice against it. The interesting design decisions are mostly in how you aggregate.
Sum-based: you count everything. Total API calls in a month. Total SMS messages sent. Total records processed. Straightforward, predictable, and easy for customers to estimate. Twilio’s classic model.
Peak-based: you charge against the maximum. The highest number of concurrent users at any point in the month. Peak database connections. Maximum storage consumed. This works well for infrastructure where capacity is what you’re actually provisioning — the customer needs to know the ceiling will hold, and the peak is what that commitment costs.
Average-based: you take the mean across the period. Average daily active users over a month. Average hourly throughput. This smooths out spikes and tends to feel fairer to customers who have uneven usage patterns — a retailer that burns through your platform in Q4 but is quiet in Q1 appreciates not being invoiced at December’s peak for the whole year.
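The three aggregation styles reduce to one-line computations over the same stream of meter readings. A minimal Python sketch — the readings and per-unit rate are invented for illustration:

```python
from statistics import mean

# Hypothetical hourly meter readings for one billing period —
# e.g. concurrent connections sampled each hour.
readings = [120, 340, 95, 510, 280, 430]

PRICE_PER_UNIT = 0.02  # illustrative per-unit rate

# Sum-based: bill every unit consumed (Twilio-style).
sum_charge = sum(readings) * PRICE_PER_UNIT

# Peak-based: bill against the highest observed value.
peak_charge = max(readings) * PRICE_PER_UNIT

# Average-based: bill the mean across the period.
avg_charge = mean(readings) * PRICE_PER_UNIT

print(f"sum: {sum_charge:.2f}, peak: {peak_charge:.2f}, avg: {avg_charge:.2f}")
```

Same meter, same period — three very different invoices, which is why the choice of aggregation is a pricing decision, not an implementation detail.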
Most enterprise contracts layer usage on top of a baseline commitment. You agree to a minimum — a floor that gives the vendor revenue predictability — and anything above that is overage, billed at a per-unit rate. The pure pay-as-you-go model (no floor, no commitment) is more common in self-serve products where customers haven’t yet proven out their volume.
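The commitment-plus-overage structure is just as simple to express. A sketch with invented figures — the floor is always billed, and only units above it accrue overage:

```python
def invoice_total(units_used: int, committed_units: int,
                  committed_fee: float, overage_rate: float) -> float:
    """Baseline commitment plus per-unit overage.

    The customer always pays the committed fee (the floor);
    units beyond the commitment are billed at the overage rate.
    """
    overage_units = max(0, units_used - committed_units)
    return committed_fee + overage_units * overage_rate

# Customer committed to 10,000 units for $500/month, overage at $0.06/unit:
print(invoice_total(12_500, 10_000, 500.0, 0.06))  # 500 + 2,500 * 0.06 = 650.0

# Under the floor, the customer still pays the committed fee:
print(invoice_total(8_000, 10_000, 500.0, 0.06))   # 500.0
```

Pure pay-as-you-go is the degenerate case: a commitment of zero units and a zero floor.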
All of this is now relatively settled territory. The tooling exists. The billing infrastructure handles it. The sales motions are figured out. Usage-based pricing is no longer a differentiator — it’s table stakes.
What AI agents break
Here’s the problem that AI agents introduce: the thing you’re paying for stops being a useful proxy for value.
When you pay per API call, there’s a reasonable assumption that each call delivers some increment of value. When you pay per token with an LLM, you’re paying for compute — the raw material of what the model produces. These are usage-based metrics, and they have the same character as paying per GB of storage or per CPU-hour: you’re metering a resource, not an outcome.
But AI agents don’t just consume resources. They do work.
A support agent doesn’t just process tokens — it resolves customer issues. A sales development agent doesn’t just send emails — it books meetings. A code review agent doesn’t just generate text — it finds bugs and ships clean PRs. The unit of value isn’t the API call or the model invocation. It’s the thing that got done.
This is outcome-based pricing: you pay not for what the system consumed, but for what it accomplished. The bill arrives only when the promised result shows up.
This idea is older than AI
Outcome-based pricing sounds like an AI-native concept, but it isn’t. The clearest historical example is Rolls-Royce’s “Power by the Hour” model, introduced in the 1960s. Airlines don’t buy jet engines outright — they pay per hour of thrust delivered. The engine stays on Rolls-Royce’s books. Maintenance, reliability, uptime — all of it is Rolls-Royce’s problem, because their revenue depends on the engine actually running.
The commercial logic is elegant: the vendor can’t just ship hardware and walk away. Their income is tied to performance. Airlines get cost predictability tied directly to their own operations. And Rolls-Royce has a strong financial incentive to build engines that don’t break.
Healthcare is exploring the same structure. Some pharmaceutical contracts now tie payment to patient outcomes — reduced bills or rebates if the treatment doesn’t work. The drug company takes on part of the clinical risk rather than getting paid on delivery regardless of what happens next.
In both cases, the core dynamic is the same: shared risk, aligned incentives. The vendor only wins when the customer wins.
What this looks like in SaaS today
A few AI companies are already operating this way, and the model is clearer than you’d expect.
Intercom’s Fin AI agent charges $0.99 per resolved support ticket. Not per conversation opened, not per message sent, not per token used. If the agent can’t close the issue and escalates to a human, you don’t pay. If it resolves it, you pay $0.99. The pricing anchors directly to the value delivered — a human agent handling the same ticket costs far more than $0.99, so the economics work for both sides.
An AI SDR product that charges per qualified meeting booked follows the same logic. Every email drafted, every follow-up sent, every response processed — none of that triggers a charge. The booking is the outcome. That’s what you pay for.
The common thread: the pricing unit is a result that a human can verify. The customer knows whether the ticket was resolved. They know whether the meeting happened. There’s no ambiguity about whether value was delivered, and no argument about whether the system was “working” even when nothing useful came out.
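In billing terms, that verifiable result is a filter over the event stream: meter everything, charge only for events that match the outcome definition. A sketch with hypothetical event fields, loosely modeled on the resolved-ticket example:

```python
from dataclasses import dataclass

@dataclass
class TicketEvent:
    ticket_id: str
    resolved_by_agent: bool  # True only if the AI closed the ticket itself
    escalated: bool          # True if it was handed off to a human

PRICE_PER_RESOLUTION = 0.99  # per-resolution fee, as in the Fin example

def billable_resolutions(events: list[TicketEvent]) -> list[TicketEvent]:
    # Only tickets the agent actually resolved trigger a charge;
    # escalations to a human cost the customer nothing.
    return [e for e in events if e.resolved_by_agent and not e.escalated]

events = [
    TicketEvent("T-1", resolved_by_agent=True,  escalated=False),
    TicketEvent("T-2", resolved_by_agent=False, escalated=True),
    TicketEvent("T-3", resolved_by_agent=True,  escalated=False),
]
charge = len(billable_resolutions(events)) * PRICE_PER_RESOLUTION
print(f"{charge:.2f}")  # 2 resolutions billed; the escalation is free
```

The field names here are assumptions, not a real billing API — the point is that the billable event is a boolean over system-of-record state, not a measure of effort expended.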
Why this is different from usage-based pricing
It’s worth being precise here because the distinction matters for how you build billing infrastructure.
Token usage is not outcome-based pricing. It’s usage-based pricing where the consumption metric happens to be LLM tokens. The same goes for charging per agent workflow execution, per “task” in the abstract, or per API call to your AI platform. If the meter is running regardless of what the system produces, that’s usage-based.
Outcome-based pricing requires that the billable event be defined in terms of a result — something that can be evaluated as achieved or not achieved. The system either booked the meeting or it didn’t. The ticket is either closed or it’s still open. The PR either merged or it didn’t.
This creates a genuinely different commercial relationship. With usage-based pricing, the vendor’s incentive is to drive consumption. With outcome-based pricing, the vendor’s incentive is to produce results — and the vendor absorbs the cost of failure. That’s a meaningful shift in how risk is distributed between buyer and seller.
How to set an outcome price
Pricing outcomes correctly requires a different starting point than pricing usage. You can’t just look at your infrastructure costs and add a margin — the cost of producing a single outcome varies too much, depending on how many model calls and retries the agent needs before the result lands.
The better anchor is customer value. If an automated support resolution saves your customer $5 compared to a human agent handling the same ticket, charging $0.99 for it is an easy sell. If an AI SDR books a meeting that converts to a $20,000 deal, a $200 booking fee is still a compelling ROI.
The pricing question becomes: what is this outcome worth to the customer, and what fraction of that is a fair price? That’s a different conversation than “what does it cost us to run the model” — and it generally leads to higher prices, not lower ones, because the value delivered is much larger than the compute consumed.
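The value-anchored calculation is trivial arithmetic, but writing it down makes the negotiable part explicit: the share of customer value the vendor keeps. A sketch with assumed numbers:

```python
def value_anchored_price(customer_value: float, value_share: float) -> float:
    """Price an outcome as a fraction of the value it creates for the
    customer. value_share is the negotiated split — an assumption here,
    e.g. 0.2 means the vendor captures 20% of the value delivered.
    """
    return customer_value * value_share

# A support resolution that saves the customer $5.00, priced at ~20% of value:
print(value_anchored_price(5.00, 0.20))   # in the ballpark of a $0.99 fee

# A booked meeting feeding a $20,000 deal at a 5% win-rate-adjusted value
# of $1,000, priced at 20% of that:
print(value_anchored_price(1_000.0, 0.20))
```

All the inputs here are illustrative — the win-rate adjustment in particular is an assumption, since the article's $200-per-meeting example doesn't specify one. The structure is what matters: cost of compute never appears in the formula.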
Volume discounts, caps, and hybrid structures (a base platform fee plus per-outcome charges) all remain valid. But the anchor should be value, not cost.
Will it take off?
Probably, but more slowly and more selectively than the AI hype cycle would suggest.
Defining “done” is harder than it looks. “Resolved” sounds unambiguous until you write it into a contract. Did the ticket resolve because the agent handled it, or because the customer gave up? Does a booked meeting count if it cancels before it happens? Every outcome definition creates edge cases, and every edge case is a potential dispute. Getting this right requires explicit documentation of what counts, what doesn’t, and how disagreements get resolved — before the contract is signed.
Attribution is contested. Business results rarely have a single cause. If a customer signs after your AI SDR sent three emails and a human AE followed up with two calls, who gets credit for the meeting? The cleaner the outcome definition, the less this matters — but the more complex the outcome, the harder attribution becomes.
Volume volatility cuts both ways. Customers with seasonal patterns may generate billing spikes that feel surprising even when the outcomes are real. On the vendor side, underpricing outcomes is a real risk: if you’ve priced aggressively to win a deal and agent efficiency doesn’t improve as expected, the economics deteriorate fast.
Sales cycles get longer. Selling outcomes requires agreeing on definitions before the contract is signed — more stakeholders, more legal review, more time. It’s a more consultative motion than selling seats or API access.
For outcome-based pricing to work at scale, the outcome needs to be a system-of-record state that both parties can read without interpretation: a support ticket marked resolved, a calendar invite accepted, a PR with a merge commit. These don’t require judgment calls. They’re events in a log.
The harder cases — code quality, content effectiveness, sales pipeline influence — are where outcome pricing gets complicated fast. The outcome exists, but attributing it solely to the AI agent is genuinely hard, and any model that requires human arbitration on every transaction won’t scale.
What we’re likely to see is mixed adoption: outcome-based pricing where the outcome is clean and verifiable, usage-based pricing everywhere else. The two models will coexist, often within the same product — outcome charges for the things that are measurable, usage charges for the ambient background processing that isn’t.
What this means for billing infrastructure
For SaaS companies building on top of AI, this is a design question that arrives earlier than you might expect. How you price isn’t just a go-to-market decision — it determines what you need to meter, how you store events, how you aggregate them, and how you communicate charges to customers.
Usage-based billing is solvable with standard metering and invoicing tooling. Outcome-based billing requires that same infrastructure, plus a layer of business logic that defines what constitutes a billable outcome and captures the event when it occurs.
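That extra layer can be sketched as follows — all event fields here are hypothetical. The meter counts every event, while the outcome definition, the contract’s “what counts” rules, lives as explicit business logic that can be versioned and reviewed alongside the contract:

```python
def is_billable_outcome(event: dict) -> bool:
    # The contract's outcome definition, made executable: a ticket counts
    # only if the agent closed it and it stayed closed. The 7-day reopen
    # window is an illustrative contract term, not a standard.
    return (event["type"] == "ticket_closed"
            and event["closed_by"] == "agent"
            and not event["reopened_within_7_days"])

events = [
    {"type": "ticket_closed", "closed_by": "agent", "reopened_within_7_days": False},
    {"type": "ticket_closed", "closed_by": "human", "reopened_within_7_days": False},
    {"type": "message_sent",  "closed_by": None,    "reopened_within_7_days": False},
]

usage_meter = len(events)                              # every event is metered
outcome_count = sum(map(is_billable_outcome, events))  # only outcomes are billed
print(usage_meter, outcome_count)  # 3 events metered, 1 billable outcome
```

Keeping the outcome rule as a single, named predicate is what makes the model auditable: when a customer disputes a charge, the question is whether a specific event satisfied a specific rule, not whose interpretation wins.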
Getting that infrastructure right from the start — before you’re trying to retrofit it onto a product with existing customers — is the kind of thing that compounds over time. The billing model you build for is the one you’ll be living with when the customer base is ten times larger.
The shift from usage to outcomes is still early. But the underlying logic is sound: if AI agents are going to take on real work, the pricing should reflect real results. The companies that figure out clean outcome definitions and build the infrastructure to support them will have a meaningful commercial advantage over those still charging per token.
Bunny handles both usage-based and outcome-based billing models, including complex aggregation rules and overage calculations. Talk to us if you’re working through how to price your AI product.