Cloud AI pricing has shifted. It is no longer just about model token rates. The real cost of running agents sits in memory, grounding, orchestration, tooling, and governance. If you are building AI into customer operations, RevOps, or service delivery, you need to understand where the spend actually lands.
Let’s break down the current landscape across AWS, Azure, and the tooling layer that sits around them.
Amazon Bedrock's pricing model is straightforward in theory: you pay per input and output token for whichever foundation model you select. Rates vary by provider, region, and model tier. Claude and Titan, for example, carry different input and output rates, and output tokens are typically more expensive.
However, two areas materially affect total cost beyond the headline rate: how much context each call carries (memory, grounding, retrieval) and how often the agent invokes tools.
In practice, the difference between a well-designed agent and a poorly designed one can be multiples in monthly cost. If you are embedding AI into a CRM workflow or service desk, design discipline matters more than headline token rates.
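To see why design discipline outweighs headline rates, here is a back-of-envelope cost model. The per-1K-token rates and token counts are placeholders, not current Bedrock list prices; plug in real figures from the pricing page.

```python
# Back-of-envelope model-call cost. Rates are illustrative placeholders,
# not actual Bedrock pricing -- always check the current price list.
def call_cost(input_tokens, output_tokens,
              in_rate_per_1k=0.003, out_rate_per_1k=0.015):
    """Cost of one model call in USD, given per-1K-token rates."""
    return (input_tokens / 1000) * in_rate_per_1k \
         + (output_tokens / 1000) * out_rate_per_1k

# A lean agent turn vs. one that stuffs full history into every call.
lean = call_cost(input_tokens=1_500, output_tokens=400)
bloated = call_cost(input_tokens=12_000, output_tokens=400)
print(f"lean: ${lean:.4f}  bloated: ${bloated:.4f}  ratio: {bloated / lean:.1f}x")
```

Same model, same rates, same output length: the only variable is how much redundant context each turn carries, and the per-call cost is already a multiple.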
Azure AI Foundry takes a slightly different approach. There is no separate “agent fee”. You are billed for:
• Model tokens
• Search services
• Storage
• Compute
• Any other Azure services invoked by the agent
This sounds simple, but it pushes architectural accountability back to you. If your agent calls search on every turn, spins up compute-heavy functions, or hits storage repeatedly, that becomes your bill.
There is no safety net. Good architecture controls cost. Poor architecture quietly compounds it.
Many teams focus on model pricing and forget the orchestration layer.
Tools like n8n price based on executions. If your AI workflow triggers multiple sub-flows per conversation, costs rise with usage. At small scale, this looks negligible. At thousands of executions per day, it becomes a material line item.
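The scaling effect is simple arithmetic. The per-execution rate below is hypothetical, since actual n8n pricing depends on plan and tier, but the shape of the curve is the point.

```python
# Orchestration cost scales with executions, not tokens.
# All figures are illustrative assumptions, not vendor pricing.
per_execution = 0.002          # USD per workflow execution (hypothetical)
subflows_per_conversation = 4  # main flow plus three sub-flows
conversations_per_day = 2_000

monthly = per_execution * subflows_per_conversation * conversations_per_day * 30
print(f"~${monthly:,.0f}/month")
```

At a pilot's fifty conversations a day this rounds to pocket change; at two thousand, it is a line item someone has to own.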
Observability and evaluation tooling such as LangSmith introduces either SaaS licence costs or enterprise contracts. This is not optional in serious deployments. If you cannot trace outputs, measure drift, or evaluate quality, you cannot scale safely.
The result is that your AI cost base becomes:
• Model tokens
• Memory operations and caching
• Search and retrieval
• Function execution
• Orchestration runs
• Observability and evaluation tooling
Model pricing is often less than half of total cost at scale.
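A minimal version of the cost stack above can be laid out in a few lines. Every figure here is an assumption for you to replace with your own numbers, not a benchmark; the exercise is seeing the model-token share in context.

```python
# Illustrative monthly cost stack for one agent workload.
# All USD figures are assumptions to overwrite, not benchmarks.
cost_stack = {
    "model_tokens":     1_800,
    "memory_and_cache":   600,
    "search_retrieval": 1_100,
    "function_exec":      450,
    "orchestration":      480,
    "observability":      500,
}
total = sum(cost_stack.values())
for item, usd in cost_stack.items():
    print(f"{item:<18} ${usd:>6,}  {usd / total:6.1%}")
print(f"{'total':<18} ${total:>6,}")
```

Even with these rough placeholder numbers, model tokens come out well under half of the total, which is the pattern that surprises finance teams.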
If you are a CFO, you need to see a full architecture cost model before signing anything. Not just per-token pricing.
If you are a CIO, insist on:
• Hard quotas
• Clear caching strategy
• Defined grounding boundaries
• Monitoring on tool invocation frequency
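The quota and monitoring points above can be sketched as a small guardrail in the orchestration layer. The class and limits are hypothetical names for illustration, not any vendor's API.

```python
# Minimal guardrail sketch: hard quotas on tool calls per conversation,
# enforced before the call is made. Names and limits are illustrative.
from collections import Counter

class ToolQuota:
    def __init__(self, max_calls_per_tool=3, max_total=8):
        self.max_calls_per_tool = max_calls_per_tool
        self.max_total = max_total
        self.calls = Counter()  # invocation count per tool, for monitoring

    def allow(self, tool_name):
        """Return True and record the call if within quota; refuse otherwise."""
        if sum(self.calls.values()) >= self.max_total:
            return False
        if self.calls[tool_name] >= self.max_calls_per_tool:
            return False
        self.calls[tool_name] += 1
        return True

quota = ToolQuota()
assert quota.allow("search")       # first search call passes
quota.allow("search")
quota.allow("search")
assert not quota.allow("search")   # fourth search call is refused
```

The same counter doubles as the monitoring signal: exporting it per conversation tells you which tools the agent is hammering and where caching or prompt changes would pay off.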
If you are a CRO or CCO embedding AI into customer journeys, ask a harder question: does this design reduce delivery cost or improve retention enough to justify its runtime cost?
Because the uncomfortable truth is this: a badly designed agent can look impressive in a demo and erode margin in production.
We see the same commercial pattern repeatedly: costs that look negligible in the pilot quietly compound once usage scales.
This is avoidable. Treat prompts like code. Minimise redundant context. Use caching aggressively. Call tools only when needed. Log everything.
AI is not expensive by default. Undisciplined architecture is.
If you are building AI into CRM, CX, or RevOps, the competitive advantage will not come from who has access to a model. It will come from who controls cost while maintaining output quality.
That is an operational decision, not a technical one.