Why your AWS bill suddenly spikes — 7 common causes
The seven shapes an unexpected AWS cost increase takes, what each one looks like in Cost Explorer, and how to track them down before they recur.
A 30-percent month-over-month AWS bill increase usually has a single cause hidden inside it. The challenge is that AWS gives you a dozen dimensions to slice the data — service, usage type, region, account, tag — and the spike rarely shows up cleanly on the first one you try. This post walks through the seven causes that account for the majority of bill-spike incidents we have seen, the signature each one leaves in Cost Explorer, and the first dimension to slice on.
1. NAT Gateway data processing
NAT Gateways charge a flat hourly rate plus a per-GB data processing fee. The hourly rate is small and stable; the data-processing fee can silently 10× when a new workload starts pulling images, models, or backups from outside the VPC. A common pattern: a Kubernetes node pool scales up, every new pod pulls a 500 MB container image through the NAT Gateway, and the bill quietly grows.
Cost Explorer signature — EC2-Other usage type, with NatGateway-Bytes in the breakdown. The service column says EC2, not VPC.
Fix path — VPC endpoints (Gateway for S3 and DynamoDB, Interface for everything else) bypass the NAT entirely. Container registries (ECR) have a dedicated Interface endpoint that pays for itself within a few days if you pull images at any scale.
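As a sketch of that fix in boto3 (the VPC, route table, and subnet IDs below are placeholders), the S3 Gateway endpoint is free, and the two ECR Interface endpoints cover image pulls:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

VPC_ID = "vpc-0123456789abcdef0"          # placeholder
ROUTE_TABLE_ID = "rtb-0123456789abcdef0"  # placeholder
SUBNET_ID = "subnet-0123456789abcdef0"    # placeholder

# Gateway endpoint for S3: free, and routes S3 traffic inside the VPC
# instead of through the NAT Gateway.
ec2.create_vpc_endpoint(
    VpcId=VPC_ID,
    ServiceName="com.amazonaws.us-east-1.s3",
    VpcEndpointType="Gateway",
    RouteTableIds=[ROUTE_TABLE_ID],
)

# Interface endpoints for ECR. Both are required for image pulls; the
# layer downloads themselves go through the S3 endpoint above.
for service in ("com.amazonaws.us-east-1.ecr.api",
                "com.amazonaws.us-east-1.ecr.dkr"):
    ec2.create_vpc_endpoint(
        VpcId=VPC_ID,
        ServiceName=service,
        VpcEndpointType="Interface",
        SubnetIds=[SUBNET_ID],
        PrivateDnsEnabled=True,
    )
```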
2. Cross-AZ data transfer
Moving a GB between availability zones in the same region costs $0.01 per GB in each direction, which in many cases matches the rate for moving it to a completely different region. Engineers who design for multi-AZ resilience often miss that every request crossing an AZ boundary is billable in both directions. A chatty microservice mesh without zone-aware routing can produce a four-figure monthly transfer bill from inside a single region.
Cost Explorer signature — EC2-Other or specific service charges with DataTransfer-Regional-Bytes or InterZone-Out in the usage type.
Fix path — turn on topology-aware routing in your service mesh (Istio, Linkerd, App Mesh) or set client-aware partitioning so reads hit the local-AZ replica. Confirm by graphing AZ-to-AZ bytes per day.
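A minimal sketch of that confirmation step, assuming the cross-AZ line items carry the usual DataTransfer-Regional-Bytes usage type (the substring check also catches region-prefixed variants):

```python
import boto3

ce = boto3.client("ce")  # Cost Explorer is served from us-east-1

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-05-01", "End": "2024-06-01"},  # placeholder month
    Granularity="DAILY",
    Metrics=["UnblendedCost", "UsageQuantity"],
    GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
)

# UsageQuantity for data-transfer line items is reported in GB.
for day in resp["ResultsByTime"]:
    for group in day["Groups"]:
        usage_type = group["Keys"][0]
        if "DataTransfer-Regional-Bytes" in usage_type:
            gb = float(group["Metrics"]["UsageQuantity"]["Amount"])
            cost = float(group["Metrics"]["UnblendedCost"]["Amount"])
            print(day["TimePeriod"]["Start"], usage_type,
                  f"{gb:,.1f} GB", f"${cost:,.2f}")
```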
3. CloudWatch Logs ingestion
CloudWatch Logs charges per GB ingested, not per GB stored, and the ingestion price ($0.50 / GB) is more than ten times the storage price ($0.03 / GB). A debug-level log statement added under load can generate gigabytes per hour, and the cost ratchets up fast.
Cost Explorer signature — AmazonCloudWatch with DataProcessing-Bytes for ingestion or TimedStorage-ByteHrs for storage in the usage type. The usage column shows the log group name only if you turned on resource-level grouping.
Fix path — drop the debug log line, or route it through a sampling filter. For high-volume structured logs, Kinesis Firehose direct to S3 is often 90% cheaper for the same retention window.
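A sampling filter is a few lines in stdlib Python logging. This sketch keeps every WARNING and above but only about 1% of lower-severity records, and assumes whatever handler you attach is the one feeding CloudWatch:

```python
import logging
import random

class SamplingFilter(logging.Filter):
    """Pass WARNING and above through untouched; keep only a fixed
    fraction of DEBUG/INFO records to cap ingestion volume."""

    def __init__(self, sample_rate: float = 0.01):
        super().__init__()
        self.sample_rate = sample_rate

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True
        return random.random() < self.sample_rate

logger = logging.getLogger("app")
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler()  # stdout, picked up by the log agent
handler.addFilter(SamplingFilter(sample_rate=0.01))  # keep ~1% of debug lines
logger.addHandler(handler)
```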
4. EBS volumes detached from terminated instances
When you terminate an EC2 instance, the boot volume goes with it by default — but additional volumes you attached do not. Over time, detached volumes pile up. They are cheap individually ($0.08 / GB-month for gp3) but the count grows monotonically and nobody owns the cleanup.
Cost Explorer signature — EC2-Other, EBS:VolumeUsage.gp3 (or gp2) with a steadily rising daily cost that does not track instance hours.
Fix path — the AWS Config managed rule ec2-volume-inuse-check flags volumes not attached to any instance; pair it with a Lambda that auto-deletes volumes detached for more than 30 days. Manually, filter the EBS volumes console to State = Available and check the most recent attachment.
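A boto3 sketch of the manual inventory. Note that EC2 records no detach timestamp, so enforcing the 30-day window needs CloudTrail or a tag written at detach time; this script only surfaces candidates, and the gp3 price is an assumption used for the estimate:

```python
import boto3
from datetime import datetime, timezone

ec2 = boto3.client("ec2")
GP3_PRICE = 0.08  # USD per GB-month, us-east-1 at time of writing

total = 0.0
paginator = ec2.get_paginator("describe_volumes")
for page in paginator.paginate(
    Filters=[{"Name": "status", "Values": ["available"]}]
):
    for vol in page["Volumes"]:
        monthly = vol["Size"] * GP3_PRICE
        total += monthly
        age = datetime.now(timezone.utc) - vol["CreateTime"]
        print(f"{vol['VolumeId']}  {vol['Size']:>5} GiB  "
              f"created {age.days:>4} days ago  ~${monthly:.2f}/mo")

print(f"Estimated monthly spend on detached volumes: ${total:.2f}")
```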
5. CloudFront cache misses
CloudFront pricing has two components: bandwidth out to the user and HTTP requests. A cache hit serves from the edge, paying only the bandwidth. A cache miss serves from origin, paying bandwidth plus origin egress. A misconfigured cache (TTL of 60 seconds where it should be 24 hours, or a query-string-sensitive cache key that should be normalized) can produce a 10× cost difference without changing the end-user experience.
Cost Explorer signature — Amazon CloudFront cost rising in step with origin (S3, ALB, or EC2) DataTransfer-Out. The two numbers should not move together; if they do, the cache is barely working.
Fix path — check CloudFront's cache hit ratio in the distribution's report tab. Under 80% is a problem; under 50% means the cache is broken. Look at the cache key: query strings, headers, and cookies are common over-includes.
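Checking the ratio programmatically looks roughly like this. CacheHitRate is one of CloudFront's opt-in additional metrics (it must be enabled on the distribution), CloudFront metrics always live in us-east-1, and the distribution ID is a placeholder:

```python
import boto3
from datetime import datetime, timedelta, timezone

cw = boto3.client("cloudwatch", region_name="us-east-1")

resp = cw.get_metric_statistics(
    Namespace="AWS/CloudFront",
    MetricName="CacheHitRate",
    Dimensions=[
        {"Name": "DistributionId", "Value": "E1234567890ABC"},  # placeholder
        {"Name": "Region", "Value": "Global"},
    ],
    StartTime=datetime.now(timezone.utc) - timedelta(days=7),
    EndTime=datetime.now(timezone.utc),
    Period=86400,           # one datapoint per day
    Statistics=["Average"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"].date(), f"{point['Average']:.1f}% hit rate")
```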
6. Cross-region API Gateway / Lambda misconfiguration
A Lambda invoked from a different region than its target resource pays the Lambda invocation cost plus regional data transfer. A common shape: a function deployed in us-east-1 calls a DynamoDB table in eu-west-1 because someone forgot to update the table ARN during a migration. Each invocation moves request and response bytes across the Atlantic.
Cost Explorer signature — DataTransfer-Out-Bytes appearing on both regions in roughly equal amounts, even when only one region is supposed to be active.
Fix path — region-tag your stacks and run periodic audits. Anything that talks across regions should be explicit and intentional (global replication, disaster recovery test), never accidental.
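One cheap audit, sketched below: scan every Lambda function's environment variables for ARNs that name a region other than the home region. The home region is an assumption, and this only catches ARNs passed through configuration; hard-coded ARNs need a repository grep instead.

```python
import re
import boto3

HOME_REGION = "us-east-1"  # the region this stack is supposed to live in
ARN_REGION = re.compile(r"arn:aws:[^:]*:([a-z]{2}-[a-z]+-\d):")

lam = boto3.client("lambda", region_name=HOME_REGION)

paginator = lam.get_paginator("list_functions")
for page in paginator.paginate():
    for fn in page["Functions"]:
        env = fn.get("Environment", {}).get("Variables", {})
        for key, value in env.items():
            for region in ARN_REGION.findall(value):
                if region != HOME_REGION:
                    print(f"{fn['FunctionName']}: {key} points at "
                          f"{region}: {value}")
```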
7. Trial-period or marketplace expiration
AWS Marketplace subscriptions and Reserved Instance discounts both expire on a fixed calendar date: the end of the term you bought, which nobody is watching. The day after, the on-demand rate kicks in for the underlying resource. The instance count and usage profile look identical to the previous month — only the rate changed.
Cost Explorer signature — same service, same instance type, same hours, but the per-unit cost jumps. Easier to see with cost and usage shown as side-by-side columns, where the rate change stands out against flat usage.
Fix path — set AWS Budgets alerts on per-unit cost deviation rather than just absolute spend. For RIs, the Trusted Advisor reservation expiration check flags leases approaching their end date ahead of time.
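The per-unit comparison itself is easy to script against the Cost Explorer API. A sketch, with the month boundaries as placeholders: divide unblended cost by usage quantity per usage type, then diff the two months.

```python
import boto3

ce = boto3.client("ce")

def unit_costs(start: str, end: str) -> dict[str, float]:
    """Effective per-unit rate (cost / quantity) by usage type."""
    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost", "UsageQuantity"],
        GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
    )
    rates = {}
    for group in resp["ResultsByTime"][0]["Groups"]:
        cost = float(group["Metrics"]["UnblendedCost"]["Amount"])
        qty = float(group["Metrics"]["UsageQuantity"]["Amount"])
        if qty > 0:
            rates[group["Keys"][0]] = cost / qty
    return rates

last = unit_costs("2024-04-01", "2024-05-01")  # placeholder months
this = unit_costs("2024-05-01", "2024-06-01")

# A >25% rate jump on the same usage type is the expired-discount
# signature described above; the threshold is arbitrary.
for usage_type, rate in this.items():
    prev = last.get(usage_type, 0.0)
    if prev > 0 and rate / prev > 1.25:
        print(f"{usage_type}: ${prev:.4f} -> ${rate:.4f} per unit")
```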
Putting it together
For any specific spike, the diagnostic order that works most often: group by service first, then by usage type within the top contributor, then by linked account or tag. Most spikes resolve at the usage-type level. If they do not, the issue is usually #6 (cross-region) or #5 (cache configuration), which require request-level data rather than cost data.
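That drilldown is mechanical enough to script. Here is a minimal two-step version against the Cost Explorer API, with the month hard-coded as a placeholder; the second call feeds the top service name back in as a filter, so the service strings always match what Cost Explorer expects:

```python
import boto3

ce = boto3.client("ce")
PERIOD = {"Start": "2024-05-01", "End": "2024-06-01"}  # placeholder

def top_costs(dimension: str, filter_expr=None, n: int = 5):
    """Top-n cost contributors along one Cost Explorer dimension."""
    kwargs = dict(
        TimePeriod=PERIOD,
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": dimension}],
    )
    if filter_expr:
        kwargs["Filter"] = filter_expr
    groups = ce.get_cost_and_usage(**kwargs)["ResultsByTime"][0]["Groups"]
    costs = [(g["Keys"][0], float(g["Metrics"]["UnblendedCost"]["Amount"]))
             for g in groups]
    return sorted(costs, key=lambda c: c[1], reverse=True)[:n]

# Step 1: which service moved?
services = top_costs("SERVICE")
for name, cost in services:
    print(f"{name}: ${cost:,.2f}")

# Step 2: usage types inside the top contributor.
top_service = services[0][0]
for name, cost in top_costs(
    "USAGE_TYPE",
    filter_expr={"Dimensions": {"Key": "SERVICE", "Values": [top_service]}},
):
    print(f"  {name}: ${cost:,.2f}")
```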
If you are doing this work regularly, our AWS Billing Analyzer takes a Cost Explorer CSV export and surfaces the seven patterns above directly, along with the worst offenders for each. It runs entirely in your browser — the CSV never leaves your machine, which matters because the file contains account IDs and resource ARNs you generally do not want to upload to a third-party site.