AWS Cost Troubleshooting: How Missing Logs Hid the Cause of Unexpected Charges

Cloud cost optimization articles often focus on reserved instances, storage classes, or rightsizing infrastructure.

This story is about something completely different.

A customer experienced a sudden and significant increase in AWS costs. There were no deployments, no infrastructure changes, and no noticeable increase in user traffic.

At first glance, nothing appeared wrong.

Yet costs continued to rise.

What followed became one of the most interesting investigations I’ve worked on, spanning CloudFront, S3, crawler behavior, logging, and media protection.

The First Alert

The first indication of a problem came from billing reports.

AWS costs had increased dramatically over a short period of time.

The environment itself was relatively straightforward:

Website delivered through Amazon CloudFront
Media content served through a dedicated CloudFront distribution
Videos and images stored in Amazon S3
Historical media archived using lower-cost storage classes

There were no infrastructure modifications.

No application releases.

No increase in legitimate user traffic.

Something else was consuming bandwidth.

The Investigation Begins

As with most cloud investigations, the first step was data collection.

Website logs were reviewed.

Traffic sources were analyzed.

User agents were examined.

Referrers were inspected.

Known crawlers and SEO tools were evaluated.

Thousands of requests were analyzed.

Nothing explained the increase.

The website traffic simply did not justify the bandwidth consumption being reported by AWS.

At this point, a concerning realization emerged:

The activity was not coming through the website.

It was hitting the media CDN directly.

The Problem Nobody Noticed

The media distribution had a critical observability gap.

At the time:

CloudFront access logging was not enabled
S3 access logging was not enabled
CloudTrail data events were not configured

We could see the website.

We could not see the media platform.

Essentially, we were trying to solve a network mystery while blindfolded.

The first incident remained unresolved because there simply wasn’t enough telemetry available to identify the source.

Preparing for the Next Incident

After the initial investigation, additional visibility was introduced.

CloudFront access logging was enabled.

Monitoring was improved.

Billing alerts were tightened.
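As a rough illustration of the logging change, the sketch below enables the standard logging block on a copy of a CloudFront distribution config. The bucket name, prefix, and helper name are assumptions made for the example, and the boto3 calls in the trailing comment show one way it could be applied, not necessarily how it was done in this environment.

```python
import copy


def with_standard_logging(distribution_config: dict, log_bucket: str, prefix: str) -> dict:
    """Return a copy of a DistributionConfig with standard access logging enabled.

    log_bucket and prefix are hypothetical values supplied by the caller.
    """
    updated = copy.deepcopy(distribution_config)
    updated["Logging"] = {
        "Enabled": True,
        "IncludeCookies": False,
        # CloudFront expects the bucket's S3 domain name here, not the bare name.
        "Bucket": f"{log_bucket}.s3.amazonaws.com",
        "Prefix": prefix,
    }
    return updated


# Hypothetical usage with boto3 (not executed in this sketch):
#   cf = boto3.client("cloudfront")
#   current = cf.get_distribution_config(Id=distribution_id)
#   new_config = with_standard_logging(
#       current["DistributionConfig"], "example-cdn-logs", "media/")
#   cf.update_distribution(
#       Id=distribution_id, DistributionConfig=new_config, IfMatch=current["ETag"])
```

Working on a copy keeps the original config available for comparison before anything is pushed to the distribution.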

The goal was simple:

If this happened again, we wanted evidence.

Fortunately, or unfortunately, it did.

The Same Problem Returns

Not long afterward, the same pattern reappeared.

Bandwidth consumption surged again.

This time, however, the logs were available.

Instead of spending days making assumptions, we could follow the data.

Within minutes, the answer became obvious.

A single automated crawler was responsible for the overwhelming majority of media downloads.
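To give a sense of how quickly this kind of answer falls out of the logs, here is a minimal sketch that totals bytes served per user agent from CloudFront standard log lines. It assumes the usual layout: a "#Fields:" header naming the columns, tab-separated rows, and a URL-encoded user agent. The sample rows, the crawler name ExampleMediaBot, and the shortened field list are invented for illustration.

```python
from collections import Counter
from urllib.parse import unquote


def bytes_by_user_agent(log_lines):
    """Sum sc-bytes per user agent across CloudFront standard log lines."""
    fields = []
    totals = Counter()
    for line in log_lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            # Column names are space-separated in the header line.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        # Data rows are tab-separated; pair each value with its column name.
        row = dict(zip(fields, line.split("\t")))
        agent = unquote(row.get("cs(User-Agent)", "-"))
        sent = row.get("sc-bytes", "")
        if sent.isdigit():
            totals[agent] += int(sent)
    return totals


# Hypothetical sample with a shortened field list (real logs carry more columns).
SAMPLE_LOG = [
    "#Version: 1.0",
    "#Fields: date time sc-bytes c-ip cs-uri-stem cs(User-Agent)",
    "2024-01-01\t00:00:01\t52428800\t192.0.2.10\t/video-a.mp4\tExampleMediaBot/1.0",
    "2024-01-01\t00:00:02\t73400320\t192.0.2.11\t/video-b.mp4\tExampleMediaBot/1.0",
    "2024-01-01\t00:00:03\t1048576\t198.51.100.7\t/video-a.mp4\tMozilla/5.0%20(X11)",
]

for agent, sent in bytes_by_user_agent(SAMPLE_LOG).most_common(5):
    print(f"{sent:>12}  {agent}")
```

Grouping by bytes rather than request count matters here: a crawler that makes relatively few requests can still dominate transfer if every request is a full video.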

Not All Bots Are Created Equal

When most engineers think about web crawlers, they imagine search engines reading HTML pages and indexing content.

This crawler behaved differently.

Its objective was media analysis.

Its workflow looked something like this:

Visit Website
      ↓
Extract Media URLs
      ↓
Download Full Video Files
      ↓
Analyze Content
      ↓
Build Media Index

The crawler wasn’t reading metadata.

It was downloading the actual media files.

And it wasn’t downloading one file at a time.

It was downloading many files simultaneously from multiple IP addresses.

What looked like normal crawler activity at the website layer translated into massive bandwidth consumption at the media layer.

Why the Costs Escalated So Quickly

Media files are fundamentally different from HTML pages.

A crawler requesting a web page may consume a few kilobytes.

A crawler requesting a video may consume hundreds of megabytes or even gigabytes.

Now multiply that by:

Thousands of media objects
Multiple crawler instances
Parallel downloads
Repeated indexing activity

The result is substantial bandwidth usage in a very short period of time.
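The multiplication is easy to sketch. All numbers below are hypothetical and are not taken from the customer's bill; the per-GB price is a placeholder parameter, not a quoted AWS rate. Note that parallel downloads and multiple crawler instances mainly compress the time window, while the total transferred is driven by object count, object size, and how often the crawler repeats a full pass.

```python
def estimate_transfer_gb(object_count: int, avg_object_mb: float, passes: int) -> float:
    """Total data transferred if every object is fully downloaded on each pass."""
    return object_count * avg_object_mb * passes / 1024


def estimate_transfer_cost(transfer_gb: float, price_per_gb: float) -> float:
    """Data transfer cost for a flat, caller-supplied per-GB price."""
    return transfer_gb * price_per_gb


# Hypothetical scenario: 4,000 videos averaging 256 MB, crawled twice.
transfer_gb = estimate_transfer_gb(object_count=4000, avg_object_mb=256, passes=2)
print(transfer_gb)  # 2000.0 GB

# 0.08 is an assumed placeholder price per GB, used only to show the arithmetic.
print(round(estimate_transfer_cost(transfer_gb, price_per_gb=0.08), 2))
```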

In this case, a single crawler generated the overwhelming majority of media traffic.

Actual users represented only a small fraction of total transfer.

The Hidden Cost Multiplier

The situation became even more interesting because some historical content was stored in archival storage classes optimized for cost efficiency.

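Storage classes of this kind, such as S3 Standard-IA and S3 Glacier Instant Retrieval, bill a per-GB retrieval fee, so every request that CloudFront had to fetch from S3 generated additional charges beyond standard bandwidth costs.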

What initially appeared to be a CloudFront issue was also creating downstream storage retrieval costs.

One request was producing charges across multiple AWS services simultaneously.

Evaluating Possible Solutions

Several mitigation strategies were considered.

robots.txt

The first idea was updating robots.txt rules.

This helps communicate crawler preferences.

However, robots.txt is not a security control.

It is merely a request.

Compliant crawlers may honor it.

Others may not.
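The advisory nature of robots.txt is easy to see from the crawler's side. The sketch below uses Python's standard urllib.robotparser the way a well-behaved crawler would; the crawler name ExampleMediaBot and the /media/ path are assumptions for illustration. The check runs entirely inside the crawler, so a client that never performs it is not stopped by anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: one named crawler is asked to stay out of /media/.
ROBOTS_TXT = """\
User-agent: ExampleMediaBot
Disallow: /media/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

video = "https://cdn.example.com/media/video.mp4"

# A compliant crawler asks before fetching and skips the URL when told to.
print(parser.can_fetch("ExampleMediaBot", video))  # False
print(parser.can_fetch("SomeOtherClient", video))  # True

# Nothing on the server enforces the answer: a crawler that skips this check
# can still request the video URL directly.
```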

AWS WAF

AWS WAF was evaluated next.

Advantages:

Immediate protection
Fast deployment
Effective against known traffic patterns

Disadvantages:

Requires ongoing maintenance
Depends on identifying crawler characteristics
Does not inherently protect media URLs
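For a sense of what such rules look like, here is a sketch of two AWS WAF (WAFv2) rule definitions built as plain Python dictionaries: one blocks requests whose User-Agent contains a given substring, the other rate-limits per source IP. The rule names, the crawler name, and the limit are assumptions for the example, not the rules used in this project. Both also show the dependency noted above: the first only works once the crawler's user agent is known, and the second is weaker against a crawler spread across many IP addresses.

```python
def _visibility(metric_name: str) -> dict:
    """Standard visibility block so the rule reports metrics and sampled requests."""
    return {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": metric_name,
    }


def block_user_agent_rule(name: str, priority: int, agent_substring: str) -> dict:
    """Block requests whose User-Agent header contains agent_substring (case-insensitive)."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "ByteMatchStatement": {
                # Lowercased here because the header is lowercased before matching.
                "SearchString": agent_substring.lower().encode("utf-8"),
                "FieldToMatch": {"SingleHeader": {"Name": "user-agent"}},
                "TextTransformations": [{"Priority": 0, "Type": "LOWERCASE"}],
                "PositionalConstraint": "CONTAINS",
            }
        },
        "Action": {"Block": {}},
        "VisibilityConfig": _visibility(name),
    }


def rate_limit_rule(name: str, priority: int, requests_per_window: int) -> dict:
    """Block a source IP once it exceeds the limit within WAF's evaluation window."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {"Limit": requests_per_window, "AggregateKeyType": "IP"}
        },
        "Action": {"Block": {}},
        "VisibilityConfig": _visibility(name),
    }


# Hypothetical rule set; it would be attached to a web ACL with Scope="CLOUDFRONT".
rules = [
    block_user_agent_rule("block-example-media-bot", 0, "ExampleMediaBot"),
    rate_limit_rule("rate-limit-per-ip", 1, 1000),
]
```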

CloudFront Signed URLs

The most effective long-term solution turned out to be CloudFront Signed URLs.

Instead of exposing media objects directly:

https://cdn.example.com/video.mp4

the application generates time-limited signed URLs:

https://cdn.example.com/video.mp4?Expires=...&Signature=...&Key-Pair-Id=...

CloudFront validates the signature before serving the file.

Without a valid signature:

HTTP 403 Forbidden

This approach moves access control away from the crawler's goodwill and into the platform itself.
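The mechanics of a signed URL with a canned policy can be sketched in a few lines. This is an illustration under assumptions, not the application's actual code: the RSA-SHA1 signing step is passed in as a callable (rsa_sha1_sign is a hypothetical parameter) so the sketch stays dependency-free, and the key pair ID is a placeholder. In a real application a library helper such as botocore's CloudFrontSigner would typically do this work.

```python
import base64
import json
from typing import Callable


def cloudfront_b64(data: bytes) -> str:
    """Base64-encode, then swap the characters that are not URL-safe (+ = /)."""
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")


def canned_policy(url: str, expires_epoch: int) -> str:
    """Build the canned policy statement, serialized without whitespace."""
    policy = {
        "Statement": [
            {
                "Resource": url,
                "Condition": {"DateLessThan": {"AWS:EpochTime": expires_epoch}},
            }
        ]
    }
    return json.dumps(policy, separators=(",", ":"))


def signed_url(
    url: str,
    expires_epoch: int,
    key_pair_id: str,
    rsa_sha1_sign: Callable[[bytes], bytes],
) -> str:
    """Append Expires, Signature, and Key-Pair-Id to an already-encoded URL.

    rsa_sha1_sign is assumed to sign the policy bytes with the private key
    that matches key_pair_id.
    """
    signature = rsa_sha1_sign(canned_policy(url, expires_epoch).encode("utf-8"))
    separator = "&" if "?" in url else "?"
    return (
        f"{url}{separator}Expires={expires_epoch}"
        f"&Signature={cloudfront_b64(signature)}"
        f"&Key-Pair-Id={key_pair_id}"
    )
```

The URL placed in the policy is the exact string that gets signed, so it has to be the same string the browser ends up requesting.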

The Unexpected Challenge

Like many production changes, implementation introduced its own lesson.

Some media files contained spaces and special characters in their filenames.

Initially, signed URLs appeared correct but CloudFront continued rejecting requests.

After detailed testing, the root cause was discovered.

Browsers automatically URL-encode certain characters.

CloudFront validates the signature against the URL it actually receives, which is the encoded form.

The application was signing one version of the URL while CloudFront was validating another.

A single encoded space character was enough to break signature validation.

The fix was straightforward: encode the filename before building and signing the URL.

rawurlencode($filename)

But finding it required testing, patience, and understanding exactly how CloudFront performs signature validation.
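The same idea in Python, as a small sketch: urllib.parse.quote with safe="" percent-encodes a filename in the same spirit as PHP's rawurlencode. The helper name media_url and the sample filename are assumptions for illustration. The point is that the encoded URL is the one that gets signed, so the signed string and the requested string match.

```python
from urllib.parse import quote


def media_url(base_url: str, filename: str) -> str:
    """Build the media URL with the filename percent-encoded.

    safe="" also encodes "/", so pass a single filename here, not a path.
    """
    return f"{base_url.rstrip('/')}/{quote(filename, safe='')}"


raw = "My Video (final).mp4"
encoded = media_url("https://cdn.example.com", raw)
print(encoded)  # https://cdn.example.com/My%20Video%20%28final%29.mp4

# Sign `encoded`, not the raw form: a browser would send the space as %20,
# and a signature computed over the unencoded URL would no longer match.
```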

It was a reminder that the smallest implementation details often consume the most troubleshooting time.

The Final Architecture

The final solution combined multiple layers.

Layer 1: CloudFront Signed URLs

Only URLs generated by the application carry a valid, time-limited signature.

Direct media access is blocked.

Layer 2: AWS WAF

Known crawler traffic is filtered at the CloudFront edge before reaching the origin.

Layer 3: Improved Observability

Logging was enabled across critical services to ensure future investigations would start with evidence instead of assumptions.

The Most Valuable Lesson

The biggest takeaway from this project was not related to CloudFront, WAF, or even AWS costs.

It was visibility.

The first incident consumed significant investigation time because the necessary logs did not exist.

The second incident was resolved quickly because the right telemetry was available.

The technical solution was important.

The observability improvements were even more important.

Recommendations for Every AWS Environment

Based on this experience, I strongly recommend:

Enable CloudFront access logging for every distribution
Enable S3 access logging for critical buckets
Enable CloudTrail data events for sensitive object access
Configure billing alerts early
Monitor bandwidth anomalies proactively
Treat robots.txt as guidance, not protection
Protect expensive media assets using signed access mechanisms
Test thoroughly with real-world filenames and edge cases
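As one possible starting point for the bandwidth monitoring item, the sketch below flags days whose transfer is far above the median of the period. The threshold factor and the sample figures are assumptions; in practice the daily totals might come from access logs or from CloudFront's BytesDownloaded metric, and a managed alarm would likely replace a script like this.

```python
from statistics import median


def flag_transfer_spikes(daily_gb: dict, factor: float = 3.0) -> list:
    """Return the days whose transfer exceeds factor times the median day."""
    if not daily_gb:
        return []
    baseline = median(daily_gb.values())
    if baseline <= 0:
        return []
    return [day for day, gb in daily_gb.items() if gb > factor * baseline]


# Hypothetical daily totals in GB; one day stands out clearly.
sample_days = {"day-1": 10, "day-2": 12, "day-3": 11, "day-4": 95, "day-5": 9}
print(flag_transfer_spikes(sample_days))  # ['day-4']
```

The median is used as the baseline because a single extreme day pulls it far less than it would pull an average.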

Final Thoughts

Cloud environments are incredibly efficient when everything behaves as expected.

The challenge is that not everything behaves as expected.

Automated crawlers, indexing systems, bots, and third-party services constantly interact with public content in ways that are easy to overlook.

To them, media files are data.

To AWS, media files are bandwidth.

And to your monthly invoice, bandwidth has a cost.

The most effective optimization in this entire project wasn’t a new service or a complex architecture change.

It was visibility.

Because once you can see the problem clearly, solving it becomes much easier.