Written in collaboration with Sabith Venkit, Solutions Engineering @ Amazon Web Services.

Anyone who manages a cloud environment knows that moment at the end of the month: you open the AWS bill and hope for a pleasant surprise. But all too often, one line item is quietly growing in the background until it becomes one of your most significant expenses: AWS Backup, and more specifically, EBS backups.

I’m writing this from personal experience. Here at Wiv, without us even noticing, our backup costs became one of our heaviest expenses. Like many others, we thought we were covered. After all, we set up a Lifecycle Policy. What could go wrong?

Well, a lot.

The Trap: Why a Lifecycle Policy Isn’t a Magic Solution

Many organizations fall into the “set it and forget it” trap. They enable a backup retention policy (Lifecycle Policy)—for example, retaining backups for 45 days—and feel they’ve done their part. But this is where a critical misunderstanding lies. The question isn’t if you have a policy, but what that policy is doing to your bill.

The difference between retaining backups for 45 days versus 7 days can mean thousands of dollars per month. So, to answer the question directly: yes, changing your retention policy is one of the most effective ways to achieve significant savings. The problem is that it’s very difficult to measure the potential savings. Why? Because cloud backup pricing isn’t linear.

The Technical Challenge: The Magic of Incremental Backups (and Their Cost)

When AWS performs a backup of an EBS Volume, the first one is a full copy. But every subsequent backup is incremental—it only saves the “delta,” meaning only the blocks that have changed since the previous backup. This is a brilliant and efficient mechanism, but it creates a huge challenge in calculating costs.

When you retain a backup from 40 days ago, you aren’t paying for the full size of the original volume. You’re only paying for the blocks that are unique to that backup, the ones no newer backup also references. Measuring the true cost of this “delta” is nearly impossible with standard AWS tools. You can’t just open Cost Explorer and get a straight answer.
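
To make this concrete, here is a back-of-envelope sketch of the cost model. Every number in it is an illustrative assumption rather than an AWS list price: a 500 GB volume, an average of 5 GB of changed blocks per day, and standard-tier snapshot storage at roughly $0.05 per GB-month.

```python
# Back-of-envelope snapshot storage cost for one volume under a retention policy.
# Simplification: one full snapshot plus one incremental delta per retained day;
# overlapping changes between days would make the real number somewhat lower.

FULL_SIZE_GB = 500         # size of the initial, full snapshot
DAILY_DELTA_GB = 5         # assumed average unique blocks per incremental snapshot
PRICE_PER_GB_MONTH = 0.05  # assumed standard-tier price; check current AWS pricing

def monthly_cost(retention_days: int) -> float:
    stored_gb = FULL_SIZE_GB + (retention_days - 1) * DAILY_DELTA_GB
    return stored_gb * PRICE_PER_GB_MONTH

for days in (7, 45):
    print(f"{days}-day retention: ~${monthly_cost(days):.2f}/month")
# 7-day retention: ~$26.50/month
# 45-day retention: ~$36.00/month
```

Note what this toy model shows: going from 7 to 45 days of retention does not multiply the cost by six, because most of the bill is the full baseline copy. Per volume the difference looks modest; multiplied across hundreds of volumes, and with higher daily change rates, this is how retention quietly turns into thousands of dollars a month.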

The Solution: Two Automated Workflows for Policy Enforcement

The central problem is policy enforcement. The first step is to define a clear baseline policy. For example, establishing a rule like: “All EBS backups will be retained for 7 days only.” Such a policy provides a strong starting point, as it defines a clear target for optimization and allows you to measure any deviation from it.

But this is where the real challenges begin, especially at scale:

  1. Identifying Deviations: How do you find all the resources that don’t comply with the policy?
  2. Calculating Impact: What is the financial cost of each of these deviations?
  3. Efficient Remediation: How do you fix the deviations efficiently, without endless manual processes?

The solution isn’t one workflow, but two that complement each other: Discovery and Remediation.

Workflow #1: Discovery & Case Creation

The first automated engine is responsible for discovery, analysis, and preparing the information for action. Here’s how its algorithm works:

  1. Discovery – What’s Actually Being Backed Up?
    Instead of checking all existing volumes, we start from the end—the backups themselves. We retrieve a list of all EBS snapshots in the account and, from that, extract a unique list of all volumes that are actually being backed up. This way, we focus only on what’s relevant (a minimal sketch of this step follows the list).
  2. Analysis – How Much Data Really Changes Each Day?
    Here, we enter a loop and go through each volume on the list. For each one, we fetch its snapshots, sort them from newest to oldest, and compare consecutive pairs. Using the EBS direct APIs (specifically ListChangedBlocks), we precisely measure the “delta”: how many blocks changed from one snapshot to the next. By repeating this comparison across several past days, we calculate the average daily change for that volume (see the second sketch below).
  3. Calculating Savings and Opening a Case:
    Once we have the average daily change, we calculate the savings potential. The result doesn’t just stay in a static report. For every deviation from the desired policy, the system automatically opens a case in a case management system, identifies the Owner of the resource, and assigns the case to them. This way, we know exactly who to contact.
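
Here is a minimal sketch of the discovery step with boto3. The region is a placeholder, and the script assumes credentials with permission to call ec2:DescribeSnapshots:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is a placeholder

def discover_backed_up_volumes() -> set[str]:
    """Return the IDs of every volume that has at least one snapshot in this account."""
    volume_ids: set[str] = set()
    paginator = ec2.get_paginator("describe_snapshots")
    # OwnerIds=["self"] limits the listing to snapshots this account owns.
    for page in paginator.paginate(OwnerIds=["self"]):
        for snapshot in page["Snapshots"]:
            # Copied snapshots and snapshots of deleted volumes report the
            # sentinel volume ID "vol-ffffffff"; skip them.
            if snapshot["VolumeId"] != "vol-ffffffff":
                volume_ids.add(snapshot["VolumeId"])
    return volume_ids

print(f"{len(discover_backed_up_volumes())} volumes are actually being backed up")
```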
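
And a sketch of the analysis step for a single volume, using the EBS direct API ListChangedBlocks to measure the delta between consecutive snapshots. It assumes the extra ebs:ListChangedBlocks permission, and the savings estimate at the end reuses the simplified cost model and assumed price from earlier:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
ebs = boto3.client("ebs", region_name="us-east-1")  # EBS direct APIs

def changed_gb(older_snapshot_id: str, newer_snapshot_id: str) -> float:
    """Measure how many GB of blocks differ between two snapshots of one volume."""
    blocks, block_size, next_token = 0, 512 * 1024, None
    while True:
        kwargs = {"FirstSnapshotId": older_snapshot_id,  # must be the older snapshot
                  "SecondSnapshotId": newer_snapshot_id}
        if next_token:
            kwargs["NextToken"] = next_token
        resp = ebs.list_changed_blocks(**kwargs)
        blocks += len(resp.get("ChangedBlocks", []))
        block_size = resp.get("BlockSize", block_size)  # in bytes, 512 KiB by default
        next_token = resp.get("NextToken")
        if not next_token:
            break
    return blocks * block_size / 1024 ** 3

def average_daily_delta_gb(volume_id: str) -> float:
    """Average the delta between consecutive completed snapshots of one volume,
    assuming roughly one snapshot per day."""
    snapshots = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "volume-id", "Values": [volume_id]},
                 {"Name": "status", "Values": ["completed"]}],
    )["Snapshots"]
    snapshots.sort(key=lambda s: s["StartTime"], reverse=True)  # newest first
    deltas = [changed_gb(older["SnapshotId"], newer["SnapshotId"])
              for newer, older in zip(snapshots, snapshots[1:])]
    return sum(deltas) / len(deltas) if deltas else 0.0

# Savings potential of moving this volume from 45-day to 7-day retention,
# using the same assumed price as in the earlier back-of-envelope example.
PRICE_PER_GB_MONTH = 0.05  # assumption, not an AWS list price
delta = average_daily_delta_gb("vol-0123456789abcdef0")  # placeholder volume ID
print(f"Estimated monthly savings: ~${delta * (45 - 7) * PRICE_PER_GB_MONTH:.2f}")
```

A practical note: the deltas here approximate what each incremental snapshot adds to the bill; if snapshots are not taken daily, divide by the actual interval between them to get a true per-day figure.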

Workflow #2: Remediation with a Human-in-the-Loop

This is where the second engine comes into play, starting its work by reading the newly opened cases. It’s responsible for the remediation process while keeping the user in full control.

  1. Sending for Approval: The workflow reads the case, identifies the assigned owner, and sends them an approval request with all the details: the deviation, its cost, and the recommended fix.
  2. Taking Action: Based on the owner’s feedback (approval or rejection), the system can automatically perform the recommended fix, for example, updating the Lifecycle policy to 7 days (a sketch of this step follows the list).
  3. Closing the Loop: After the action is taken, the workflow updates the case status in the management system (e.g., to “Resolved”), providing full transparency and a record of the remediation process.
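
What could that automated fix look like? Here is a minimal sketch with boto3 and the AWS Backup API, assuming the owner’s approval has already been collected; the plan ID is a placeholder, and the case-management update is left out:

```python
import boto3

backup = boto3.client("backup", region_name="us-east-1")  # region is a placeholder

def enforce_seven_day_retention(backup_plan_id: str) -> None:
    """Rewrite every rule in a backup plan so recovery points are deleted after 7 days."""
    plan = backup.get_backup_plan(BackupPlanId=backup_plan_id)["BackupPlan"]
    for rule in plan["Rules"]:
        rule["Lifecycle"] = {"DeleteAfterDays": 7}
        # UpdateBackupPlan expects rule *inputs*: the server-assigned RuleId
        # returned by GetBackupPlan is not a valid input field, so drop it.
        rule.pop("RuleId", None)
    backup.update_backup_plan(BackupPlanId=backup_plan_id, BackupPlan=plan)

# Run only after the resource owner approves the case:
# enforce_seven_day_retention("...")  # placeholder backup plan ID
```

One caveat worth including in the approval message: shortening the plan’s lifecycle only applies to new recovery points; existing ones keep the retention they were created with unless they are updated or deleted separately.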

From One-Time Analysis to Continuous Monitoring with Wiv.ai

This two-part workflow is powerful, but its true strength isn’t in a one-time run. Cloud environments are dynamic, and data volumes are constantly changing.

This is where the Wiv.ai platform comes in. We took this logic and turned it into an automated, continuous monitoring system. Instead of running scripts manually, our clients get a continuous view of their backup costs, with proactive optimization recommendations and automated management of the remediation process. This allows organizations to enforce policies, make data-driven decisions, and most importantly, stop burning money on unnecessary backups.

A true understanding of cloud costs is the first step toward effective FinOps. Analyzing backup costs is a prime example of where a small technical deep-dive can yield enormous savings.