Data is integral to product analytics and lies at the core of what we do at Amplitude. To run smoothly, our data ingestion pipeline needs to meet two criteria:
- High availability
- High scalability
Yet, achieving this is easier said than done. All of our services run on AWS and are deployed across multiple Availability Zones (AZs) to protect against the failure of any single AZ. We pair this multi-AZ approach with Kafka clusters and their respective consumers.
It’s a good system in principle but expensive in practice: AWS charges $0.01 per GB when we transfer data between AZs. While the cost seems trivial at first glance, it quickly adds up when you’re working with hundreds of terabytes of new data added every day. So quickly, in fact, that AZ data transfer grew to be our largest AWS expense. We needed a more efficient solution.
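To put that rate in perspective, here is a rough back-of-the-envelope calculation; the daily volume below is an assumed figure standing in for "hundreds of terabytes," not Amplitude's actual number:

```python
# Back-of-the-envelope cross-AZ transfer cost (illustrative figures only).
RATE_PER_GB = 0.01    # USD per GB moved between AZs
daily_tb = 300        # assumption: "hundreds of terabytes" of new data per day
daily_gb = daily_tb * 1024

daily_cost = daily_gb * RATE_PER_GB    # ~$3,000 per day
yearly_cost = daily_cost * 365         # ~$1.1M per year
print(f"${daily_cost:,.0f}/day, ${yearly_cost:,.0f}/year for a single cross-AZ hop")
```

And every additional cross-AZ hop in the pipeline, such as producer-to-broker traffic, broker replication, or broker-to-consumer reads, multiplies that figure.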
To guide our hunt for a better AZ data transfer solution, we dug deeper into how AWS works and discovered two important findings:
- AWS determines cost based on the amount of data transferred between two different AZs.
- The amount of data transferred is the average data size multiplied by the data volume (the number of records sent).
These findings identified our two solution priorities, data size and data volume, because they are the primary drivers of AWS cost.
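As a minimal sketch of that cost model (the function name and record-based framing are illustrative, not an AWS API), the relationship between the two factors looks like this:

```python
def cross_az_cost_usd(avg_record_bytes: float, record_count: int,
                      rate_per_gb: float = 0.01) -> float:
    """Cost model implied by the two findings above:
    data transferred = average record size x number of records,
    cost = transferred GB x per-GB rate."""
    transferred_gb = (avg_record_bytes * record_count) / (1024 ** 3)
    return transferred_gb * rate_per_gb

# Halving either factor halves the bill, which is why data size and
# data volume became the two levers we focused on.
baseline = cross_az_cost_usd(avg_record_bytes=1_000, record_count=100_000_000_000)
smaller_records = cross_az_cost_usd(avg_record_bytes=500, record_count=100_000_000_000)
fewer_records = cross_az_cost_usd(avg_record_bytes=1_000, record_count=50_000_000_000)
```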
Guided by AWS's pricing model, we developed a data transfer solution that delivered a significant cost reduction. The impact of our new approach was noticeable: in some cases, our solution cut data transfer costs by 70%. In others, the cost dropped to nearly zero.
Here’s how we drove down AWS costs and increased data transfer speed in one solution.