Category Archives: Engineering

Dashboard Outage Post-Mortem

Amplitude Dashboard Outage: Post Mortem

On Monday, January 4, 2016, from 8:22 PM PST to 11:37 PM PST, we experienced an outage that prevented our customers from accessing their data on Amplitude. Following the outage, data on Amplitude remained stale until 3:23 PM PST on Monday, January 11, and several important features on Amplitude were inaccessible. We know many of our customers rely on Amplitude being available and up-to-date for their businesses, and we let you down. We’d like to take this opportunity to explain what happened, how we responded, and the steps we are taking to prevent outages like this from happening again.

Continue reading


Scaling Analytics at Amplitude

Laying the foundation with pre-aggregation and lambda architecture

Three weeks ago, we announced that we are giving away a compelling list of analytics features for free for up to 10 million events per month. That’s an order of magnitude more data than any comparable service, and we’re hoping it enables many more companies to start thinking about how they can leverage behavioral analytics to improve their products. How can we scale so efficiently? It comes down to understanding the nature of analytics queries and engineering the system for specific usage patterns. We’re excited to share what we learned while scaling to hundreds of billions of events and two of the key design choices of our system: pre-aggregation and lambda architecture.

Continue reading


Optimizing Redshift Performance with Dynamic Schemas

Amazon Redshift has served us very well at Amplitude. Redshift is a cloud-based, managed data warehousing solution that we use to give our customers direct access to their raw data (you can read more about why we chose it over the alternatives in another post from a couple of months ago). This allows them to write SQL queries to answer ad hoc questions about user behavior in their apps.

But as we scaled the number of customers and the amount of data stored, issues began to emerge with our original schema. Namely, our customers’ queries sometimes took a long time to complete, and we started getting support tickets like this:

[Image: customer support tickets about slow Redshift queries]

It was clearly time for an overhaul.

Continue reading

[Image: example of a Redshift query written in SQL]

Why We Chose Redshift

“So, what we’d really like is a way to get a list of users who came into our app through our last social media campaign, then invited at least 5 friends, and then used their discount code. Can we do that in Amplitude?”

Continue reading

Validating Big Data at Scale

A couple of months ago, our co-founder and CEO Spenser had the pleasure of giving a tech talk hosted by our good friends at KeepSafe. He went over some of the key data challenges that we face when we’re simultaneously collecting data from hundreds of millions of devices, including:

Continue reading


Optimal streaming histograms

How can you create a bucketing algorithm for an arbitrary dataset you don’t know in advance?

We get a constant stream of numerical data from our customers. We’re not sure what the range of this data might be or how many orders of magnitude it may span. The distribution of the data is constantly changing as new data is added.
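
To make the problem concrete, here is a minimal sketch of one common way to bucket a stream whose range is unknown: logarithmically spaced buckets, so values spanning many orders of magnitude need only a handful of counters. This excerpt does not say which algorithm the full post settles on, so treat this as an illustration only; the `LogHistogram` class and its `buckets_per_decade` parameter are hypothetical names chosen for the example.

```python
import math
from collections import Counter


class LogHistogram:
    """Streaming histogram with log-spaced buckets (illustrative sketch only)."""

    def __init__(self, buckets_per_decade=4):
        # Hypothetical tuning knob: more buckets per decade means finer resolution.
        self.buckets_per_decade = buckets_per_decade
        self.counts = Counter()

    def _key(self, value):
        # Zero has no logarithm, so it gets a dedicated bucket.
        if value == 0:
            return (0, 0)
        sign = 1 if value > 0 else -1
        index = math.floor(math.log10(abs(value)) * self.buckets_per_decade)
        return (sign, index)

    def add(self, value):
        # Constant work per value; no prior knowledge of the data range is needed.
        self.counts[self._key(value)] += 1

    def bounds(self, key):
        # Return the (low, high) edges of the bucket identified by key.
        sign, index = key
        if sign == 0:
            return (0.0, 0.0)
        lo = 10 ** (index / self.buckets_per_decade)
        hi = 10 ** ((index + 1) / self.buckets_per_decade)
        return (lo, hi) if sign > 0 else (-hi, -lo)


hist = LogHistogram(buckets_per_decade=1)
for v in [5, 7, 50, 0.05, 0, -5]:
    hist.add(v)
```

Because bucket edges depend only on the value itself, new data never forces existing buckets to be rebuilt. The trade-off is that resolution is relative rather than absolute: every bucket covers the same ratio of values, not the same width.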

Continue reading