At Amplitude, we believe first and foremost in providing the best product analytics. We find the right solution for our users and then figure out how to make it happen on the engineering side. This is in contrast to other analytics services or in-house analytics teams that make compromises on data integrity because it’s easier from a technical perspective. But one of the top reasons that people don’t use analytics to make decisions is that they don’t trust the data. And for good reason — those of us building analytics have historically chosen to sacrifice accuracy when it makes systems easier to build. However, we believe that the role of analytics is changing, and that analytics can and needs to be better than that.
Amplitude has grown significantly both as a product and in data volume since our last blog post on the architecture, and we’ve had to rethink quite a few things since then (a good problem to have!). About six months ago, we realized that old Wave architecture was not going to be effective long-term, and started planning for the next iteration. As we continued to push the boundary of behavioral analytics, we gained more understanding of what we needed from a data storage and query perspective in order to continue advancing the product.
We had two main goals for the new system: (1) the ability to perform complex behavioral analyses (e.g. Compass and Pathfinder), and (2) cost-effective scalability. After extensive research, we decided to build an in-house column store that is designed specifically for behavioral analytics. We call the resulting system Nova, and we’re excited to share the thought process around how we got here and some of the key design decisions we made.
Update: In May 2016 we updated our analytics architecture to NOVA. Read the article here.
Laying the foundation with pre-aggregation and lambda architecture
Three weeks ago, we announced that we are giving away a compelling list of analytics features for free for up to 10 million events per month. That’s an order of magnitude more data than any comparable service, and we’re hoping it enables many more companies to start thinking about how they can leverage behavioral analytics to improve their products. How can we scale so efficiently? It comes down to understanding the nature of analytics queries and engineering the system for specific usage patterns. We’re excited to share what we learned while scaling to hundreds of billions of events and two of the key design choices of our system: pre-aggregation and lambda architecture. Continue reading