One of the major challenges technology startups face is scaling effectively and efficiently. As your user base doubles or triples, how do you ensure that your services still run smoothly and deliver the same user experience? How do you maintain performance while staying cost-efficient? Here at Amplitude, our customers have tracked more events in the past year than in the first 3 years of our company combined. As we and our customers grow, we need to keep providing the same, if not better, service across our platform. Previously, we explained how Nova, our distributed query engine, searches through billions of events and answers 95% of queries in less than 3 seconds. In this blog post, we focus on the data processing pipeline that ingests and prepares event data for Nova, and explain how we stay cost-effective while our event volume multiplies.
In the world of “big data”, businesses that can quickly discover and act upon insights from their users’ events have a decisive advantage. It is no longer sufficient for analytics systems to rely solely on daily batch processing. This is why our new column store, Nova, continues to use a lambda architecture. In addition to a batch layer, this architecture has a real-time layer that processes event data as it arrives; because the batch layer covers everything older, the real-time layer only needs to maintain the last day’s events. In a previous post, we focused on the batch layer of Nova. Designing the real-time layer to support incremental updates to a column store creates a different set of requirements and challenges, and we discuss our approach in this post.
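To make the split between the two layers concrete, here is a minimal sketch of a lambda-style query that combines a precomputed batch view with an incrementally updated real-time view. It is an illustration only, not Nova's implementation: the `Event` fields, the per-event-type counting, and the names `batch_view`, `RealtimeView`, and `query` are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape used only for this illustration.
    user_id: str
    event_type: str
    day: int  # day index; the real-time layer only holds the current day


def batch_view(events: Iterable[Event]) -> Counter:
    """Batch layer: recomputed periodically over all historical events."""
    return Counter(e.event_type for e in events)


class RealtimeView:
    """Real-time layer: updated incrementally as each event arrives."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def ingest(self, event: Event) -> None:
        self.counts[event.event_type] += 1

    def reset(self) -> None:
        # Once a batch run has absorbed the current day, the real-time view can be cleared.
        self.counts.clear()


def query(batch: Counter, realtime: RealtimeView, event_type: str) -> int:
    """Serving-time merge: historical total plus today's incremental total."""
    return batch[event_type] + realtime.counts[event_type]


# Usage: two days of history in the batch view, one fresh event in the real-time view.
history = [Event("u1", "open", 0), Event("u2", "open", 0), Event("u1", "buy", 0)]
batch = batch_view(history)
rt = RealtimeView()
rt.ingest(Event("u3", "open", 1))

open_total = query(batch, rt, "open")      # 2 from batch + 1 from real-time
buy_total = query(batch, rt, "buy")        # 1 from batch only
missing_total = query(batch, rt, "share")  # Counter returns 0 for unseen keys
```

The design point the sketch tries to capture is that the real-time view stays small (one day of data), so updating it per event is cheap, while the expensive full recomputation happens only in the batch layer.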
Why & how we built a Slack app for Amplitude
If your team is anything like ours, you’re in Slack…a lot. At Amplitude, almost all internal communication happens in Slack, and it’s even our preferred method for talking to some of our customers.
Which is why when we were thinking about how to help teams share and discuss insights from user data, Slack was the first thing that popped into our minds. In fact, lots of our customers told us that they were taking screenshots of Amplitude graphs and pasting them into Slack for further discussion — not exactly an ideal workflow.
If you’re ready to get started, read on!
Integrating Amplitude into your app can give you insights into how users are interacting with your app and which features are driving your retention. Open-source SDKs are available for various platforms, but first let’s go through some of the important customizations you’ll need to make.
If you already know all about how Amplitude tracks events, users, and sessions, then scroll down to the end to start the installation. If not, read on!
Amplitude has grown significantly both as a product and in data volume since our last blog post on the architecture, and we’ve had to rethink quite a few things since then (a good problem to have!). About six months ago, we realized that the old Wave architecture was not going to be effective long-term, and started planning for the next iteration. As we continued to push the boundaries of behavioral analytics, we gained a better understanding of what we needed from a data storage and query perspective in order to keep advancing the product.
We had two main goals for the new system: (1) the ability to perform complex behavioral analyses (e.g. Compass and Pathfinder), and (2) cost-effective scalability. After extensive research, we decided to build an in-house column store that is designed specifically for behavioral analytics. We call the resulting system Nova, and we’re excited to share the thought process around how we got here and some of the key design decisions we made.
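As background for why a column store suits this workload, the sketch below pivots row-oriented events into columns, dictionary-encodes a column, and answers a count query by reading only the one column it needs. It shows general column-store ideas under simple assumptions (every event has the same fields, values are strings); it is not Nova's storage format, and `to_columns`, `dictionary_encode`, and `count_event_type` are hypothetical names.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def to_columns(rows: List[Row]) -> Dict[str, List[str]]:
    """Pivot row-oriented events into one list per field.

    Assumes every row has the same fields, so the columns stay aligned by position.
    """
    columns: Dict[str, List[str]] = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace repeated strings with small integer codes (dictionary encoding)."""
    dictionary: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def count_event_type(columns: Dict[str, List[str]], event_type: str) -> int:
    """Count matching events while reading only the event_type column."""
    dictionary, codes = dictionary_encode(columns["event_type"])
    if event_type not in dictionary:
        return 0
    target = dictionary.index(event_type)
    # Comparing small integers instead of strings; other columns are never touched.
    return sum(1 for code in codes if code == target)


# Usage with a few hypothetical events.
events: List[Row] = [
    {"user_id": "u1", "event_type": "open", "platform": "ios"},
    {"user_id": "u2", "event_type": "open", "platform": "android"},
    {"user_id": "u1", "event_type": "purchase", "platform": "ios"},
]
cols = to_columns(events)
platform_dict, platform_codes = dictionary_encode(cols["platform"])
open_count = count_event_type(cols, "open")
```

Here `platform_dict` is `["ios", "android"]` and `platform_codes` is `[0, 1, 0]`. Because behavioral queries typically touch only a few fields of each event, reading and comparing compact per-column data tends to be much cheaper than scanning whole rows.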