Author Archives: Jin Hao Wan


About Jin Hao Wan

Jin is on Amplitude's back-end engineering team, where he works on maintaining Amplitude's query engine and prototyping new features. He graduated from MIT with an MS in Computer Science.

Distributed Real-time Data Store with Flexible Deduplication

In the world of “big data”, businesses that can quickly discover and act on insights from their users’ events have a decisive advantage. It is no longer sufficient for analytics systems to rely solely on daily batch processing. This is why our new column store, Nova, continues to use a lambda architecture. In addition to a batch layer, this architecture has a real-time layer that processes events as they arrive. Because the batch layer covers everything older, the real-time layer only needs to maintain the last day’s events. In a previous post, we focused on the batch layer of Nova. Designing the real-time layer to support incremental updates for a column store creates a different set of requirements and challenges, and we discuss our approach to them in this post.

Figure: Flow of data through a generic lambda architecture (source)
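To make the split between the two layers concrete, here is a minimal sketch of a generic lambda-style store in Python. It is not Nova's implementation: the `Event` and `LambdaStore` names, the `event_id` deduplication key, and the per-user counting query are all assumptions made for illustration. The batch step here simply folds already-ingested events into a precomputed view, standing in for a full batch job over the complete event history.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Event:
    event_id: str   # hypothetical unique ID, used here as the deduplication key
    user_id: str
    timestamp: int  # seconds since epoch


@dataclass
class LambdaStore:
    # Batch view: precomputed per-user event counts up to batch_cutoff.
    batch_counts: Dict[str, int] = field(default_factory=dict)
    batch_cutoff: int = 0
    # Real-time layer: only events newer than the batch cutoff, keyed by ID.
    recent: Dict[str, Event] = field(default_factory=dict)

    def ingest(self, event: Event) -> None:
        """Add an incoming event to the real-time layer, ignoring duplicates."""
        if event.timestamp <= self.batch_cutoff:
            return  # assumed to be covered by the batch view already
        self.recent.setdefault(event.event_id, event)

    def run_batch(self, cutoff: int) -> None:
        """Fold events up to `cutoff` into the batch view and evict them."""
        done = [k for k, e in self.recent.items() if e.timestamp <= cutoff]
        for event_id in done:
            event = self.recent.pop(event_id)
            self.batch_counts[event.user_id] = (
                self.batch_counts.get(event.user_id, 0) + 1
            )
        self.batch_cutoff = cutoff

    def count(self, user_id: str) -> int:
        """Answer a query by merging the batch view with the real-time layer."""
        batch = self.batch_counts.get(user_id, 0)
        realtime = sum(1 for e in self.recent.values() if e.user_id == user_id)
        return batch + realtime


store = LambdaStore()
store.ingest(Event("e1", "alice", 100))
store.ingest(Event("e1", "alice", 100))  # duplicate delivery, ignored
store.ingest(Event("e2", "alice", 200))
store.run_batch(cutoff=150)              # e1 moves into the batch view
print(store.count("alice"))              # prints 2
```

The point of the sketch is that a query always merges both layers, so the real-time layer can stay small: once the batch step has absorbed an event, the real-time layer is free to drop it.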
