Dynamic Behavioral Sampling

Jin Hao Wan

Software Developer


Posted on August 13, 2018

Tracking every event is expensive, but you still need to collect data for product analytics. The solution? Dynamic behavioral sampling.

The cost of tracking every event in a sophisticated analytics tool can be prohibitively expensive. If that is not the case now, it will be as your company grows.

We are no strangers to this issue here at Amplitude. We regularly see companies dealing with large volumes of data, some generating as many as 100 billion data points per month.

So how do you run large-scale analyses without throwing away your entire budget? The answer: behavioral sampling. At Amplitude, we are conscious of the prevalent need for behavioral sampling. We support ETL-level (extract, transform, load) sampling to reduce upfront cost and remove the need to regularly monitor data. Additionally, we implement a simple query-time sampling algorithm whose sole purpose is to deliver consistent statistical accuracy over time (vs. reducing cost further).
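To make the idea concrete, here is a minimal sketch of one common way query-time behavioral sampling can be done: sample at the user level (not the event level) by hashing the user ID, so the same users are selected on every query and a sampled user's full event stream is kept. This is an illustrative assumption, not Amplitude's actual algorithm; the function names and the salt are hypothetical.

```python
import hashlib

def in_sample(user_id: str, rate: float, salt: str = "v1") -> bool:
    """Deterministically decide whether a user falls in the sample.

    Hashing the user ID (rather than flipping a coin per event) keeps
    every event for a sampled user, so behavioral analyses like funnels
    stay intact, and the same users are chosen on every query.
    """
    digest = hashlib.md5(f"{salt}:{user_id}".encode()).hexdigest()
    # Map the first 8 hex digits to a uniform value in [0, 1].
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    return bucket < rate

def estimate_total(sampled_count: int, rate: float) -> float:
    """Scale a metric computed on the sample back up to the population."""
    return sampled_count / rate
```

Because membership is a pure function of the user ID, a 10% sample stays statistically consistent across queries and across days, and any count computed on it can be scaled by `1 / rate` to estimate the true total.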

While behavioral sampling is an ideal solution for collecting valuable data, it is important to be aware of a few caveats that can affect your findings if you’re not careful. Learn more about where to use behavioral sampling in your analytics pipeline, as well as how Amplitude solves this cost vs. quality trade-off with dynamic behavioral sampling.

Jin Hao Wan

Jin is on Amplitude's back-end engineering team, where he works on maintaining Amplitude's query engine and prototyping new features. He graduated from MIT with an MS in Computer Science.
