Scale: Manage your event volume with dynamic behavioral sampling

Scale helps manage costs related to very high event volumes.

With Scale, Amplitude enables dynamic behavioral sampling for ultra-high volume customers who have unique cost challenges. Sampling lets you keep your data costs manageable without compromising the accuracy of your analyses.

Scale is a paid add-on intended for extremely high volume customers. Amplitude does not sample by default. Contact your Account Manager if you believe Scale may be appropriate for your organization.

How sampling works

At the user level, the Amplitude algorithmic sampling framework samples events based on user identity:

  • For tracked users, Amplitude preserves users' full event streams and behaviors.
  • User-level sampling preserves data integrity better than random event-level sampling, which can leave individual users' event streams incomplete.
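As a sketch of the idea, user-level sampling can be implemented by hashing each user's stable ID once and keeping or dropping all of that user's events based on the result. The function name and bucket scheme below are illustrative assumptions, not Amplitude's actual algorithm:

```python
import hashlib

def keep_user(user_id: str, sampling_rate_pct: float) -> bool:
    """Decide once, per user, whether to keep ALL of that user's events.

    Hashing the stable user ID (instead of rolling a die per event) means a
    kept user's event stream stays complete. Illustrative sketch only.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000            # 10,000 buckets = 0.01% granularity
    return bucket < sampling_rate_pct * 100      # keep the lowest-numbered buckets

# The decision is deterministic: every event from the same user gets the
# same keep/drop outcome, so no user's stream is partially sampled.
```

Because the hash is deterministic, the keep/drop decision never changes between events, which is what keeps a tracked user's full behavior intact.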

When you enable sampling, Amplitude upsamples metrics to provide highly accurate estimates on every chart and analysis.

Amplitude multiplies your events and users by a sampling factor equivalent to (100% / sampling rate).

For example, if you sample at 10%, Amplitude multiplies tracked events by 10 to estimate your true event volume. This lets you focus on analysis without manually accounting for the sampling rate.
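The upsampling arithmetic above can be sketched as follows (the function name is illustrative):

```python
def estimate_true_count(sampled_count: int, sampling_rate_pct: float) -> float:
    """Upsample a sampled metric back to an estimate of the true total."""
    factor = 100.0 / sampling_rate_pct   # e.g. 100 / 10 = a 10x factor
    return sampled_count * factor

# At a 10% sampling rate, 1.2M tracked events imply ~12M true events.
print(estimate_true_count(1_200_000, 10.0))   # prints 12000000.0
```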

Each Amplitude chart shows the sampling rate applied to it.

You can view raw event counts for your project for the previous month and the current month, along with the number of events after sampling. This gives you near real-time visibility into your event volume.

Sampling doesn't apply to PROPCOUNT results.

Set up sampling

You must be an Admin in your Amplitude organization to make sampling-related changes.

To set up sampling:

  1. In the modal that opens, click Edit to set the dynamic sampling rate.

The dynamic sampling rate specifies how often Amplitude queries your data. For example, if you have 50 million active users per year and set a dynamic sampling rate of 10%, your queried data contains 5 million active users per year. Your event costs are significantly lower, with enough data to generate highly accurate analyses.

  2. Set your user property inclusion list, if needed.

This list acts as a safelist for small, key sub-populations in your sampling process. Users in these populations are exempt from sampling and always appear in your data. The user properties and values you select define these populations.
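The exemption check described above can be sketched like this. The shape of the inclusion list (a mapping from user property to safelisted values) and all names are hypothetical, for illustration only:

```python
import hashlib

# Hypothetical inclusion-list shape: {user property: set of safelisted values}.
INCLUSION_LIST = {"plan_type": {"enterprise"}, "beta_cohort": {"2024-q1"}}

def is_exempt(user_props: dict) -> bool:
    """True if the user belongs to any safelisted sub-population."""
    return any(user_props.get(prop) in values
               for prop, values in INCLUSION_LIST.items())

def keep_user(user_id: str, user_props: dict, rate_pct: float) -> bool:
    """Safelisted users always appear; everyone else is sampled by user ID."""
    if is_exempt(user_props):
        return True
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < rate_pct
```

Checking the safelist before the sampling decision is what guarantees that small, key sub-populations survive even at very low sampling rates.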

Anonymous users

Although Amplitude prioritizes identifying and tracking unique users, ingestion-side sampling can cause inaccuracies for anonymous users. For example, if an anonymous user triggers an event from a new device, Amplitude assigns that user a new Amplitude ID and samples based on that new ID. If Amplitude later determines that this user was a previous user on a new device, Amplitude can't retroactively link the paired events to the user's previous Amplitude ID. Because event sampling uses the Amplitude ID at ingestion time, analyses that rely on user behavior on new devices may be inaccurate or skewed.

Accuracy benchmarks

Amplitude benchmarks sampled result accuracy in terms of percent error, or relative standard deviation at a 95%, two-tailed confidence interval. This is a function of standard error and the true unsampled result.

Customers with high volumes (10M DAUs or more) achieve results within 0.62% of the true value at a 10% sampling rate. Amplitude further assumes that any particular analysis only needs to consider 10% of those DAUs to achieve these results. Higher coverage generally results in higher accuracy.

The following table shows percent error at a 95% confidence interval across sampling rates for various DAU volumes:

DAUs          25%      10%      5%       2%       1%
500,000       1.73%    2.76%    3.91%    6.19%    8.76%
1,000,000     1.22%    1.95%    2.76%    4.38%    6.19%
5,000,000     0.55%    0.87%    1.24%    1.96%    2.77%
10,000,000    0.39%    0.62%    0.87%    1.38%    1.96%
20,000,000    0.27%    0.44%    0.62%    0.98%    1.39%
50,000,000    0.17%    0.28%    0.39%    0.62%    0.88%

For example, if you sample at 10% with 10 million DAUs, it's extremely unlikely you will ever see more than 0.62% error in any metric. So, if your true retention is 16%, the margin of error is roughly:

±0.62% × 16% ≈ ±0.1 percentage points
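That calculation, worked in code (the 0.62% figure is the 10-million-DAU row at a 10% sampling rate in the table above):

```python
retention = 0.16      # true retention of 16%
pct_error = 0.0062    # 0.62% relative error (10M DAUs, 10% sampling)

# Relative error times the metric gives the absolute margin.
margin = pct_error * retention
print(f"±{margin * 100:.2f} percentage points")   # prints ±0.10 percentage points
```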
