Estimate the duration of your experiments

This article helps you:

  • Understand the components of the duration estimator

  • Use the duration estimator to plan experiment sample size and run time needed to reach statistical significance

The duration estimator calculates the sample size and run time your Amplitude experiment needs to reach statistical significance. It can also help you decide whether an experiment is worthwhile.

Note

While Amplitude Experiment supports sequential testing, the duration estimator only determines the sample size for a t-test. Click here to read more about the difference between sequential tests and t-tests.

Understand the duration estimator

This table describes the components involved in generating the duration estimate.

| Component name and default setting | Definition and data validation | Relation to sample size needed for statistical significance |
| --- | --- | --- |
| Confidence Level: 95% | The confidence level reflects your tolerance for false positives in the results. For example, at a 95% confidence level, if the change truly had no effect and you ran the same experiment again and again, about 5% of those runs would still appear statistically significant (in other words, false positives). The confidence level must be between 1% and 99%. Amplitude recommends a minimum of 80%; below that, the experiment's results may no longer be reliable. | The larger the confidence level, the larger the sample size |
| Control Mean: Automatically computed when you select the primary metric | The control mean is the average value of the selected primary metric over the last seven days (not including today) for users who completed the proxy exposure event. Consider adjusting the mean if a recent special event or holiday may have affected the average in the last seven days. This value can't be zero, regardless of metric type. For conversion metrics, it also can't be 1, and the value is a proportion: 0.5 means 50%, not 0.5%. | The smaller the control mean, the larger the sample size |
| Standard Deviation: Automatically computed when you select the primary metric | Standard deviation measures the spread of the data (roughly, the typical distance between each data point and the mean). It appears only for numerical metrics, not for binary (0-1) conversion rates. The automatic calculation uses the standard deviation of the primary metric over the last seven days (not including today) for users who completed the proxy exposure event. This value can be any positive number. | The larger the standard deviation, the larger the sample size |
| Power: 80% | Power is the probability of detecting an effect when one really exists (the true positive rate), so a power of 80% means a 20% chance of missing a real effect. Think of power as the risk you're willing to take of a false negative. This value must be between 1% and 99%. Don't set it lower than 70%. | The larger the power, the larger the sample size |
| Test Type: 2-sided | A 1-sided t-test looks for a change in one direction only (either an increase or a decrease relative to the control mean), whereas a 2-sided t-test looks for a change in either direction. | A 2-sided test requires a larger sample size than a 1-sided test |
| Minimum Effect (MDE): 2% | The MDE, also known as the minimum goal or effect size, is relative to the control mean of the primary metric; it's neither absolute nor standardized. For example, if the conversion rate for control is 10%, an MDE of 2% means that a change is detectable if the rate moves outside the range of 9.8% to 10.2%. Set it to the smallest change you want to be able to detect. This value can be any positive percentage. | The smaller the MDE, the larger the sample size |
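
To see how these components interact, here is a minimal Python sketch of a textbook sample-size calculation for a two-sample test, using a normal approximation. It is an illustration only and not Amplitude's actual implementation: the function name `sample_size_per_variant`, its parameters, and the assumption of equal variance in both variants are all hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(control_mean, mde, std_dev=None,
                            confidence=0.95, power=0.80, two_sided=True):
    """Approximate users needed per variant (hypothetical helper).

    control_mean: baseline metric value (a proportion for conversion metrics)
    mde: minimum detectable effect, relative to control_mean (0.02 means 2%)
    std_dev: standard deviation for numerical metrics; leave as None for
             conversion metrics, where it is derived from control_mean
    """
    if control_mean == 0 or mde <= 0:
        raise ValueError("control_mean must be non-zero and mde positive")

    alpha = 1 - confidence
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)   # critical value for false positives
    z_beta = NormalDist().inv_cdf(power)       # critical value for false negatives

    # Convert the relative MDE into an absolute difference in the metric.
    delta = control_mean * mde

    if std_dev is None:
        # Conversion metric: variance of a 0-1 outcome with rate control_mean.
        variance = control_mean * (1 - control_mean)
    else:
        variance = std_dev ** 2

    # Assumes both variants share the same variance.
    return ceil(2 * variance * (z_alpha + z_beta) ** 2 / delta ** 2)


# Defaults from the table, with a 10% control conversion rate.
print(sample_size_per_variant(control_mean=0.10, mde=0.02))
```

With the table's defaults and a 10% control conversion rate, this sketch asks for several hundred thousand users per variant. Because the MDE is squared in the denominator, halving it roughly quadruples the sample size.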

Interpret the duration estimator results

After you've entered all the components, the duration estimator displays a result: the estimated number of days needed to reach statistical significance when conducting your experiment.

If the estimate is longer than the optimal 30 days, the duration estimator offers suggestions, such as removing a variant or two. If the estimate falls within a reasonable time frame, the duration estimator tells you so.
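
As a rough illustration of why removing a variant shortens the estimate, the sketch below converts a per-variant sample size into days. It assumes traffic is constant and split evenly across variants; the function `estimate_days` and the traffic numbers are hypothetical and don't reflect how Amplitude computes its estimate.

```python
from math import ceil


def estimate_days(sample_size_per_variant, num_variants, daily_exposed_users):
    """Days until every variant reaches its required sample size.

    num_variants includes the control. Assumes an even traffic split and
    a steady number of exposed users per day (hypothetical simplification).
    """
    if daily_exposed_users <= 0:
        raise ValueError("daily_exposed_users must be positive")
    total_needed = sample_size_per_variant * num_variants
    return ceil(total_needed / daily_exposed_users)


# Hypothetical inputs: 350,000 users per variant, 20,000 exposed users per day.
print(estimate_days(350_000, 3, 20_000))  # control + 2 treatments -> 53
print(estimate_days(350_000, 2, 20_000))  # control + 1 treatment  -> 35
```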

Note

When your flag is inactive, Amplitude Experiment estimates the duration of the experiment from the proxy exposure event, because that event has historical traffic information.

Reduce experiment run time

Sometimes, the results of the duration estimator point to a longer run time than you might want. Consider these options to decrease your experiment's run time:

  • Modify error rates to reduce the sample size needed
  • Change the primary metric and exposure event
  • Target more users
  • Modify the standard deviation so that outliers don't carry as much weight
  • Decide if the experiment is worth the run time in the first place. If not, consider scrapping it.
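
To get a feel for the first option, the sketch below estimates how much the required sample size shrinks when you relax the confidence level or power. It relies on a common normal-approximation result, in which sample size scales with the square of the sum of the two critical values. The function `relative_sample_size` is hypothetical, and the numbers are approximations rather than Amplitude's own output.

```python
from statistics import NormalDist


def relative_sample_size(confidence, power,
                         base_confidence=0.95, base_power=0.80,
                         two_sided=True):
    """Sample size relative to a baseline setting (1.0 means no change)."""

    def z_sum_squared(conf, pw):
        alpha = 1 - conf
        tail = alpha / 2 if two_sided else alpha
        z = NormalDist().inv_cdf
        return (z(1 - tail) + z(pw)) ** 2

    return (z_sum_squared(confidence, power)
            / z_sum_squared(base_confidence, base_power))


# Lowering confidence from 95% to 90% while keeping power at 80%.
ratio = relative_sample_size(confidence=0.90, power=0.80)
print(f"Sample size needed: {ratio:.0%} of the baseline")
```

In this approximation, dropping the confidence level from 95% to 90% cuts the required sample size by roughly a fifth, at the cost of accepting more false positives.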

Ultimately, the value of the duration estimator depends on your business goals and the risks you're willing to take to pursue them. Read more about the experiment design phase here.


July 11th, 2024
