Estimate the duration of your experiments

The duration estimator helps you calculate the sample size and run time your Amplitude experiment needs to reach statistical significance, and helps you decide whether an experiment is worthwhile.

Note

While Amplitude Experiment supports sequential testing, the duration estimator only determines the sample size for a T-test. Go to the Sequential Test / T-test comparison page to read more about the difference between sequential tests and T-tests.

Understand the duration estimator

This table describes the components involved in generating the duration estimate.

Component name and default setting	Definition and data validation	Relation to sample size needed for statistical significance
Confidence Level: 95%	The confidence level reflects your tolerance for false positives in the results. At a 95% confidence level, if there were truly no difference between variants and you ran the same experiment again and again, you would expect about 5% of those runs to show a statistically significant result anyway (a false positive). The confidence level must be between 1% and 99%. Amplitude recommends a minimum of 80%; below that, the experiment's results may no longer be reliable.	The larger the confidence level, the larger the sample size.
Control Mean: Automatically computed when you select the primary metric	The control mean is the average value of the selected primary metric over the last seven days (not including today) for users who completed the proxy exposure event. Consider adjusting the mean if a recent special event or holiday may have affected the average over the last seven days. This value can't be zero, regardless of metric type, and for conversion metrics it can't be one. Enter conversion metrics as proportions: .5 means 50%, not .5%.	The smaller the control mean, the larger the sample size.
Standard Deviation: Automatically computed when you select the primary metric	Standard deviation measures the spread of the data: roughly, how far data points typically fall from the mean (it's the square root of the variance). It only appears for numerical metrics, not for binary (0-1) conversion metrics. Amplitude computes it from the primary metric over the last seven days (not including today) for users who completed the proxy exposure event. This value can be any positive number.	The larger the standard deviation, the larger the sample size.
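Power: 80%	Power is the probability that the experiment detects an effect when a real effect at least as large as the MDE exists; it equals 1 minus the false negative rate. For example, 80% power means that when such an effect exists, you would expect to miss it about 20% of the time. Think of power as the sensitivity you need in your experiment, or how much risk of missing a real result you're willing to accept. This value must be between 1% and 99%. Don't set it lower than 70%.	The larger the power, the larger the sample size.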
Test Type: 2-sided	A 1-sided t-test looks for a change in only one direction (either an increase or a decrease compared to the control mean, chosen in advance), whereas a 2-sided t-test looks for a change in either direction.	A 2-sided test requires a larger sample size than a 1-sided test.
Minimum Detectable Effect (MDE): 2%	The MDE is relative to the control mean of the primary metric; it's neither absolute nor standardized. For example, if the conversion rate for control is 10%, an MDE of 2% means that a change is detectable if the rate moves outside the range of 9.8% to 10.2% (10% ± 2% of 10%). Use the smallest change you care about detecting. This value can be any positive percentage.	The smaller the MDE, the larger the sample size.
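
To see how these components combine, here is a minimal Python sketch of the standard normal-approximation formula for the per-variant sample size of a two-sample test. It's an assumption that the duration estimator uses exactly this formula, and Amplitude's implementation may differ. The function name sample_size_per_variant is hypothetical. For conversion metrics, the sketch derives the variance from the control mean as p * (1 - p); for numerical metrics, it uses the standard deviation you pass in.

```python
import math
from statistics import NormalDist
from typing import Optional


def sample_size_per_variant(
    control_mean: float,
    mde: float,
    std_dev: Optional[float] = None,
    confidence: float = 0.95,
    power: float = 0.80,
    two_sided: bool = True,
) -> int:
    """Approximate users needed per variant (hypothetical helper, normal approximation).

    control_mean: baseline value; for conversion metrics, a proportion (0.5 means 50%).
    mde: relative minimum detectable effect (0.02 means 2% of the control mean).
    std_dev: standard deviation for numerical metrics; None means a conversion metric.
    """
    if not (0.01 <= confidence <= 0.99 and 0.01 <= power <= 0.99):
        raise ValueError("confidence and power must be between 1% and 99%")
    if control_mean == 0 or mde <= 0:
        raise ValueError("control mean can't be zero and MDE must be positive")
    if std_dev is None:
        # Conversion metric: a 0-1 outcome with rate p has variance p * (1 - p).
        if not 0 < control_mean < 1:
            raise ValueError("conversion control mean must be between 0 and 1")
        variance = control_mean * (1 - control_mean)
    else:
        if std_dev <= 0:
            raise ValueError("standard deviation must be positive")
        variance = std_dev ** 2

    alpha = 1 - confidence  # tolerated false positive rate
    # A 2-sided test splits alpha across both tails, which raises the critical value.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_beta = NormalDist().inv_cdf(power)  # power = 1 - false negative rate
    # The MDE is relative to the control mean, so convert it to an absolute difference.
    delta = abs(mde * control_mean)
    # Assumes the variants share the control's variance.
    return math.ceil(2 * variance * (z_alpha + z_beta) ** 2 / delta ** 2)


# Conversion metric: 10% control rate, 2% relative MDE, default settings.
print(sample_size_per_variant(0.10, 0.02))
# Numerical metric: control mean 25, standard deviation 40.
print(sample_size_per_variant(25.0, 0.02, std_dev=40.0))
```

Changing one argument at a time reproduces each row's "relation to sample size" column: raising confidence or power, switching to a 2-sided test, shrinking the MDE or control mean, or increasing the standard deviation all make the result larger.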

Interpret the duration estimator results

After you've entered all the components, the duration estimator displays the estimated number of days your experiment needs to reach statistical significance.

If the estimate is longer than the optimal 30 days, the duration estimator offers suggestions, such as removing a variant or two, along with other optimizations. If the estimate is within a reasonable period of time, the duration estimator tells you so.
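
The step from sample size to days can be sketched as follows, assuming traffic splits evenly across all variants (control included) and a steady number of new users enters the experiment each day. Both assumptions, the helper names estimate_duration_days and duration_message, and the message wording are illustrative rather than Amplitude's implementation; only the 30-day threshold comes from this page.

```python
import math

OPTIMAL_MAX_DAYS = 30  # the 30-day threshold described on this page


def estimate_duration_days(
    sample_per_variant: int, num_variants: int, daily_new_users: float
) -> int:
    """Days until every variant (control included) reaches its sample size.

    Hypothetical simplifications: an even traffic split and a steady number
    of new users entering the experiment each day.
    """
    if sample_per_variant <= 0 or num_variants < 2 or daily_new_users <= 0:
        raise ValueError("need a positive sample size, at least 2 variants, and traffic")
    total_users_needed = sample_per_variant * num_variants
    return math.ceil(total_users_needed / daily_new_users)


def duration_message(days: int) -> str:
    """Hypothetical summary mirroring the estimator's two kinds of feedback."""
    if days > OPTIMAL_MAX_DAYS:
        return (
            f"About {days} days, which exceeds {OPTIMAL_MAX_DAYS} days. "
            "Consider removing a variant or adjusting other inputs."
        )
    return f"About {days} days, which is within a reasonable run time."


# Two variants needing 353,200 users each, with 20,000 new users per day.
days = estimate_duration_days(353_200, 2, 20_000)
print(days, duration_message(days))
```

Because the total required sample scales with the number of variants, removing a variant shortens the run time proportionally under these assumptions, which is why it's a common suggestion.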

Note

For Feature experiments, when your flag is inactive, Experiment uses the proxy exposure event to estimate the duration of the experiment, because that event has historical traffic information.
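
As a rough illustration of how a seven-day history can feed the estimate, this Python sketch computes average daily exposed users, the control mean, and the standard deviation from proxy exposure events over the last seven days, not including today. The record shapes, the function name seven_day_baseline, and counting users without a metric value as 0 are assumptions for illustration, not how Amplitude stores or queries this data.

```python
from datetime import date, timedelta
from statistics import mean, stdev
from typing import Dict, List, Tuple

WINDOW_DAYS = 7


def seven_day_baseline(
    exposures: List[Tuple[str, date]],
    metric_by_user: Dict[str, float],
    today: date,
) -> Tuple[float, float, float]:
    """Return (average daily exposed users, control mean, standard deviation).

    exposures: (user_id, day) pairs for the proxy exposure event (hypothetical shape).
    metric_by_user: each user's primary-metric value over the window; users
        missing from it count as 0 (for example, did not convert).
    """
    start = today - timedelta(days=WINDOW_DAYS)
    # Keep the last seven full days; today is excluded because it's incomplete.
    users = {uid for uid, day in exposures if start <= day < today}
    if len(users) < 2:
        raise ValueError("need at least two exposed users in the window")
    daily_users = len(users) / WINDOW_DAYS  # unique users, averaged per day
    values = [metric_by_user.get(uid, 0.0) for uid in users]
    return daily_users, mean(values), stdev(values)


today = date(2024, 5, 8)
exposures = [
    ("a", date(2024, 5, 1)),
    ("b", date(2024, 5, 3)),
    ("a", date(2024, 5, 4)),   # repeat exposure: "a" counts once
    ("c", date(2024, 5, 7)),
    ("d", date(2024, 5, 8)),   # today: excluded
    ("e", date(2024, 4, 30)),  # older than seven days: excluded
]
# Conversion metric: 1.0 means converted; "b" has no value, so it counts as 0.
print(seven_day_baseline(exposures, {"a": 1.0, "c": 1.0, "d": 1.0}, today))
```

For a conversion metric like the one above, only the mean matters; the duration estimator doesn't show a standard deviation for binary metrics.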

Reduce experiment run time

Sometimes the duration estimator points to a longer run time than you want. Consider these options to decrease your experiment's run time:

Modify the error rates (confidence level and power) to reduce the sample size needed.
Change the primary metric and exposure event.
Target more users.
Modify the standard deviation so that outliers don't carry as much weight.
Decide if the experiment is worth the run time in the first place.
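
To compare how some of these options shorten the run time, this hypothetical Python sketch recomputes the estimate for a numerical metric under one change at a time: lower confidence, lower power (kept at the 70% floor), more targeted users, one fewer variant, and outliers capped at a percentile. It uses a standard 2-sided normal-approximation sample-size formula with an even traffic split, which may differ from Amplitude's exact calculation. Capping (winsorizing) is one common way to keep outliers from carrying as much weight; it's shown here as an illustration, not as an Amplitude setting, and the sample data is made up.

```python
import math
from statistics import NormalDist, mean, quantiles, stdev
from typing import Dict, List


def n_per_variant(mean_: float, sd: float, mde: float,
                  confidence: float = 0.95, power: float = 0.80) -> int:
    """Per-variant sample size for a numerical metric (2-sided, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    delta = abs(mde * mean_)  # relative MDE converted to an absolute difference
    return math.ceil(2 * sd ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2)


def days_needed(per_variant: int, num_variants: int, daily_users: float) -> int:
    """Days to fill every variant, assuming an even split and steady new traffic."""
    return math.ceil(per_variant * num_variants / daily_users)


def cap_outliers(values: List[float], pct: int = 99) -> List[float]:
    """Winsorize: clip values above the pct-th percentile down to that percentile."""
    cutoff = quantiles(values, n=100)[pct - 1]
    return [min(v, cutoff) for v in values]


def compare_levers(values: List[float], mde: float, num_variants: int,
                   daily_users: float, cap_pct: int = 99) -> Dict[str, int]:
    """Estimated days for the baseline and for one change at a time."""
    m, sd = mean(values), stdev(values)
    capped = cap_outliers(values, cap_pct)
    base_n = n_per_variant(m, sd, mde)
    return {
        "baseline": days_needed(base_n, num_variants, daily_users),
        "confidence 90%": days_needed(
            n_per_variant(m, sd, mde, confidence=0.90), num_variants, daily_users),
        "power 70%": days_needed(
            n_per_variant(m, sd, mde, power=0.70), num_variants, daily_users),
        "50% more users": days_needed(base_n, num_variants, daily_users * 1.5),
        # An experiment needs at least 2 variants, so never go below that.
        "one fewer variant": days_needed(base_n, max(2, num_variants - 1), daily_users),
        "outliers capped": days_needed(
            n_per_variant(mean(capped), stdev(capped), mde), num_variants, daily_users),
    }


# Hypothetical revenue-like metric: mostly small values plus a few large outliers.
raw = [10.0] * 95 + [20.0, 40.0, 80.0, 400.0, 1500.0]
# With only 100 values, cap at the 95th percentile so more than the maximum is clipped.
for scenario, est_days in compare_levers(raw, 0.05, 3, 5_000, cap_pct=95).items():
    print(f"{scenario}: {est_days} days")
```

Capping changes the metric itself, so the MDE then applies to the capped mean. Changing the primary metric and exposure event isn't modeled here, because its effect depends on the new metric's own mean and standard deviation.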

How useful the duration estimator is depends on your business goals and on how much risk you're willing to accept to reach them. For more information, read about the experiment design phase.