Amplitude's A/B testing features rely on standard statistical techniques to determine statistical significance. This article covers some frequently asked questions about those calculations.
Note

The information in this article applies only to A/B tests in a Funnel Analysis chart. It does not apply to the Experiment Results chart, or to end-to-end experimentation in Amplitude Experiment.
How does Amplitude calculate improvement over baseline?

Improvement over baseline is the ratio of the mean of the variant (A) over the mean of the baseline (B):

$$\text{Improvement over baseline} = \frac{\mu_A}{\mu_B}$$

Mean of variant (A): $\mu_A = \frac{x_A}{n_A}$

Mean of baseline (B): $\mu_B = \frac{x_B}{n_B}$

where $x$ = number of conversions and $n$ = sample size.
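As a concrete illustration, here is a minimal sketch of that calculation in Python. The conversion counts and sample sizes below are made-up example values, not data from any real experiment.

```python
# Minimal sketch of the "improvement over baseline" ratio described above.
def improvement_over_baseline(conversions_a, n_a, conversions_b, n_b):
    mean_a = conversions_a / n_a   # mean of variant (A): conversions / sample size
    mean_b = conversions_b / n_b   # mean of baseline (B): conversions / sample size
    return mean_a / mean_b         # ratio of the variant mean to the baseline mean

# Example: the variant converts 160 of 800 users, the baseline 120 of 800.
print(improvement_over_baseline(160, 800, 120, 800))  # ~1.33, i.e. about 33% higher
```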
Why are unique conversions considered in the calculations, but not totals?

Amplitude uses unique conversions instead of totals when testing for statistical significance. Looking at totals makes a false assumption about a user's behavior in the funnel: the aggregate sum treats each time a user enters the funnel as independent of the previous times they entered. That assumption does not hold when calculating statistical significance. Reviewing totals can still be useful for other analyses in the Experiment Results chart or in end-to-end Amplitude Experiment.
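To make the distinction concrete, here is a small, hypothetical illustration in Python; the event log and user IDs are invented. Unique conversions count each converting user at most once, while totals count every converting funnel entry, which is where the independence assumption creeps in.

```python
# Hypothetical event log: (user_id, converted) for each funnel entry.
# A user can enter the funnel several times; totals count every entry,
# while uniques count each user at most once.
entries = [
    ("u1", True), ("u1", True),    # u1 entered twice and converted both times
    ("u2", False), ("u2", True),   # u2 converted on the second entry
    ("u3", False),
]

total_conversions = sum(1 for _, converted in entries if converted)
unique_conversions = len({user for user, converted in entries if converted})

print(total_conversions)   # 3 -> treats repeat entries as independent events
print(unique_conversions)  # 2 -> at most one count per converting user
```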
How does Amplitude calculate statistical significance?

Amplitude uses standardized statistical methods to calculate statistical significance. Keep in mind that the method used, either sequential testing or a two-tailed T-test, varies depending on the feature you're using for analysis. By default, Amplitude Experiment and the Experiment Results chart use sequential testing, while the Funnel Analysis chart uses the two-tailed T-test. This means that if you compare results between analyses, the p-values may not match when your charts use different testing methods. If you want to use the T-test to analyze your end-to-end Experiment or Experiment Results chart data, follow the steps in this Help Center article.

Interpreting stat sig results

For both sequential testing and the T-test, Amplitude uses a false positive rate of 5% to judge results, and it only looks at the best-performing variant. Because Amplitude uses a 5% false positive rate by default, the threshold for significance is (1 - p-value) > 95%. You can set a different false positive rate in Amplitude Experiment, but you cannot change it in the Funnel Analysis chart.

To help reduce false positives, Amplitude requires a minimum sample size before it declares significance. Currently, this minimum is 30 samples, five conversions, and five non-conversions for each variant. Tests that do not meet these minimums are automatically considered not statistically significant.

When a test has reached statistical significance, the chart displays green text indicating the result; otherwise, it displays red text.
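The article does not publish Amplitude's internal formulas, but the decision logic described above can be sketched as follows. This is a simplified illustration, not Amplitude's implementation: it treats each unique user's conversion as a 0/1 outcome, uses SciPy's two-sided t-test as a stand-in for the two-tailed T-test, applies the minimum-sample rule, and compares (1 - p-value) against the 95% threshold.

```python
# Simplified sketch of the significance logic described above.
# Not Amplitude's implementation; scipy's two-sided t-test stands in
# for the two-tailed T-test used by the Funnel Analysis chart.
import numpy as np
from scipy import stats

def is_stat_sig(conversions_a, n_a, conversions_b, n_b,
                false_positive_rate=0.05):
    """Return (significant, p_value) for variant A vs. baseline B."""
    # Minimum-sample rule: 30 samples, 5 conversions, and 5 non-conversions
    # per variant; otherwise the result is considered not significant.
    for conv, n in ((conversions_a, n_a), (conversions_b, n_b)):
        if n < 30 or conv < 5 or (n - conv) < 5:
            return False, None

    # Each unique user is a 0/1 outcome: converted or not.
    a = np.concatenate([np.ones(conversions_a), np.zeros(n_a - conversions_a)])
    b = np.concatenate([np.ones(conversions_b), np.zeros(n_b - conversions_b)])

    # Two-tailed T-test on the per-user outcomes.
    _, p_value = stats.ttest_ind(a, b, equal_var=False)

    # Significant when (1 - p-value) exceeds the 95% threshold.
    return (1 - p_value) > (1 - false_positive_rate), p_value

# Example with made-up counts: 160 of 800 variant users converted
# versus 120 of 800 baseline users.
significant, p = is_stat_sig(160, 800, 120, 800)
print(significant, p)
```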