Bayesian Statistics
Bayesian statistics compares groups by calculating the probability that one variant outperforms another. Unlike traditional methods that rely on p-values and fixed hypothesis testing, Bayesian statistics provides direct probability estimates that align with how teams make decisions.
How Bayesian analysis works
Bayesian methodology uses three core concepts to evaluate experiment results:
- Priors: your initial expectations about a parameter or hypothesis before you collect data. Amplitude Experiment uses uninformative priors to remain neutral about expected outcomes. For binary metrics, Amplitude uses a Beta(1,1) prior (a uniform distribution). For continuous metrics, Amplitude uses an uninformative normal prior.
- Likelihoods: the probability of observing the collected results given a particular set of parameter values. As users interact with your variants, the likelihood function incorporates the observed data into the analysis.
- Posteriors: the result of combining priors with observed likelihoods to produce updated beliefs about experiment outcomes. The posterior distribution shows the most likely values and the uncertainty around those estimates. The posterior forms the basis for all metrics in Amplitude Experiment, including relative lift, absolute lift, and variant means.
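For a binary metric, the prior-to-posterior update described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard Beta-Binomial conjugate update with the Beta(1,1) prior mentioned above; the names `BetaPosterior` and `update_beta` are hypothetical and aren't part of any Amplitude API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPosterior:
    """Beta distribution over a conversion rate (illustrative helper)."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def update_beta(conversions: int, exposures: int,
                prior_alpha: float = 1.0, prior_beta: float = 1.0) -> BetaPosterior:
    """Combine a Beta prior with observed binary data (the likelihood)."""
    if not 0 <= conversions <= exposures:
        raise ValueError("conversions must be between 0 and exposures")
    # Conjugate update: conversions add to alpha, non-conversions add to beta.
    return BetaPosterior(prior_alpha + conversions,
                         prior_beta + (exposures - conversions))


posterior = update_beta(conversions=30, exposures=200)
print(posterior)                 # BetaPosterior(alpha=31.0, beta=171.0)
print(round(posterior.mean, 4))  # 0.1535
```

With 30 conversions in 200 exposures, the posterior mean (31/202) sits slightly above the raw rate of 15% because the uniform prior contributes one conversion and one non-conversion.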
With Bayesian methodology, you can test whether the control mean differs from the treatment mean. Rather than setting an arbitrary significance threshold, you examine the entire distribution of possible outcomes and make decisions that fit your business goals and risk tolerance.
Bayesian statistics treats experimentation as an ongoing process of learning and updating expectations. As you gather more data, the posterior distribution updates to reflect your accumulated knowledge. This approach supports more responsive decisions based on market changes and customer preferences.
Key Bayesian concepts
Review the following concepts to understand Bayesian statistics:
- Chance to Beat Control: the probability that the treatment variant performs better than the control. This differs from the frequentist p-value, which only tells you the probability of seeing your results (or more extreme results) if no true effect exists. Bayesian analysis answers the more intuitive question: "What's the probability this variant is actually better?"
- Credible Interval: the range where the difference between treatment and control means lies with a specified probability. Unlike confidence intervals in frequentist statistics, Bayesian credible intervals directly describe the likely values of the parameter. A 95% credible interval means there's a 95% probability the true difference falls within that range.
- Chance to Be Best: appears when your experiment includes more than two variants. This metric shows the probability that each variant outperforms all other variants. With only two variants, this metric equals the Chance to Beat Control.
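One way to see how these three quantities relate is to estimate them from posterior samples. The sketch below assumes binary metrics with a Beta(1,1) prior and uses Monte Carlo sampling from Python's standard library. It's an illustration of the concepts, not Amplitude's implementation; the function names and the example counts are hypothetical.

```python
import random


def posterior_draws(conversions, exposures, draws, rng):
    """Sample conversion rates from the Beta posterior under a Beta(1, 1) prior."""
    return [rng.betavariate(1 + conversions, 1 + exposures - conversions)
            for _ in range(draws)]


def summarize(variants, draws=20_000, seed=0):
    """variants maps name -> (conversions, exposures); the first entry is the control."""
    rng = random.Random(seed)
    names = list(variants)
    samples = {name: posterior_draws(*variants[name], draws, rng) for name in names}
    control = names[0]

    per_variant = {}
    for name in names[1:]:
        # Posterior samples of the difference (treatment minus control).
        diffs = sorted(t - c for t, c in zip(samples[name], samples[control]))
        per_variant[name] = {
            "chance_to_beat_control": sum(d > 0 for d in diffs) / draws,
            # Approximate equal-tailed 95% interval from the sorted differences.
            "credible_interval_95": (diffs[int(0.025 * draws)],
                                     diffs[int(0.975 * draws) - 1]),
        }

    # Chance to Be Best: how often each variant has the highest sampled rate.
    wins = dict.fromkeys(names, 0)
    for i in range(draws):
        wins[max(names, key=lambda n: samples[n][i])] += 1
    chance_to_be_best = {name: wins[name] / draws for name in names}
    return per_variant, chance_to_be_best


per_variant, chance_to_be_best = summarize(
    {"control": (100, 1000), "treatment": (150, 1000)}
)
print(per_variant["treatment"])
print(chance_to_be_best)
```

With only two variants, the treatment's Chance to Be Best in this sketch matches its Chance to Beat Control, because both count the same samples where the treatment's rate exceeds the control's.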
When to use Bayesian statistics
Bayesian methods work well when you want continuous insight into your experiment's performance. They're valuable when you need to incorporate prior knowledge, make decisions with smaller sample sizes, or require probability statements that directly answer business questions like "How likely is this variant to succeed?"
For teams running standard product experiments with straightforward decision criteria, Amplitude Experiment defaults to sequential testing, which offers fast results with strong false positive control. You can switch to Bayesian analysis at any point to gain additional perspective on your experiment outcomes.
Set up Bayesian statistics
To use Bayesian statistics for your experiment, complete the following steps:
- Go to your Experiments page.
- Open an existing experiment or click Create Experiment.
- Scroll to Advanced (Optional) settings, then click Stats Preferences.
- Click the Statistical Method dropdown and select Bayesian.
If your experiment is already running, switch to Bayesian analysis in the Experiment Results Chart. Click the gear icon, then Statistical Method, then select Bayesian.
Configure the Chance to Outperform Threshold
You can adjust the Chance to Outperform Threshold from 0% to 100%. Because Amplitude conducts a two-tailed Bayesian test, setting this threshold at 95% means your experiment reaches significance when the chance to outperform meets or exceeds 97.5%, or falls at or below 2.5%. This corresponds to a 95% credible interval for the difference that excludes zero.
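The two-tailed rule can be expressed as a small check. This sketch assumes the threshold splits its remaining probability evenly between the two tails, as the 95% example describes; `reaches_significance` is a hypothetical name used only for illustration.

```python
def reaches_significance(chance_to_outperform: float, threshold: float = 0.95) -> bool:
    """Two-tailed check: significant if the chance is extreme in either direction."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    tail = (1.0 - threshold) / 2.0  # 2.5% in each tail for a 95% threshold
    return chance_to_outperform >= 1.0 - tail or chance_to_outperform <= tail


for chance in (0.98, 0.96, 0.50, 0.02):
    print(chance, reaches_significance(chance))
```

Under a 95% threshold, a chance to outperform of 96% doesn't reach significance, while 98% (treatment likely better) and 2% (treatment likely worse) both do.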
How posterior-based metrics work
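All metrics in Bayesian analysis (relative lift, absolute lift, and mean values) come from the posterior distribution. For example, with one exposure and zero conversions, the control mean displays as 33.3% because the Beta(1,1) prior adds one conversion and one non-conversion, producing a posterior mean of (0 + 1) / (1 + 1 + 1) = 1/3. The numerator is the observed conversions plus the one prior conversion. The denominator is the one exposure plus the two prior observations.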
To view the mean without the prior distribution's influence, hover over the metric in the Analysis section. The mean over time chart shows the posterior mean in cumulative view and the raw mean of the data in non-cumulative view.
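The following sketch reproduces the arithmetic of the example above and shows how the prior's influence fades as exposures accumulate. It assumes a Beta(1,1) prior on a binary metric; the function names and sample counts are illustrative only.

```python
from fractions import Fraction


def posterior_mean(conversions: int, exposures: int) -> Fraction:
    """Posterior mean under Beta(1, 1): one extra conversion, two extra observations."""
    return Fraction(conversions + 1, exposures + 2)


def raw_mean(conversions: int, exposures: int) -> Fraction:
    """Observed conversion rate, without the prior."""
    return Fraction(conversions, exposures)


print(posterior_mean(0, 1))  # 1/3

# The gap between posterior and raw means shrinks as data accumulates.
for conversions, exposures in [(0, 1), (5, 50), (500, 5000)]:
    post = float(posterior_mean(conversions, exposures))
    raw = float(raw_mean(conversions, exposures))
    print(f"{exposures:>5} exposures: posterior={post:.4f} raw={raw:.4f}")
```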
Minimum data requirements
To produce reliable results, Amplitude Experiment applies minimum thresholds before showing credible intervals and Chance to Beat Control:
- Binary metrics: at least 25 conversions and 100 exposures in each variant.
- Numeric metrics: at least 100 exposures in each variant.
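These thresholds amount to a simple per-variant check. The sketch below is a hypothetical helper that mirrors the rules listed above; the input shape (a list of dictionaries with `exposures` and `conversions` keys) is an assumption made for illustration.

```python
MIN_EXPOSURES = 100
MIN_CONVERSIONS = 25


def meets_minimum_data(metric_type: str, variants: list[dict]) -> bool:
    """Return True only if every variant clears the thresholds for its metric type."""
    if metric_type not in ("binary", "numeric"):
        raise ValueError("metric_type must be 'binary' or 'numeric'")
    for variant in variants:
        if variant["exposures"] < MIN_EXPOSURES:
            return False
        # Binary metrics also need enough conversions in each variant.
        if metric_type == "binary" and variant["conversions"] < MIN_CONVERSIONS:
            return False
    return True


variants = [
    {"exposures": 400, "conversions": 40},  # control
    {"exposures": 410, "conversions": 20},  # treatment: too few conversions
]
print(meets_minimum_data("binary", variants))   # False
print(meets_minimum_data("numeric", variants))  # True
```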
False positive control and multiple comparisons
Bayesian statistics doesn't guarantee false positive control in the traditional sense. Amplitude disables Bonferroni correction when you use this method. Instead, Bayesian analysis controls expected loss, providing a more nuanced approach to decision-making under uncertainty. This approach follows established research on Bayesian multiple comparison methodology.
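As a rough illustration of what expected loss means, the sketch below estimates the average conversion rate you would give up by shipping the treatment in the scenarios where the control is actually better. It assumes binary metrics with a Beta(1,1) prior and one common definition of expected loss. Amplitude's exact calculation isn't described here, so treat the function and its inputs as hypothetical.

```python
import random


def expected_loss(control, treatment, draws=20_000, seed=0):
    """Estimate E[max(control_rate - treatment_rate, 0)] from posterior samples.

    control and treatment are (conversions, exposures) tuples.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        c = rng.betavariate(1 + control[0], 1 + control[1] - control[0])
        t = rng.betavariate(1 + treatment[0], 1 + treatment[1] - treatment[0])
        total += max(c - t, 0.0)  # loss is zero whenever treatment is better
    return total / draws


loss_if_ship_treatment = expected_loss(control=(100, 1000), treatment=(150, 1000))
loss_if_keep_control = expected_loss(control=(150, 1000), treatment=(100, 1000))
print(loss_if_ship_treatment, loss_if_keep_control)
```

In this example, shipping the stronger variant carries an expected loss close to zero, while keeping the weaker one carries an expected loss near the observed gap between the two rates.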
Custom priors
To customize the prior for a metric, click the three vertical dots icon next to that metric, toggle on Customize the Prior, then set the parameters of the prior.
Limitations
Amplitude Experiment's Bayesian implementation doesn't support CUPED (variance reduction).