In a traditional A/B test, Amplitude Experiment assesses all the variants in your experiment until it reaches a statistically significant result. From there, you can choose to roll out the winning variant, or roll all users back to the control variant instead. Your decisions depend on why a particular variant outperformed the others.
But sometimes, that reason isn’t relevant. All you want is to decide which variant is performing the best and send as much traffic as possible to it. For example:
This feature is available to users on Enterprise plans who have purchased Amplitude Experiment. See the pricing page for more details.
Unlike in a traditional A/B test, multi-armed bandits don't use statistical significance as a determinant of success. Nor do they use a control or baseline variant. There are also differences in how Amplitude Experiment displays results for both experiment types; those are covered later in this article.
Multi-armed bandit experiments use Thompson sampling. No other statistical methodologies are available to use with a multi-armed bandit in Amplitude Experiment.
ampex_data_monster
were made by Amplitude.Building a multi-armed bandit experiment is almost identical to building an A/B test in Amplitude Experiment. There are some differences, though. The next section covers them in more detail
A multi-armed bandit experiment requires a primary metric. This is the metric Amplitude Experiment uses to optimize your experiment. You can include secondary metrics, but those are used for reporting purposes only.
In an A/B test, your primary metric can be a guardrail metric: in other words, one that you don’t want to be negatively affected by your experiment. Clickthrough rate is a good example of a guardrail metric. Because a multi-armed bandit experiment is about optimization, this would not make sense—you can’t optimize for a change that you don’t want to occur. For that reason, primary metrics for multi-armed bandit experiments must be success metrics ("will increase" or "will decrease"). Amplitude Experiment supports both binary metrics and continuous metrics.
If you want to optimize two metrics in your multi-armed bandit experiment, create a new custom metric that's a weighted average of both. Sometimes, you may face a tradeoff between metrics that you want to optimize, in which case, you should consider running an A/B test instead.
Allocation for a multi-armed bandit experiment always begins with a uniform distribution, as the model has no way to know which variant is the most effective before it has collected any data. This changes once that data begins coming in.
A multi-armed bandit adjusts the allocation between the variants only; it doesn't adjust the percentage rollout.
The confidence level in a multi-armed bandit experiment has a different role than in an A/B test. Here, it can accelerate traffic to the winning variant. For example, if your experiment’s confidence level is set to 95%, and the multi-armed bandit has already allocated at least 95% of the experiment’s traffic to the winning variant, Amplitude Experiment assumes confidence, and allocates 100% of traffic to that variant from that point on.
In multi-armed bandit experiments, the role of the minimum detectable effect (MDE) is to help calculate the duration estimate. Since these experiments are automated and about optimization, the MDE doesn't affect the experiment once it’s running.
When calculating the duration estimate prior to starting the experiment, Amplitude Experiment simulates what would happen if the same baseline mean (computed from historical data) is shared by all variants except one: when measuring an increase, this variant would have mean * (1+MDE)
; when measuring a decrease, it would be mean * (1-MDE)
. From there, Amplitude Experiment calculates how long it could take for the multi-armed bandit to assign all traffic to one variant. The duration estimate is capped at 31 days.
Amplitude Experiment doesn't display variant jumping while a multi-armed bandit is running. That’s because variant jumping is expected behavior in these experiments.
The data quality card isn't visible for multi-armed bandit experiments. Most of the checks conducted for this display don’t make sense for this experiment type. You can't make changes to the experiment that affect traffic allocation while the experiment is running.
The Bandits card is very similar to the non-cumulative exposure chart in the Monitor card, but is normalized to 100%, enabling you to easily visualize the percentage of traffic each variant is getting on that day.
When a multi-armed bandit allocates 70%, 80%, 90%, and 100% of an experiment’s traffic to a variant, Amplitude Experiment sends a notification to the editors of the experiment. If the bandit takes a long time to terminate or the end date is reached, then Amplitude Experiment also sends a notification. These can be through either Slack or email. For more information, see Integrate Slack with Amplitude.
To set up your notifications, navigate to Settings > Personal settings > Notifications.
Thanks for your feedback!
October 17th, 2024
Need help? Contact Support
Visit Amplitude.com
Have a look at the Amplitude Blog
Learn more at Amplitude Academy
© 2024 Amplitude, Inc. All rights reserved. Amplitude is a registered trademark of Amplitude, Inc.