Multi-Armed Bandits vs. A/B Testing: Choosing the Right Approach

Learn everything you need to know about multi-armed bandits (MABs), including their limitations and when to use them over A/B testing.

April 8, 2024
Ken Kutyn
Senior Solutions Consultant, Amplitude

In the digital space, the pace is relentless, and the stakes are high. Product teams have options when running experiments, including the well-mapped path of A/B testing and the swift, adaptive route of multi-armed bandits (MABs)—both solid choices depending on your use case and goals.

With MABs gaining interest for their promise of quicker insights and a streamlined path to peak performance, it’s tempting to follow the buzz. Product development leaders are increasingly considering MABs to stay agile in an ever-changing market.

Though there are scenarios where MABs make sense, it’s essential to understand MABs' purpose, limitations, and use cases so you know when to use them instead of traditional A/B testing.

Key takeaways
  • Multi-armed bandits quickly adapt to the more effective variant, making them ideal for time-sensitive scenarios.
  • MABs assume consistent user behavior over time, which can create challenges in accurately assessing variant performance.
  • A/B testing provides more comprehensive insights by enabling equal exploration of all variants—which is crucial for long-term strategic decisions.
  • The choice between MABs and A/B testing is often context-driven, balancing the need for rapid improvements with the depth of learning.

Multi-armed bandits vs. A/B testing

MABs are a testing strategy that helps companies determine the best option from a set of choices or variants.

An MAB starts by delivering every variant to users, then progressively shifts traffic toward the better-performing options as results come in. This adaptive allocation is the key difference between MABs and A/B testing.

A/B testing gives all variants equal opportunity throughout the test duration, ensuring a thorough exploration of each option’s potential. A/B tests keep serving the same proportion of variants, even if one is clearly outperforming the others.

MABs, on the other hand, initially explore different variants to gather data, but as soon as a high-performing variant emerges, they serve this option to more users. This approach enables quick adaptation to user responses.
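To make the contrast concrete, here is a minimal simulation sketch. It assumes Thompson sampling with Beta posteriors, which is one common MAB algorithm and not necessarily what any given tool uses. The conversion rates, user count, and function names are hypothetical.

```python
import random

def thompson_sampling(true_rates, n_users, seed=0):
    """Simulate a Beta-Bernoulli Thompson sampling bandit.

    Returns the number of users served each variant.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    successes = [0] * n
    failures = [0] * n
    served = [0] * n
    for _ in range(n_users):
        # Draw one plausible conversion rate per variant from its Beta posterior.
        draws = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(n)]
        arm = draws.index(max(draws))
        served[arm] += 1
        # Simulate the user's response; the bandit never sees true_rates directly.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return served

def equal_split(n_variants, n_users):
    """Users per variant in a fixed, equal-allocation A/B test."""
    base, extra = divmod(n_users, n_variants)
    return [base + (1 if i < extra else 0) for i in range(n_variants)]

# Hypothetical true conversion rates for variants A, B, and C.
rates = [0.05, 0.10, 0.20]
print("A/B test traffic:", equal_split(3, 5000))
print("MAB traffic:     ", thompson_sampling(rates, 5000))
```

In a run like this, the A/B test serves each variant to roughly a third of users, while the bandit sends most users to the variant that converts best and only a small share to the others.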

Where MABs fall short

MABs work for some testing scenarios, but they are often misused and have limited applicability. Understanding their limitations will help you determine when to use them.

1. MABs don’t account for changes in user behavior

MABs assume that the conversion rate for each variant is consistent throughout the test and that users will continue taking the same actions they did at the beginning of the test.

However, if the initial data suggests one variant is superior and user behavior then shifts, MAB algorithms might overlook emerging patterns that favor a different variant, potentially resulting in missed opportunities to discover more effective strategies.

Let’s say you start on Monday, and variant B is performing best, but by Wednesday, C is the top performer. An A/B test would keep collecting equal data on every variant and reveal that C is actually the best option. An MAB, having already shifted most of its traffic to B, collects little data on C and is slow to learn how great C is. And yes, most MAB algorithms will continuously re-assess the performance of all variants, eventually directing more traffic to the C variant. But because each variant received a different share of traffic at different times, we can’t make a statistics-backed decision on the best overall variant.
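The Monday-to-Wednesday scenario can be worked through with simple arithmetic. The sketch below uses made-up conversion rates and made-up traffic splits. The MAB split is an illustrative assumption, not the output of any particular algorithm. It compares the overall conversion rate each variant would show when its data is pooled across both periods.

```python
def pooled_rate(traffic_by_period, rate_by_period):
    """Overall conversion rate a variant shows when all periods are pooled."""
    conversions = sum(t * r for t, r in zip(traffic_by_period, rate_by_period))
    return conversions / sum(traffic_by_period)

# Hypothetical true conversion rates per period: [Mon-Tue, Wed onward].
rates = {"B": [0.10, 0.10], "C": [0.08, 0.14]}

# Hypothetical users served per period (2,000 users in each period).
ab_traffic = {"B": [1000, 1000], "C": [1000, 1000]}   # fixed equal split
mab_traffic = {"B": [1400, 1800], "C": [600, 200]}    # traffic drifts toward B

ab = {v: pooled_rate(ab_traffic[v], rates[v]) for v in rates}
mab = {v: pooled_rate(mab_traffic[v], rates[v]) for v in rates}

print("A/B pooled rates:", {v: round(r, 3) for v, r in ab.items()})
print("MAB pooled rates:", {v: round(r, 3) for v, r in mab.items()})
```

Under the equal split, C comes out ahead overall (about 11% against 10%). Under the adaptive split, most of C's data comes from its weak early period, so its pooled rate (about 9.5%) falls below B's even though C is the stronger variant from Wednesday onward.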

2. MABs prioritize a single metric

Most, if not all, commercial implementations of MABs focus on enhancing performance based on a single metric. Prioritizing one metric can be limiting as it overlooks a change’s broader impact.

For example, you might ask the algorithm to pick the variant with the most “add to carts.” But what if that variant also yields a spike in support tickets? Using an MAB, you don’t have the opportunity to tell the algorithm to “pick the variant that gets the most ‘add to carts’ unless it pushes support tickets above X%.” Thus, you’d be improving your “add to carts” metric to the detriment of your support ticket metric.

This singular focus can mask critical trade-offs or unintended consequences. With an A/B test, you can interpret results and make decisions that account for all experiment outcomes, including the performance of your secondary and monitoring metrics.
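The “add to carts” example above can be sketched as a decision rule applied to A/B test results. The metric names, numbers, threshold, and the pick_variant helper are all hypothetical, and a real analysis would also check statistical significance before comparing values.

```python
def pick_variant(results, primary, guardrail, guardrail_limit):
    """Pick the variant with the best primary metric among those
    whose guardrail metric stays at or below the limit."""
    eligible = {
        name: metrics
        for name, metrics in results.items()
        if metrics[guardrail] <= guardrail_limit
    }
    if not eligible:
        return None  # every variant breaches the guardrail
    return max(eligible, key=lambda name: eligible[name][primary])

# Hypothetical per-variant results from a finished A/B test.
results = {
    "A": {"add_to_cart_rate": 0.10, "support_ticket_rate": 0.010},
    "B": {"add_to_cart_rate": 0.14, "support_ticket_rate": 0.035},
    "C": {"add_to_cart_rate": 0.12, "support_ticket_rate": 0.012},
}

# What a single-metric optimizer would favor.
single_metric_winner = max(results, key=lambda name: results[name]["add_to_cart_rate"])

# What you might choose once the support-ticket guardrail is considered.
guarded_winner = pick_variant(
    results, "add_to_cart_rate", "support_ticket_rate", guardrail_limit=0.02
)

print(single_metric_winner, guarded_winner)
```

With these numbers, optimizing “add to carts” alone favors B, while applying the support-ticket limit selects C.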

3. MABs don’t offer a consistent user experience by default

During testing, you usually want your users to keep seeing the same variant so they don’t have a confusing or inconsistent experience.

Inconsistent user experiences can be particularly challenging when users access your product across multiple devices or don’t log in. It can lead to confusion, diminish user trust, and negatively impact the overall perception of the product. A consistent interface offers a seamless user experience, maintaining the website or app’s sense of familiarity and reliability.

However, with MABs, you’re regularly changing traffic distribution—raising the question of what to do with returning users. Because MABs don’t offer consistent user experiences by default, teams must proactively manage this tradeoff to ensure the best user experience possible.

One of the most prominent approaches to managing this tradeoff is sticky bucketing. Sticky bucketing is an approach where once a user is assigned to an experiment variant, they consistently see the same variant each time they engage with your product.

With sticky bucketing, product managers get the best of both worlds: the ability to direct traffic to the highest-performing variant while also ensuring returning users get a consistent experience.
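A minimal sketch of sticky bucketing follows. It assumes a stable user identifier is available and uses an in-memory dictionary as a stand-in for persistent storage. The StickyBucketer class and its methods are hypothetical and are not any vendor's API.

```python
import hashlib

class StickyBucketer:
    """Assign each user a variant once, then keep returning that variant."""

    def __init__(self):
        # Stand-in for persistent storage (a database, cookie, or user profile).
        self._assignments = {}

    @staticmethod
    def _hash_to_unit(user_id):
        """Map a user ID to a stable number in [0, 1)."""
        digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
        return int(digest[:8], 16) / 0x100000000

    def assign(self, user_id, weights):
        """Return the user's stored variant, or bucket a new user
        using the current traffic weights (expected to sum to 1)."""
        if not weights:
            raise ValueError("weights must contain at least one variant")
        if user_id in self._assignments:
            return self._assignments[user_id]
        point = self._hash_to_unit(user_id)
        cumulative = 0.0
        variant = None
        for variant, weight in weights.items():
            cumulative += weight
            if point < cumulative:
                break
        # If rounding leaves the point uncovered, the last variant is used.
        self._assignments[user_id] = variant
        return variant

bucketer = StickyBucketer()
first = bucketer.assign("user-1", {"A": 0.5, "B": 0.5})
# Later the bandit shifts all new traffic to B; user-1 keeps the original variant.
later = bucketer.assign("user-1", {"A": 0.0, "B": 1.0})
print(first, later)
```

Only users without a stored assignment follow the current traffic weights, so the bandit's reallocation affects new users while returning users keep what they saw before.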

4. MABs don’t always provide the most conclusive results

Once the MAB reduces traffic to the lower-performing variants, it also reduces the data available to assess them accurately. As such, you could overlook situations where these variants might perform well, which skews your understanding of each variant’s effectiveness.

If you’re running an MAB to reach statistical significance, it’s essential to understand that the reduced traffic to your low-performing variants can compromise the certainty of your results. Essentially, you sacrifice certainty any time you move away from equal distribution.
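The loss of certainty can be illustrated with the standard (unpooled) standard error of the difference between two conversion rates. The rates and sample sizes below are hypothetical, and the sketch ignores the extra complications that adaptive allocation introduces. It only shows the effect of an uneven split.

```python
import math

def se_of_difference(p1, n1, p2, n2):
    """Standard error of the difference between two observed conversion rates."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Same 10,000 users in total; only the split between the two variants changes.
equal = se_of_difference(0.12, 5000, 0.10, 5000)    # 50/50 split
skewed = se_of_difference(0.12, 9000, 0.10, 1000)   # 90/10 split

print(f"50/50 split: {equal:.4f}")
print(f"90/10 split: {skewed:.4f}")
```

With these numbers, the standard error rises from roughly 0.0063 under the even split to roughly 0.0101 under the 90/10 split, so the same two-point difference between variants is much harder to distinguish from noise.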

Advantages and use cases for MABs

So when are MABs suitable? The sweet spot is a time-sensitive use case where your product or growth teams can’t separate or extend the learning and exploitation phases of the product management lifecycle loop.

One example is a promotion leading up to a holiday weekend. Say your growth team wants to run an A/B test for a promotion four weeks before the holiday—hoping to implement changes based on your test outcomes.

However, four weeks in advance may be too early because your customers might not be thinking about the upcoming holiday yet. Their actions and behaviors might differ if you run the test immediately before the holiday—but that won’t leave enough time to enact change.

In this case, there likely wouldn’t be a good way to do an A/B test. It would be impossible to separate the learning and exploitation phases, and there would be insufficient time to implement the winner and benefit from the uplift.

In contrast, MABs can maximize conversions within this short window, adjusting in real time to capture the most effective strategy. This approach enables immediate insights into which variants resonate with customers during time-sensitive periods, balancing immediate results with learning opportunities. You can still learn from an MAB optimization, but it’s not a replacement for a statistically rigorous A/B test.

Choose the optimal approach for each digital testing scenario

In the dynamic realm of digital experimentation, success lies in choosing the right approach for the right situation. MABs are a great way to make fast, data-driven decisions in time-sensitive scenarios. Traditional A/B testing provides comprehensive insights essential for a long-term strategy and deeper understanding.

As we delve into the nuances of MABs, it’s equally important to understand the qualities, strengths, and limitations of A/B testing.

For a more in-depth look at A/B testing and its applications, explore Amplitude’s guide to product-led experimentation.

Follow me on LinkedIn for more product and analytics content.

About the Author
Ken Kutyn
Senior Solutions Consultant, Amplitude
Ken has 8 years of experience in the analytics, experimentation, and personalization space. Originally from Vancouver, Canada, he has lived and worked in London, Amsterdam, and San Francisco and is now based in Singapore. Ken has a passion for experimentation and data-driven decision-making and has spoken at several product development conferences. In his free time, he likes to travel around Southeast Asia with his family, bake bread, and explore the Singapore food scene.
