Sequential Testing Explained
Discover how sequential tests help quickly identify winning product variations. Learn about their design, implementation, and best practices from actual use cases.
What is sequential testing?
Sequential testing enables you to evaluate the data as it is collected rather than waiting until the end of the test to analyze the results. This process differs from traditional fixed-horizon testing, where the sample size is set beforehand, and you conduct analysis only after this size is reached.
With sequential tests, you define your statistical boundaries at the start. Examples of these limits might include:
- Error rates: Limits on how often the test can make mistakes (such as false positives and negatives)
- Stopping boundaries: Points where the test stops if the results are very obvious
- Decision points: Specific times to check the data and decide whether to continue or stop
- Sample size boundaries: The minimum and maximum number of samples to collect
- Futility boundaries: When to stop early if it’s clear the test won’t show significant results
As each data point comes in, the system analyzes it against these boundaries. The test automatically continues or stops based on whether the data crosses them.
The test can stop early if the results cross the boundary for efficacy or futility. However, if the results are inconclusive, the test keeps running until it crosses a boundary or you reach the maximum sample size.
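As a rough sketch of this decision logic, the Python snippet below compares two conversion rates at a single look and checks them against stopping thresholds. The boundary values are placeholders invented for this example, not figures from any particular design:

```python
from math import sqrt

def check_boundaries(conv_a, n_a, conv_b, n_b,
                     efficacy_z=2.8, futility_z=0.3):
    """Evaluate one interim look against illustrative stopping boundaries.

    The thresholds here are placeholders; a real design derives them from
    a spending function so repeated looks still control the error rates,
    and usually only assesses futility after a minimum sample size.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    z = (p_b - p_a) / sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if abs(z) >= efficacy_z:
        return "stop: efficacy boundary crossed"
    if abs(z) <= futility_z:
        return "stop: futility boundary crossed"
    return "continue collecting data"

print(check_boundaries(conv_a=480, n_a=10_000, conv_b=590, n_b=10_000))
```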
In the real world, sequential testing might look like the following:
- An ecommerce company wants to test a new website layout
- It establishes statistical boundaries for conversion rates before the experiment starts
- As visitors interact with the new layout, their behavior is immediately analyzed
- If conversion rates significantly improve or decline, crossing the predefined boundaries, the test automatically stops
- If not, the test continues until the company can decide how effective the new layout is
Sequential tests help you make confident conclusions quickly without gathering unnecessary data. By continuously evaluating the results, you can focus more on successful tests and abandon unsuccessful ones sooner. This option reduces wasted efforts and saves time and money.
The method provides a middle ground between going with your best guess and running a never-ending large experiment. Being able to “stop when you’re confident” can lead to major resource savings compared to a traditional fixed-length test.
How does sequential testing differ from traditional testing?
The main difference between sequential testing and traditional fixed horizon tests is how and when you perform the statistical analysis on incoming data.
With typical fixed tests:
- You calculate a single sample size upfront based on the desired statistical power (the likelihood an experiment will show a difference when there is one)
- The system presents different variations to users until it reaches the total sample size
- Statistical analysis happens in one batch at the end of the testing period
Fixed test durations mean you often run tests longer than needed, especially if the results are clearly positive or negative early on. It can be an inefficient use of resources and time.
Sequential tests, on the other hand, take an iterative approach:
- You set minimum and maximum bounds instead of a defined sample size
- Statistical analysis happens as the data comes in
- Tests can stop early once they cross these boundaries
Rather than compromising statistical power by cutting tests short at random, sequential analysis ensures reliable results by following the same thoroughness as fixed tests. The difference is taking advantage of the early stops when those results are clear.
Additionally, while traditional tests typically use a fixed 95% confidence threshold, sequential tests adjust their boundary regions as they go. A test might start with wide boundaries and narrow them as more data comes in. This approach prevents you from stopping the test too early due to noise in the data.
Sequential testing vs. Bayesian statistics
Like sequential testing, Bayesian methods continuously update your conclusions as data rolls in. However, there are some distinctions between the two.
Sequential testing often builds on the frequentist statistics framework and the null hypothesis significance testing (NHST) paradigm. This means it sets clear rules for deciding whether the data shows a definite result or not, helping to determine whether the result is statistically significant.
Bayesian methods don't use these hard cutoffs. Instead, they apply Bayes' theorem to calculate the probability of different effect sizes being true based on observed data. You then update your beliefs as new data comes in and base conclusions on these changing probability distributions rather than on crossing a threshold.
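As a minimal illustration of the Bayesian approach, here is a conjugate Beta-Binomial update for a conversion rate. The prior parameters, observed counts, and 6% baseline are all invented for the example:

```python
from scipy import stats

# Prior belief about the variation's conversion rate: Beta(2, 38),
# loosely encoding a ~5% rate from earlier experiments (an assumption).
prior_a, prior_b = 2, 38

# Observed data so far: 70 conversions out of 1,000 visitors.
conversions, visitors = 70, 1_000

# Conjugate update: the posterior is also a Beta distribution.
posterior = stats.beta(prior_a + conversions,
                       prior_b + visitors - conversions)

# Probability the true rate beats an assumed 6% baseline.
print(f"P(rate > 6%) = {1 - posterior.cdf(0.06):.1%}")
```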
An advantage of Bayesian techniques is that they can include other empirical information or personal beliefs in the analysis. Additional data might come from user surveys, market research, past performance metrics, and technical expertise. Sequential tests are more limited to using only the data from that specific test.
However, sequential tests offer practical benefits that appeal to some analysts as they:
- Are easy to interpret using traditional statistical terms like p-values and confidence intervals
- Offer precise control over false positive and false negative error rates
- Provide definitive stopping rules for making clear test decisions
Bayesian techniques can sometimes get quite complex and may not provide obvious “stop” or “don’t stop” boundaries.
Sequential testing vs. parallel testing
Parallel testing involves running multiple experiment variations at the same time. The analysis happens in batches across the variations until enough data is collected to identify potential winners. Parallel tests are suitable when you need to evaluate several different ideas together.
Sequential tests only run one variation at a time. The experiment ends when that variation crosses a boundary, and you can then test the next variation. This method offers increased sensitivity and quicker learning by isolating one version—the positive “signals” you’re looking for aren’t diluted across multiple trials. It focuses the test’s power to find real effects more efficiently.
However, parallel tests can cover more ground faster. They let you concurrently check all your ideas and find the best ones. You can then use sequential testing to explore the strongest variations in more detail. This combined approach ensures you don’t waste time testing less valuable concepts and overlook more promising ones.
Benefits of sequential testing
Sequential testing methods offer several advantages for teams focused on making the most of their product through data-driven experimentation.
Spot strong ideas earlier
Sequential testing enables you to stop testing when one variation proves better or worse than the baseline. For instance, if your tweaks to the checkout page drive more sales, you can roll out the new version and make it visible to all visitors. Experimenting in this way means you can quickly implement successful changes and redirect resources from less effective ideas.
Use data more efficiently
Unlike traditional tests, which continue gathering data unnecessarily, sequential testing stops data collection once there's enough evidence for a conclusion. This approach helps conserve resources by focusing only on statistically significant results, enabling you to deliver the best possible user experience.
Reduce opportunity costs
Each day an underperforming product stays live, you potentially lose out on conversions, engagement, or revenue. Sequential tests make it possible to stop sending traffic to losing variations, quickly mitigating opportunity costs by stopping failed tests in their tracks.
Real-world applications
Several industries use sequential experimentation techniques, which are ideal for businesses that rely on quick, data-driven decision-making to optimize user experiences, processes, and business outcomes.
Ecommerce websites
Online retailers often use sequential A/B tests to refine shopping experiences such as checkout flows, product browsing, and search functionality. These tests aim to maximize conversion rates and revenue. Major players apply sequential methods to efficiently validate their ideas and improve their ecommerce sites and apps.
Consumer product optimization
Technology companies use sequential testing to optimize their consumer product experiences. This includes making UI/UX refinements, creating more relevant news and content feeds, and evaluating new feature launches.
Marketing campaigns
Digital marketing teams commonly run sequential tests when optimizing and personalizing ad campaigns, landing pages, email flows, and push notifications. These methods help them quickly spot high-performing creative and targeting tactics, improving the effectiveness of their marketing funnels.
Online service experiences
For web services like travel booking and investment platforms, delivering the best user experiences is paramount for conversions, engagement, and brand perception. Operations teams use sequential tests to validate proposed experience enhancements and product iterations, and to quickly get positive changes out the door.
Healthcare applications
Within healthcare IT, professionals use sequential methods to improve clinical workflows in electronic medical record (EMR) systems. They also help enhance patient-facing portals and apps based on evolving needs and feedback from providers and end users.
How to carry out sequential tests
Conducting a sequential test requires more upfront planning and monitoring than a straightforward fixed test. Effectively adopting sequential testing also means following some best practices to help overcome any learning hurdles:
- Start with higher-impact or simpler isolation tests
- Monitor at set intervals that align with your team's resources
- Use reliable monitoring tools
- Establish clear protocols on how to interpret the results
- Integrate with your current deployment processes
- Manage potential stakeholder pushback
- Analyze and document your learnings to create a knowledge base
- Upskill your team
With these tips in mind, you can start creating and running your own sequential test. Here are the key steps to follow.
1. Define your hypothesis and metrics
As with any experimentation, you should clearly define the null and alternative hypotheses tested.
Let's say an ecommerce site wanted to optimize its checkout flow. The null hypothesis could state that there will be no difference in conversion rate between the current experience (control) and a new, streamlined page (variation). The alternative hypothesis would be that the checkout variation will result in a higher checkout conversion rate.
Your hypotheses help you identify the core user experience metrics you will use to evaluate these different product variations. As well as conversion rates, these might be:
- Revenue per visitor
- Clicks or taps per session
- Retention rates
- Session duration and depth
- Page views
- Net promoter scores
- Customer ratings and reviews
You should avoid sequential tests on page load times or other technical indicators because code optimizations solve those better.
Instead, it’s best to highlight 1-3 key metrics that capture the essence of what you’re trying to improve through experimentation on the user experience. These will become your north star guides to determine if the variations are successful when tracked sequentially.
2. Set statistical boundaries
Setting statistical boundaries is one of the most essential parts of sequential test design. The statistical boundaries outline the thresholds for making decisions on efficacy (the variation is better) and futility (there is no difference).
Common methods for setting these include:
- Spending functions, such as the O'Brien-Fleming approach, to set boundaries that control error rates during the test
- Bayesian calculations, using prior data and likelihood ratios to shape boundaries
- Simulations to determine boundaries that ensure the desired power and sample size
Boundaries typically start relatively wide to enable early fluctuations before narrowing over time. Doing so prevents you from prematurely stopping the test because of early data noise.
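The sketch below illustrates this narrowing with O'Brien-Fleming-shaped efficacy boundaries at four evenly spaced looks. It uses a rough approximation (dividing the final critical value by the square root of the information fraction); production designs instead solve for a constant so the looks jointly spend exactly alpha:

```python
from math import sqrt
from scipy.stats import norm

alpha = 0.05
looks = [0.25, 0.50, 0.75, 1.00]  # fraction of the data seen at each check

# O'Brien-Fleming shape: the bar is very high early and relaxes toward
# the standard fixed-test threshold by the final look.
z_final = norm.ppf(1 - alpha / 2)
for t in looks:
    print(f"at {t:.0%} of data: stop for efficacy only if |z| >= "
          f"{z_final / sqrt(t):.2f}")
```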
3. Specify maximum and minimum sample sizes
Although your sample size can adapt as the test continues, you must set minimum and maximum limits.
The minimum sample size is the smallest amount of data needed before you can start monitoring adequately; it stops you from acting on wildly variable early results.
The maximum sample size is the longest the test can run while still inconclusive. It acts as a safeguard, preventing tests from continuing indefinitely. Practical constraints like time and available resources usually inform it.
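One common way to anchor the maximum is a standard fixed-horizon power calculation, as in this sketch with statsmodels. The 5% baseline and 6% target conversion rates are assumptions for illustration:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Smallest lift worth detecting: 5% -> 6% conversion (assumed values).
effect = proportion_effectsize(0.06, 0.05)

# Per-group sample size at 80% power for a fixed-horizon test; sequential
# designs typically inflate this slightly to pay for the interim looks.
n = NormalIndPower().solve_power(effect_size=effect, alpha=0.05,
                                 power=0.80, alternative='two-sided')
print(f"maximum sample size: roughly {int(round(n))} users per variation")
```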
4. Establish monitoring intervals
Sequential tests require you to define how frequently you’ll evaluate the data against the test’s boundaries.
For example, you could evaluate the data after every 100 new samples or even hourly. Regular reviews support faster decision-making but involve more processing. Monitoring the data daily or in weekly batches reduces number crunching but means you'll reach conclusions more slowly.
The most common intervals are daily or at every 10% or 20% of the total sample size. However, more frequent monitoring can be better if your resources allow it so you can get timely insights and make proactive adjustments.
5. Run the test
With your boundaries, sample caps, and monitoring intervals decided, the test can begin.
Start by exposing both the control and the variation to your traffic flows. At each monitoring interval, update the metrics and plot them against your boundaries.
You can deploy the changes once the test crosses its efficacy bounds. However, if the futility bounds are triggered, it’s a sign you should stick to the original product version. If the results are inconclusive, the test continues until you reach your maximum sample size.
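Putting the steps together, a simplified end-to-end run might look like the simulation below. The conversion rates, minimum and maximum sample sizes, check interval, and boundary shape are all illustrative choices rather than a production design:

```python
import random
from math import sqrt

random.seed(7)
MIN_N, MAX_N, CHECK_EVERY = 1_000, 20_000, 1_000
TRUE_RATES = {"control": 0.050, "variation": 0.058}  # simulated traffic
counts = {g: [0, 0] for g in TRUE_RATES}  # [conversions, visitors]

for _ in range(MAX_N):
    for group, rate in TRUE_RATES.items():
        counts[group][0] += random.random() < rate
        counts[group][1] += 1
    n = counts["control"][1]
    if n < MIN_N or n % CHECK_EVERY:
        continue  # respect the minimum sample size and check interval
    (c_a, n_a), (c_b, n_b) = counts["control"], counts["variation"]
    pooled = (c_a + c_b) / (n_a + n_b)
    z = ((c_b / n_b) - (c_a / n_a)) / sqrt(
        pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    bound = 1.96 / sqrt(n / MAX_N)  # O'Brien-Fleming-shaped boundary
    print(f"n={n:>6}: z={z:+.2f}, boundary=±{bound:.2f}")
    if abs(z) >= bound:
        print("efficacy boundary crossed: stop and ship the winner")
        break
else:
    print("maximum sample size reached: result inconclusive")
```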
Analyzing sequential test results
Once a sequential test reaches its stopping point, the next step is correctly interpreting the outcomes and calculating the relevant metrics. This analysis helps guide your decision-making and learning.
Interpret the primary metrics
While your chosen core metrics, such as conversion or engagement rates, will determine the statistical outcome, you should also review the full time series charts and data distributions.
- For efficacy: Did the variation show a stable improvement quickly, or was the final gain smaller than expected? This difference affects when to deploy and scale your product changes.
- For futility: Did the negative results get stronger, or was it just noise? This distinction helps you decide whether to abandon or tweak the concept.
- For inconclusiveness: Were any interesting patterns cut off due to reaching the maximum exposure? These could still provide valuable insights.
Look at the qualitative feedback
Complement quantitative data with qualitative user feedback signals like ratings, survey comments, and support tickets.
Did the rating or reviews align with usage metrics, or were there unexpected contradictions to explore? What common pain points or delights emerged from open-ended feedback on the variations? Did specific user roles or workflows respond differently based on their goals?
Answering these questions gives you richer context around the “why” of the performance.
Explore different segments
Slice the data by key user segments, such as:
- New vs. returning users
- Mobile vs. desktop
- Geographic regions
- Customer tiers or subscriptions
- Traffic source or campaign
Segmentation helps you see if the optimizations had different impacts on different slices. From this, you can focus your deployment efforts or address any challenges.
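A minimal sketch of such a segment breakdown with pandas, using a tiny invented dataset in place of real experiment exports:

```python
import pandas as pd

# Hypothetical per-user results with one segment attribute.
df = pd.DataFrame({
    "variant":   ["control", "variation"] * 4,
    "device":    ["mobile", "mobile", "desktop", "desktop"] * 2,
    "converted": [0, 1, 1, 1, 0, 0, 1, 1],
})

# Conversion rate per variant within each device segment.
rates = df.groupby(["device", "variant"])["converted"].mean().unstack()
rates["lift"] = rates["variation"] - rates["control"]
print(rates)
```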
Use supplemental data
Layer in relevant supplemental data sources to build a more comprehensive picture of your test.
This information might include:
- Conversion funnels or pathways to isolate bottlenecks
- Heatmaps to expose UI or layout frictions
- Session recording or replays for deeper user journey context
- Behavioral cohorts and personas to interpret different motivations
- A/B test results from previous optimizations
The more pieces combined, the richer the overall narrative. This picture can help reveal why some solutions worked while others fell flat.
Document your learnings
Combine all your findings in one experimentation archive, keeping everything together so you can easily share the knowledge.
Highlight the top findings, insights, and recommendations. Note new open questions to pursue and estimate their impact on critical business metrics.
Detailed documentation also encourages learning and iterative testing. This practice helps you avoid repeating mistakes and launch more impactful product innovations.
Sequential testing and CRO
Conversion rate optimization (CRO) is crucial for online businesses to turn visitors into customers and increase revenue. Even minor improvements can lead to significant gains. Sequential testing enhances traditional CRO by enabling faster, more efficient, and continuous improvement.
With quick tests and better traffic use, teams can make ongoing changes as user behavior and strategies evolve. You can also integrate the learnings from each test into the next cycle and keep the process dynamic.
Making product optimizations is crucial for hitting goals—sequential methods help you get the maximum returns from your experiments. This approach is essential for any brand aiming to refine its conversion processes and stay competitive.
Seamless sequential testing with Amplitude
Experimentation leaders should seriously consider sequential testing. The efficiency gains and faster insights make it appealing for any data-driven optimization program aiming for growth.
Amplitude provides a seamless way to take advantage of sequential testing without separate, complex statistical deployments. The methodology is baked right into the end-to-end experimentation platform in a straightforward, transparent way.
Rather than relying on traditional techniques like t-tests, Amplitude uses a mixed sequential probability ratio test (mSPRT) approach. This test continuously evaluates the likelihood ratios of the tracked metrics. Visual boundary charts clearly show your results against the stopping thresholds.
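For intuition, the sketch below computes the textbook mSPRT statistic for a normal mean, mixing over effect sizes with a normal distribution and stopping once the ratio exceeds 1/alpha. This is the generic form from the always-valid inference literature with made-up inputs, not Amplitude's exact implementation:

```python
from math import exp, sqrt

def msprt_ratio(mean_diff, n, sigma2, tau2, theta0=0.0):
    """Mixture likelihood ratio for a normal mean, mixing the alternative
    over effect sizes with a N(theta0, tau2) distribution."""
    scale = sigma2 / (sigma2 + n * tau2)
    exponent = (n ** 2) * tau2 * (mean_diff - theta0) ** 2 / (
        2 * sigma2 * (sigma2 + n * tau2))
    return sqrt(scale) * exp(exponent)

alpha = 0.05
lr = msprt_ratio(mean_diff=0.02, n=5_000, sigma2=0.25, tau2=0.01)
print(f"likelihood ratio = {lr:.1f}; stop for efficacy once >= {1/alpha:.0f}")
```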
Use Amplitude to compare key metrics, including uniques (distinct counted users or accounts), averages (totals divided by uniques), and the sum of tracked properties. It covers many core use cases and the key performance indicators (KPIs) product teams care about when improving experiences.
Make more decisive, quicker changes that benefit your users.