Estimate the duration of your experiments
The Duration Estimator helps you determine which experiment ideas are viable before you start building. Use the Duration Estimator to avoid running tests that may never reach statistical significance, and to prioritize experiments that can deliver results in a reasonable timeframe.
The Duration Estimator supports T-tests. Sequential testing, Bayesian methods, and multi-armed bandit methods aren't available in this workflow.
Open the duration estimator
In your experiment setup, select Estimate Duration to open the Duration Estimator.
When you first open the Duration Estimator, you see an empty state. After you add your traffic event and success metric, the Duration Estimator calculates how long your test needs to run.
Set up your estimate
Step 1: Add your traffic event
Select + Add Event, and choose the event that represents traffic where you run your experiment.
For example, if you test your homepage, select Page Viewed, and add a filter for your homepage URL.
The Duration Estimator pulls the last 29-30 days of traffic data from Analytics, and shows Users per day in the results panel.
If you don't have the right event, select Enter Manually to input your own total daily traffic estimate. Enter the total traffic across all variants, not the traffic for each variant.
Step 2: Add your success metric
Select + Add Metric, and choose the conversion metric you want to improve with this experiment.
A success metric is the visitor action you're trying to change with your experiment. Think about what you want more visitors to do because of your changes.
Common success metrics:
- Conversions: Visitor completes a key action, such as signed up, purchased, enrolled, or subscribed.
- Form completions: Visitor submits a form or completes a flow.
How to choose:
- Ask: What action do you want more visitors to take?
- Look for metrics that match that action.
- If you see similar metrics, choose a conversion metric (which Amplitude marks as Conversion of...) or a metric with an official blue badge.
- Choose a metric your team uses often.
For example, if you test your homepage hero banner and want more visitors to enroll in a course, select Conversion of registration: course enrolled.
The Duration Estimator calculates your current conversion rate from the last 29-30 days of Analytics data, and shows the rate in the results panel (for example, 78.8% -> 82.8%).
If you don't see the metric you need:
- Search with the search bar at the top of the dropdown.
- Select Create Metric at the bottom to build a new one.
- Select Enter Manually to input your own baseline conversion rate.
Step 3: Set your minimum detectable effect (MDE)
The relative MDE is the smallest improvement you want to detect. The default is 5%, which means you test whether you can improve your baseline by 5%.
For example, if your baseline conversion is 78.8% and you set a 5% MDE, you test whether you can reach 82.8%.
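Because the MDE is relative, the target rate is the baseline multiplied by one plus the MDE. The following Python sketch shows that arithmetic. The function names and the 20% baseline are hypothetical values for illustration, not part of the Duration Estimator.

```python
def target_rate(baseline: float, relative_mde: float) -> float:
    """Return the conversion rate the test aims to detect.

    Both arguments are fractions (0.20 means 20%).
    """
    return baseline * (1 + relative_mde)


def absolute_lift(baseline: float, relative_mde: float) -> float:
    """Return the gap between target and baseline, as a fraction."""
    return baseline * relative_mde


# Hypothetical 20% baseline with the default 5% relative MDE.
print(f"{target_rate(0.20, 0.05):.1%}")    # 21.0%
print(f"{absolute_lift(0.20, 0.05):.1%}")  # 1.0%
```

A 5% relative MDE on a 20% baseline is a one-percentage-point change, which is why low baseline rates tend to need more traffic to detect the same relative lift.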
How to think about MDE:
- Big, bold changes (like redesigning a hero banner above the fold): expect a large lift, such as 8%.
- Small, subtle changes (like changing button text below the fold): expect a small lift, such as 2%.
- Smaller MDEs require much longer test durations.
If you don't have historical data, select Enter Manually to input your own baseline conversion rate.
Understand your results
After you add your traffic and success metric, the Estimated Duration panel shows:
- Duration: How many days you need to run the test (for example, ~130 days).
- Users per day: Daily traffic the Duration Estimator pulls from Analytics.
- Lift: Your baseline conversion rate to target conversion rate with your MDE percentage.
- Summary: Plain-language explanation you can share with stakeholders.
If your duration is very long, you see a Long Duration warning badge. Use the Duration Scenarios table to explore different scenarios.
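To build intuition for how traffic, baseline rate, MDE, confidence, and power combine into a duration, here is a minimal Python sketch. It uses a textbook two-sided, two-proportion sample-size approximation as a stand-in, because this article doesn't state the exact calculation the Duration Estimator runs. The function names and the inputs (1,000 users per day, 10% baseline) are assumptions for illustration, so expect the output to differ from what the panel shows.

```python
import math
from statistics import NormalDist


def users_per_variant(baseline, relative_mde, confidence=0.90, power=0.80):
    """Approximate users needed in each variant.

    Textbook two-sided, two-proportion normal approximation.
    This is an illustrative formula, not the product's own method.
    """
    target = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = baseline * (1 - baseline) + target * (1 - target)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (target - baseline) ** 2)


def estimated_days(users_per_day, baseline, relative_mde,
                   confidence=0.90, power=0.80, variants=2, rollout=1.0):
    """Days until every variant has enough users, assuming an even split."""
    needed = users_per_variant(baseline, relative_mde, confidence, power) * variants
    return math.ceil(needed / (users_per_day * rollout))


# Hypothetical inputs: 1,000 users per day, 10% baseline, 5% relative MDE.
print(estimated_days(users_per_day=1000, baseline=0.10, relative_mde=0.05))
```

Raising the MDE or lowering the confidence level shortens the result, and lowering rollout or adding variants lengthens it, which matches the trade-offs the rest of this article describes.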
Use the Duration Scenarios table to prioritize
The Duration Scenarios table is the most important part of the Duration Estimator. The table shows how your choices affect test duration, so you can make better decisions about what to test and when.
How to read the table
Rows (confidence level):
- Low (85%): Less certainty your results are real, but faster results.
- Medium (90%): Balanced approach (default setting).
- High (95%): More certainty your results are real, but takes longer.
- Custom %: Enter your own confidence level.
Columns (lift size/MDE):
- 2%: Small, subtle changes (takes longest to detect).
- 5%: Medium-sized changes (default setting).
- 8%: Large, bold changes (fastest to detect).
- Custom %: Enter your own MDE.
The table highlights your selected combination and shows durations for all other scenarios.
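One way to reason about the grid is through a common rule of thumb: required sample size grows with the square of the combined z-scores for confidence and power, and shrinks with the square of the MDE. The Python sketch below prints each cell as a multiple of the default cell (90% confidence, 5% MDE). The two-sided z-scores and the function name are assumptions, and the rule ignores how the variance shifts with the target rate, so it approximates the shape of the table rather than reproducing its numbers.

```python
from statistics import NormalDist


def relative_duration(confidence, mde, base_confidence=0.90, base_mde=0.05, power=0.80):
    """Duration relative to the default cell (90% confidence, 5% MDE).

    Rule of thumb: sample size scales with (z_alpha + z_beta)^2 / MDE^2.
    """
    z = NormalDist().inv_cdf

    def z_sum_sq(conf):
        return (z(1 - (1 - conf) / 2) + z(power)) ** 2

    return (z_sum_sq(confidence) / z_sum_sq(base_confidence)) * (base_mde / mde) ** 2


# Rows are confidence levels, columns are 2%, 5%, and 8% MDE.
for conf in (0.85, 0.90, 0.95):
    row = [f"{relative_duration(conf, mde):5.2f}x" for mde in (0.02, 0.05, 0.08)]
    print(f"{conf:.0%}", *row)
```

Under this rule, moving from a 5% MDE to an 8% MDE multiplies duration by about 0.39, because (5/8) squared is roughly 0.39.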
How to think about confidence level
Your confidence level is the risk you're willing to take with your results. Choose based on what's at stake.
95% confidence: Use when the cost of being wrong is high.
- Revenue-critical tests (checkout flows, pricing, and subscriptions).
- High-impact placements (homepage hero, above-the-fold content, and navigation).
- Sensitive or costly bets (brand-new features, compliance-heavy areas, and high-cost builds).
90% confidence: Use when you want balance between speed and reliability (default).
- Medium-stakes decisions where time matters, but the cost of being wrong is manageable.
- Engagement-focused outcomes (click-throughs, mid-funnel steps).
- Iterative improvements in areas with prior evidence.
85% confidence: Use when you need a directional signal.
- Early validation (MVPs, prototypes you plan to follow up on).
- Low-stakes tests (low-traffic pages, below-the-fold changes).
- Well-understood areas where a topline read is enough.
How to think about MDE (lift size)
MDE reflects the expected impact of your experiment idea. Ask: How much lift do I realistically expect this change to drive?
Large MDE (8%+): Use for bold changes with dramatic impact.
- Prominent new CTAs at the top of the homepage.
- Major redesigns of key flows.
- Revenue-driving promotions.
- Because the effect is big, the test resolves quickly.
Medium MDE (3-5%): Use for meaningful but not dramatic improvements.
- UX enhancements.
- Layout adjustments.
- Copy changes.
- This is the most balanced choice for everyday experimentation.
Small MDE (1-2%): Use for subtle tweaks, or when tiny gains are valuable.
- Microcopy changes.
- Slight color adjustments.
- Incremental funnel optimizations.
- These require the most time and traffic, but can add up in mature, high-volume products.
Use the Duration Scenarios table for prioritization
Scenario 1: Your test takes too long
If your estimate shows ~130 days at 5% MDE and 90% confidence, review the table:
- At 8% MDE (larger change), duration drops to ~51 days.
- At 85% confidence (lower certainty), duration drops to ~102 days.
Decision framework:
- Can you test a bigger, bolder idea to get results faster?
- Are the stakes low enough to justify 85% confidence for a quicker read?
- Or is this a high-stakes test where 90-95% confidence is worth the wait?
Scenario 2: Compare multiple test ideas
You have three test ideas in your backlog:
- Homepage hero redesign (expected 8% lift): ~51 days at 90% confidence.
- CTA button text change (expected 5% lift): ~130 days at 90% confidence.
- Footer link color change (expected 2% lift): ~632 days at 90% confidence.
Decision: The hero redesign is viable and can deliver results quickly. The CTA change may be worth running if you lower to 85% confidence (~102 days). The footer change takes over a year, so it isn't worth testing now.
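If you track a backlog outside the product, the same comparison can be sketched in a few lines of Python. The durations below come from the example above. The one-year cutoff and the `triage` helper are hypothetical choices for illustration, and your own threshold depends on your timeline.

```python
# Estimated durations from the example backlog (days at 90% confidence).
backlog = {
    "Homepage hero redesign": 51,
    "CTA button text change": 130,
    "Footer link color change": 632,
}

MAX_DAYS = 365  # hypothetical cutoff: defer anything that takes over a year


def triage(estimates, max_days=MAX_DAYS):
    """Return (idea, days, verdict) tuples, shortest test first."""
    ranked = sorted(estimates.items(), key=lambda item: item[1])
    return [(idea, days, "candidate" if days <= max_days else "defer")
            for idea, days in ranked]


for idea, days, verdict in triage(backlog):
    print(f"{idea}: ~{days} days -> {verdict}")
```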
Scenario 3: Balance your testing portfolio
Use the Duration Scenarios table to create a balanced mix:
- High-impact tests (8% MDE, 90-95% confidence): One to two major tests for each quarter that resolve in two to four weeks.
- Everyday optimizations (3-5% MDE, 90% confidence): Regular tests that deliver steady improvements in three to six weeks.
- Quick validation (5-8% MDE, 85% confidence): Fast directional reads on new ideas before heavier investment.
Scenario 4: Low-traffic pages
If you test a low-traffic page and durations are very long across all scenarios, you may need to:
- Test on a higher-traffic page.
- Wait until you accumulate more traffic.
- Test something with a larger expected impact.
The Duration Scenarios table makes these trade-offs visible, so you can prioritize experiments that fit your traffic and timeline constraints.
Adjust advanced settings (optional)
Select Advanced Settings to access additional controls:
- Confidence level: Low (85%), medium (90%), or high (95%).
- Statistical power: Probability of detecting a true effect (default 80%).
- Rollout: Percentage of visitors you expose to the experiment (default 100%).
- Number of variants: Total variants including control.
- Distribution: How traffic splits between variants (default: even split).
- Statistical method: T-test.
Most teams don't need to adjust these settings. The defaults work well for standard A/B tests.
Tips for reducing test duration
If your estimated duration is longer than your timeline allows, use these options.
Test a bigger idea (increase MDE)
The biggest factor in test duration is the size of the change you want to detect. Larger changes produce larger lifts and resolve faster.
For example, moving from 5% MDE to 8% MDE can reduce duration from ~130 days to ~51 days.
Ask:
- Can you test a more impactful variation instead of a subtle tweak?
- Instead of changing button color, can you redesign the entire CTA section?
- Instead of tweaking microcopy, can you rewrite the entire headline?
Large-impact ideas resolve faster. Small-impact ideas take longer, but can add up in mature, high-volume products.
Lower your confidence level (when stakes allow)
Dropping from 90% to 85% confidence reduces duration, but increases false-positive risk (calling a winner when there isn't one).
For example, at 85% confidence, the same 5% MDE test takes ~102 days instead of ~130 days.
Ask:
- What's the cost of being wrong?
- Is this a low-stakes test (below-the-fold change, well-understood area)?
- Can you validate results with a follow-up test if needed?
Don't lower confidence for:
- Revenue-critical tests.
- High-impact placements.
- Brand-new features, or unknown customer segments.
Choose a higher-traffic page or event
Low traffic is a common reason tests take too long.
Ask: Can you run this test on a higher-traffic page, or choose a more frequent conversion event?
Increase rollout percentage
If you only expose 50% of visitors to the experiment, increasing to 100% can reduce duration by about half.
Reduce number of variants
Testing four variations takes much longer than testing two. Consider multiple sequential tests instead of one large multi-variant test.
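The effect of rollout and variant count can be approximated with simple scaling. The Python sketch below assumes an even traffic split and a fixed sample size for each variant, so duration is proportional to the number of variants divided by the rollout fraction. The `rescale_days` helper and the 130-day starting estimate are hypothetical. Any multiple-comparison adjustment, if one applies, would lengthen a multi-variant test further and isn't modeled here.

```python
def rescale_days(days, old_rollout, new_rollout, old_variants=2, new_variants=2):
    """Rescale a duration estimate when rollout or variant count changes.

    Assumes an even split and a fixed per-variant sample size,
    so duration is proportional to variants / rollout.
    """
    return days * (old_rollout / new_rollout) * (new_variants / old_variants)


# Hypothetical 130-day estimate at 50% rollout, raised to 100% rollout.
print(rescale_days(130, old_rollout=0.5, new_rollout=1.0))  # 65.0

# Same 130-day estimate at full rollout, going from two variants to four.
print(rescale_days(130, 1.0, 1.0, old_variants=2, new_variants=4))  # 260.0
```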
Decide if the test is worth running
Sometimes a test isn't feasible. If the Duration Scenarios table shows hundreds of days across all scenarios, the test probably isn't worth building.
Common mistakes to avoid
- Defaulting to 95% confidence every time. This makes sense for high-stakes tests, but can slow low-stakes experiments.
- Chasing only small lifts. Detecting a 1-2% MDE can require high traffic volumes and long run times.
- Skipping a duration check. Even with the right settings, some experiments can't reach significance with available traffic.
The Duration Estimator helps you decide whether a test is worth running before you spend time and resources on a test that may never reach significance.
Common questions
Why does my estimate say Long Duration?
Your test takes a long time to reach statistical significance, often because of low traffic or small MDE. Use the Duration Scenarios table to explore faster alternatives.
What if I don't have 30 days of historical data?
Update the timeframe, or select Enter Manually to input your own traffic and conversion estimates. Results are most accurate with at least a few weeks of stable data.
Can I change the MDE after I see the estimate?
Yes. Adjust the MDE percentage in the success metric section, and the estimate updates automatically. Use the MDE control to explore different scenarios before committing to your test design.
What does Last 29 days offset by 1 mean?
This label shows the data timeframe the Duration Estimator uses for calculations. Offset by 1 means the calculation excludes today because today's data is incomplete, and looks at the previous 29 complete days.
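As a concrete illustration of that window, the Python sketch below computes the first and last day for a given date. The `lookback_window` helper is hypothetical, and the exact boundary handling (inclusive dates, calendar days) is an assumption rather than documented product behavior.

```python
from datetime import date, timedelta


def lookback_window(today, days=29, offset=1):
    """Return (start, end), inclusive, for a window of `days`
    complete days that ends `offset` days before `today`."""
    end = today - timedelta(days=offset)
    start = end - timedelta(days=days - 1)
    return start, end


start, end = lookback_window(date(2024, 3, 31))
print(start, end)  # 2024-03-02 2024-03-30
```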
Should I always aim for high (95%) confidence?
No. Many experiments run at medium (90%) confidence, which balances speed and accuracy. Use high confidence when stakes are high, or when you need maximum certainty before a decision.
How do I choose a success metric?
Start by asking: What action do you want more visitors to take? Then choose a metric that matches that action. Conversion metrics (which Amplitude marks as Conversion of...) are often the best choice.
If you're still unsure, search for metrics related to your goal, or select Enter Manually to input your own baseline.
Where can I learn about live experiment duration estimates?
Review Experiment duration estimates to understand the duration estimate that Experiment shows while an experiment runs.