Understanding Confidence Intervals and How to Calculate Them

Discover how confidence intervals improve product decisions. Learn their role in experiments, how to interpret them, and why they're vital for reliable results.

                  What is a confidence interval?

A confidence interval is like a best guess with error bars. When you can’t measure something exactly (like how all your users will behave on your website), you take a sample and make an educated guess about the larger group it represents.

                  In this case, that guess is a range that comes in two parts:

                  1. A range of values—for example, “between 10% and 15%”
2. How sure you are about this range—like “95% confident”

                  The second part is your confidence level.

                  We can think of confidence intervals like a weather forecast. Instead of saying, “It will rain exactly 0.5 inches tomorrow,” a meteorologist might say, “We’re 90% sure it will rain between 0.3 and 0.7 inches.”

                  This approach is more honest and useful than a single number because it shows you’re aware of any uncertainties while providing valuable decision-making information.

                  Confidence intervals in product and web experimentation

In product and web experimentation, confidence intervals help you understand the reliability of your results when you can’t test every user or scenario. You can use them to determine whether your changes work. Do they genuinely improve performance? How certain are you?

                  Let’s say you’ve changed the color of your “Buy Now” button from blue to green. You can use a confidence interval to see whether this tweak increased the number of clicks.

                  Rather than stating, “The green button got 12% more clicks,” you might report you’re “95% confident the green button will increase clicks by 8% to 16%.”

                  This approach is useful for several reasons:

                  • It shows the likely range of improvement, not just a single number, giving you a clearer picture of possible outcomes
                  • It tells you how sure you are about the results and lets stakeholders know how reliable your findings are
                  • It helps you decide if the change is worth keeping or needs more testing

Confidence intervals are particularly valuable in A/B testing, where you compare two versions of something. They tell you which version performed better and the range in which the difference (how much it won by) likely falls.

                  A confidence interval is also great for:

                  • Estimating how much a feature might increase revenue
                  • Predicting how many users might adopt a new tool
                  • Comparing the performance of different user segments

By using confidence intervals, you acknowledge there’s always some uncertainty in your data, leading to more nuanced reporting and better decision-making. It prevents you from overreacting to small changes that might be down to random chance while helping you spot significant improvements.

                  Confidence interval vs. other statistical measures

                  You’ll often see confidence intervals used or compared with other statistical terms and measurements.

                  Understanding the differences between confidence intervals and other popular concepts is important for choosing the right tool for your analysis and interpreting your results more accurately.

                  Confidence level

The confidence interval is the net you cast to catch the actual value. The confidence level is how sure you are that your net will catch it—the trustworthiness of your interval.

                  If you say, “I’m 95% confident sales increased between 10% and 15%,” the 10-15% is your interval, while 95% is your confidence level. The higher the confidence level, the wider your net usually needs to be.

                  Hypothesis testing

                  Hypothesis testing helps decide if there’s enough evidence to support a claim by answering a yes or no question—such as, did your change have an effect?

Confidence intervals provide the fuller story: they tell you both whether there’s an effect and how big it might be. We can use confidence intervals to support hypothesis testing because they give us more information about the results.

                  Prediction intervals

                  If confidence intervals are about understanding the past and current values, prediction intervals are concerned with forecasting the future. The ranges are usually wider because they account for the uncertainty in your estimate and natural variability in future data.

                  You can use prediction intervals when you want to predict where a single future value might land, like next quarter’s sales. In short, we mainly use prediction intervals for scenarios we can’t “test” in the traditional scientific sense.

                  Credible intervals

                  Confidence intervals use frequentist statistics, while credible intervals use Bayesian statistics.

                  The two approaches have many complex differences, but the main one to remember is that Bayesian enables you to incorporate prior knowledge or beliefs.

                  If you’ve run similar experiments before, you can use the information you gathered to refine your estimates. Credible intervals let you draw on past experiences to make a more educated guess.

                  P-values

P-values and confidence intervals are two sides of the same coin. A p-value tells you the probability of getting results at least as extreme as yours if there’s no real effect (the null hypothesis). In other words, it tells you whether your result could be due to chance.

                  Confidence intervals show the likely range of the real value, letting you know which effects are plausible.

                  Many researchers prefer confidence intervals because they’re more intuitive and provide more actionable information. P-values can also be trickier to interpret.

                  Standard deviation

If your data points were stars, the standard deviation would tell you how spread out they are in the sky. It’s a measure of variability in your actual data.

                  Confidence intervals use this information but take it further by estimating how precisely you know the average position of all those stars.

                  Standard error

Standard error goes a step beyond standard deviation. While standard deviation measures the spread in your data, standard error measures how much your sample statistic (like an average) might vary if you repeated your sampling.

                  This calculation is crucial for building confidence intervals—the smaller the standard error, the narrower the confidence interval.
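As a quick sketch of that relationship (assuming a simple mean-based metric), the standard error shrinks with the square root of the sample size—so quadrupling your sample halves it:

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: how much the sample average
    would vary across repeated samples."""
    return sd / math.sqrt(n)

# Same data spread, but quadrupling the sample halves the standard error
print(standard_error(2.0, 100))  # 0.2
print(standard_error(2.0, 400))  # 0.1
```

Because the confidence interval's margin of error is built from the standard error, a bigger sample directly buys you a narrower interval.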

                  Confidence interval benefits

                  We’ve already highlighted the benefit of using confidence intervals—helping you make smarter, more informed decisions.

                  But what other specific advantages do they offer, and how do these support that ultimate goal?

                  Builds trust with stakeholders

                  By presenting a range, you acknowledge the uncertainty in your data, which is always present. Instead of concluding, “Our new design generated 500 more conversions,” you might say, “We’re 95% confident the increase is between 8% and 12%.”

                  This honesty may not apply to every scenario, but (in general) it helps to build credibility and trust with your stakeholders. A range is more transparent and realistic than a single number.

                  Gives a clear risk assessment

The interval shows you the best and worst-case scenarios. For instance, if a new checkout process might change conversion rates from -1% to +5%, you can better assess if the potential gain is worth the effort and risk. You know what to expect, which makes it easier to align team expectations and plan resources.

                  Enables easier comparisons

When comparing different product options, checking whether the confidence intervals overlap quickly shows whether the difference is likely to be meaningful.

Say version A has a conversion rate of 5%-7%, and version B is at 6%-8%. The overlap suggests the difference might not be meaningful, saving you from overinterpreting small variations.

                  Provides insight into sample sizes

                  Wide intervals often indicate you need more data. If your confidence interval for a new feature’s impact is -10% to +30%, it’s a clear sign you need a larger sample size to make a well-informed decision. This information can guide your testing strategy and how you use your resources.

                  Improves team communication

                  Ranges can be easier to understand and discuss than exact figures or probabilities. Telling your team, “We’re 95% sure that between 20% and 25% of users will adopt this feature in the first month,” gives everyone a good idea of what to expect, making it easier to plan and set realistic goals.

                  Avoids over-reaction

                  When it’s easy to get excited about any positive change, confidence intervals keep you grounded. A 2% increase might look good, but if your interval includes zero or negative values (say -1% to +5%), it reminds you that the actual effect could be negligible or even slightly negative.

                  Incorporating confidence intervals also demonstrates you’re approaching problems scientifically—you understand the complexities of data and aren’t jumping to conclusions based on limited information. This approach can boost your credibility and lead to more defensible findings.

                  Supports iterative testing

                  Confidence intervals help you decide when to keep testing and when you have enough information to act.

                  If your intervals are too broad to be helpful, you can take that as a sign to keep iterating—tweak your feature, refine your experiment, and go again. When the results narrow to a range that supports decision-making, you know you can move forward.

Works for almost any metric

                  Confidence intervals work for all sorts of metrics, whether looking at click-through rates, customer lifetime value, or time spent on a page. This consistent approach makes comparing and prioritizing different aspects of your product or website easier.

                  Helps with forecasting

                  They might not be crystal balls, but confidence intervals can give you a range of likely future outcomes to plan around. For example, if you’re forecasting next quarter's sales, an interval about how a new feature or update will likely perform helps you set appropriate targets and prepare for various scenarios.

                  When to use confidence intervals

                  Confidence intervals aren’t just for statisticians. They’re practical tools that can help anyone make better, data-informed choices. Knowing the range of likely outcomes gives you a more complete picture of what your experiment data is telling you.

                  You can use confidence intervals to:

• Support the results of your A/B tests, helping you decide if the difference is big enough to matter and worth rolling out.
                  • Figure out how many users you need to test—smaller samples give wider intervals, which might not be precise enough for your needs.
                  • Easily and honestly share experimentation findings with your team and stakeholders.
                  • Weigh up the potential outcomes of a major business decision. If the entire interval shows a positive effect, you can be more confident going forward.
                  • Analyze how different user groups behave—are the differences meaningful or random?
• Track if changes in your key metrics represent actual shifts or just normal fluctuations.
                  • See how close you are to the true value when you need more precision, like a conversion rate for a high-stakes feature.

Generally, it’s good practice to use confidence intervals when making decisions based on data. These scenarios might include examining the adoption rate of a new feature, predicting churn, and quantifying the impact of page load time. Additionally, you could use intervals to analyze campaign performance and estimate potential revenue impact.

In situations like these, a confidence interval helps you reach nuanced conclusions—like a statistical safety net for your decisions.

                  How to calculate confidence intervals

                  Most tools can calculate confidence intervals for you. The important part is understanding what the interval means and how to use it.

                  Nevertheless, it’s still a good idea to be aware of the math and process used to get there:

                  1. Start with your sample: First, you need data. Let’s say you’re measuring click-through rates on a new button design.
                  2. Calculate the mean: Add up all your values (number of clicks) and divide by the number of samples (page impressions)—this gives you your point estimate.
3. Find the standard deviation: This value measures your data spread. Most spreadsheets or statistics tools can calculate it for you.
                  4. Determine your confidence level: Most teams use 95%, but you might choose 90% or 99%, depending on your needs and what you’re testing.
5. Look up the z-score: This number is based on your confidence level and measures how many standard deviations a value sits from the mean. For 95%, the z-score is 1.96. For 99%, it’s 2.58. You can use spreadsheet tools or a z-score table to find it.
6. Calculate the margin of error: This value shows how far your experiment’s results are likely to stray from the true value. You use the formula (z-score * standard deviation) / √(sample size) to calculate it.
                  7. Create your interval: Take your mean and add or subtract the margin of error. The resulting confidence interval signifies the range where you expect the actual click-through rate from your new button design to fall.

                  Here’s what that process might look like in action:

                  • Mean click-through rate: 5%
                  • Standard deviation: 2%
                  • Sample size: 100
                  • Confidence level: 95% (z-score = 1.96)
                  • Margin of error: (1.96 * 2%) / √100 = 0.392%

                  So, your 95% confidence interval for the new button would be 5% ± 0.392%. Or 4.608% to 5.392%.
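The steps above can be sketched in a few lines of Python (treating the click-through rate as a simple mean with a known standard deviation, as in the worked example—for proportions, other formulas are often used):

```python
import math

def confidence_interval(mean: float, sd: float, n: int, z: float = 1.96):
    """Return (low, high) for a z-based confidence interval."""
    margin = (z * sd) / math.sqrt(n)  # margin of error
    return mean - margin, mean + margin

# Worked example: 5% mean CTR, 2% standard deviation, 100 samples, 95% level
low, high = confidence_interval(0.05, 0.02, 100)
print(f"95% CI: {low:.3%} to {high:.3%}")  # 95% CI: 4.608% to 5.392%
```

Swapping in a z-score of 2.58 would give you the wider 99% interval for the same data.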

If your sample size is small (usually less than 30), use a t-distribution instead of a z-score. The t-distribution works similarly but has heavier tails to account for the extra uncertainty of small samples.

                  However, you’ll likely have enough data for most web and product experiments to stick with the more straightforward z-score method.

                  Best practices

                  Confidence intervals aid your decision-making—they shouldn’t make decisions for you. Use them to inform your choices, but always consider other factors (like business knowledge and common sense) too.

                  Considering these practices ensures a more well-rounded view of your experiment results and that you use confidence intervals to their full advantage:

                  • Stick to the same confidence level (such as 95%) across your project for easy comparison.
                  • Resist the urge to only report results with narrow intervals. Be open about all findings.
                  • A result isn’t always practically important. Use business context to interpret intervals.
                  • Visualize when possible—graphs with error bars can make confidence intervals easier to understand.
                  • In ongoing tests, recalculate intervals as you gather more data. Watch how they narrow over time.
                  • Explain the interval in plain language when presenting your results—not everyone understands statistics.

                  Which confidence level should you choose?

A 95% confidence level is the most common and a good default for most situations. It strikes a balance between being reasonably sure (95% of the time, your interval will catch the actual value) and not being too broad.

                  For more critical decisions or when you need extra certainty, consider 99% levels. If you’re okay with more uncertainty in exchange for a narrower range, 90% might be your go-to.

                  The wider your interval, the more confident you are—but the less precise your estimate becomes. Choose a confidence level based on your specific needs, the risks involved, and how much uncertainty you (and your stakeholders) are comfortable with.
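If you’re curious where the multipliers come from, you can compute the z-score for any two-sided confidence level from the normal distribution—a sketch using only Python’s standard library:

```python
from statistics import NormalDist

def z_score(confidence: float) -> float:
    """Two-sided z-score for a given confidence level (e.g. 0.95 -> 1.96)."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_score(level):.2f}")
# 90% confidence -> z = 1.64
# 95% confidence -> z = 1.96
# 99% confidence -> z = 2.58
```

Note how the multiplier—and therefore the interval width—grows as you demand more confidence.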

                  Interpreting confidence intervals

                  In the interpretation stage, product teams can flex their knowledge and explore what the data tells them.

                  You need to be able to look at the results logically and decide how to use this information alongside what you already know about your product and users.


                  The basics

                  The first thing to wrap your head around is that a 95% confidence interval doesn’t mean there’s a 95% chance the true value is in that range. It means that if you repeated your experiment many times, about 95% of the intervals would contain the real value.
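You can see this repeated-experiment idea in a quick simulation (a sketch with made-up numbers: we sample from a population whose true mean we know, then count how often the 95% interval catches it):

```python
import math
import random
from statistics import mean, stdev

random.seed(42)  # reproducible run
TRUE_MEAN, SD, N, Z, TRIALS = 5.0, 2.0, 100, 1.96, 1000

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SD) for _ in range(N)]
    margin = Z * stdev(sample) / math.sqrt(N)
    if mean(sample) - margin <= TRUE_MEAN <= mean(sample) + margin:
        hits += 1

# Coverage lands close to the nominal 95%
print(f"{hits / TRIALS:.1%} of intervals contained the true mean")
```

Each individual interval either contains the true value or it doesn’t—the 95% describes the long-run hit rate of the procedure, not any single interval.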

                  Width

A narrow interval suggests a more precise estimate. If your interval for conversion rate improvement is 2%-3%, you can be more certain about the effect than if it’s 0%-5%. Wider intervals usually mean you need more data.

                  Overlapping intervals

                  Most teams won’t declare a winner if the intervals overlap. For example, if Version A is 5%-7% and Version B is 6%-8%, the overlap suggests the difference might not be significant.

                  Though this is often the case, it isn’t a hard and fast rule. You should read it instead as a ‘red flag’ that the difference might not be meaningful. Reading the results through the lens of your business knowledge is crucial here.
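As a rough heuristic—not a formal significance test—the overlap check is simple to automate (interval values here are illustrative):

```python
def intervals_overlap(a: tuple, b: tuple) -> bool:
    """True if two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

version_a = (0.05, 0.07)  # 5%-7%
version_b = (0.06, 0.08)  # 6%-8%
print(intervals_overlap(version_a, version_b))  # True: difference may not be meaningful
```

A `True` result is the ‘red flag’ to investigate further, not a verdict that the variants are equivalent.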

                  Including zero

                  If your interval includes zero (like -1% to +3%), it suggests the effect may not be real or could even be negative. Be cautious about claiming an impact in these cases.

                  Extremes

                  Look at both ends of your interval. If even the lower end represents good news and a worthwhile improvement, you can be more confident in implementing a change.

                  Trendspotting

                  In ongoing experiments, look at how your intervals change over time. Are they narrowing as you collect more data? Are they shifting in a particular direction? This analysis can give insights into the stability and direction of your results.

                  Practicality

                  Just because an interval doesn’t include zero doesn’t mean the effect is practically important. A 0.1%-0.2% improvement may be statistically significant but not worth acting on. The effect could be too small to matter for your business goals.

                  Decision thresholds

                  Preset levels of improvement, called decision thresholds, guide your choices based on confidence intervals. For example, you may implement changes only when the entire confidence interval shows at least a 2% improvement in your key metric.

                  Thresholds help you focus on the changes that are more meaningful to your product and business and avoid acting on small or uncertain effects. You can customize where your results need to be before taking action.
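A decision-threshold check like the one described can be sketched in a few lines (the 2% threshold here is just an illustrative value):

```python
def clears_threshold(ci_low: float, ci_high: float, threshold: float) -> bool:
    """Act only if the entire interval sits at or above the improvement threshold."""
    return ci_low >= threshold  # if the lower bound clears it, the whole interval does

# Illustrative: require the whole interval to show at least a 2% lift
print(clears_threshold(0.03, 0.06, threshold=0.02))  # True: even the worst case clears 2%
print(clears_threshold(0.01, 0.06, threshold=0.02))  # False: the lower end falls short
```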

                  Context

                  Finally, always interpret intervals within your business context. A 1%-2% improvement in click-through rate might be huge for a major site but negligible for a small blog.

                  Combining statistical insights with what you know about your product, customers, and industry (alongside a dose of common sense) will help you take the most effective actions from your results.

                  Visualize your confidence intervals with Amplitude

                  Confidence intervals provide a valuable tool for estimating the true impact of your changes, assessing risks, and communicating results clearly to stakeholders.

By using intervals, you’re taking a more nuanced, scientific approach to product improvements. You recognize the uncertainty while still making informed decisions.

But knowing how to calculate and interpret confidence intervals is just the beginning. To properly incorporate data into your product development process, you need an experimentation tool that can handle complex experiments and quickly deliver actionable insights.

Amplitude’s experimentation and analytics platform makes setting up tests, analyzing results, and visualizing your confidence intervals easy. With it, you can:

                  • Easily run A/B tests and multivariate experiments
                  • Automatically calculate confidence intervals for your key metrics
                  • See the results in clear, intuitive graphs with error bars or shaded regions around the point estimates
                  • Segment your data to uncover deeper insights
• Make data-driven decisions faster

From a startup looking to refine its minimum viable product (MVP) to an enterprise aiming to improve user experience across multiple products, Amplitude provides everything needed to experiment with confidence.

Make changes backed by data and powered by confidence intervals.

                  © 2025 Amplitude, Inc. All rights reserved. Amplitude is a registered trademark of Amplitude, Inc.