Sample Size Calculator
What is sample size?
Sample size refers to the number of observations or participants in a product or web experiment. Most surveys rely on a small subset of a given population—a “sample”—to draw conclusions about the whole. So, a researcher's “sample size” is the number of people in the survey group.
Sample size plays a crucial role in the reliability and accuracy of tests, ensuring that results effectively represent the population you're studying.
Why is sample size important?
Sample size shapes your experimental process and affects the validity of your results.
On the data side, your sample size determines how precise and reliable your results are. A survey run on a too-small sample can be more harmful than not running a survey at all because it can return incorrect data that might mislead you as you consider your next steps.
From the project management side, sample size matters because the bigger the necessary sample size, the longer the test will take to complete. Some tests may need a big enough sample size and have a small enough effect on your bottom line that you decide they’re not worth running.
What is the most effective sample size?
The ideal sample size depends on the type of test you’re running. A/B tests typically investigate whether a UI/UX change has a small but measurable impact on user behavior. Working with a large sample size helps you detect small changes and trust you’re not seeing false positives.
If you’re looking for qualitative data, such as from a customer satisfaction survey, a smaller sample size might suit you better because you’re looking at individual answers for details and nuance.
Of course, budgetary and time constraints also play into your sample size determinations. The most exact survey would touch everyone in your target population, but that would take far too long. For most teams working in sprints or following a roadmap, time constraints are likely to play a big role in determining what the “best” sample size is. Your team may determine it’s better to have a slightly lower confidence level for your test than to delay a big launch for three weeks while you gather more data.
Common sample size mistakes to avoid
Effective A/B tests start with the right sample size. Follow these guidelines when choosing yours.
Using a sample that’s too big
The bigger your sample size, the more precise your results. However, a bigger sample size means a longer test period.
Consider the difference between examining 100,000 and 150,000 interactions with a specific feature you’re testing. With the larger sample, you’d have more confidence that your experiment results accurately reflect your audience.
But if you only expect around 10,000 interactions with that feature each day, that extra certainty would mean making your test 50% longer (15 days as opposed to 10). When you have a product to ship, that tradeoff may not be worth making.
Using a sample that’s too small
Limiting your sample size may seem more efficient in terms of timeline and budget, but a test group that is too small won’t return reliable results. You might make a change that works against your goals and key performance indicators (KPIs) or even change your roadmap based on incorrect or incomplete data.
Forgetting about timing
Because your sample size determines how long your A/B test will run, it’s smart to look at a calendar when choosing which sample size to go with.
Say you’re a PM for an HR tool hoping to update its reimbursements feature. You know some of your users log in daily, some visit weekly, and some need your tool only in the days directly before and after payroll processing. A strong A/B test would include all of these subpopulations.
That means running your test for at least a week and scheduling its duration to include the end of a pay period (typically mid-month or end of month). A sample that’s too small—and therefore a test that’s too short—might prevent your test from reaching every segment of your user base.
Handpicking your sample
When you choose your sample manually rather than relying on randomization tools, you will likely get skewed results. Different segments of your user base likely interact with your product in different ways. Over- or under-selecting from each segment won’t give you an honest look at how the changes you’re making will go over with the entire population.
Amplitude makes it easy to randomize your A/B tests, enabling you to reach the right sample size without introducing bias.
How to calculate sample size
Calculating sample size involves considering several factors, including your confidence level, minimum detectable effect, and baseline conversion rate. Here are a few factors to consider when determining the ideal sample size.
Confidence level
The likelihood that your survey results correspond to sentiments among your population as a whole. This value is expressed as a percentage.
When conducting user testing, it’s advisable to aim for a confidence level of at least 90%. This would indicate that for every 10 times you ran this survey, nine would return a result that represents your entire population.
Confidence level is closely tied to the significance level (α), the threshold your test’s p-value must fall below. The significance level is expressed as a decimal representing the acceptable chance of a false positive, and it equals one minus your confidence level (a confidence level of 90% = a significance level of 0.1).
Confidence interval
The possible deviation between your survey results and the sentiments of the entire population. This value is expressed as a percentage—for example, your survey may tell you that 33% +/-3% of your users liked the new feature you added.
That means that had you surveyed the entire population rather than your sample, you could expect that between 30% (33% - 3%) and 36% (33% + 3%) liked your new feature. Your confidence level and confidence interval trade off against each other for a given sample size: the higher your confidence level (certainty), the wider your confidence interval (and the less precise your estimate).
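If you want to see where a margin like that comes from, the sketch below uses the standard normal-approximation formula for a proportion’s margin of error. The 33% result and the sample size of 950 are hypothetical numbers chosen to roughly reproduce the +/-3% example above.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a sample proportion.

    p_hat: observed proportion (0.33 for 33%)
    n:     number of respondents
    z:     z-score for the confidence level (1.96 for ~95%)
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical survey: 33% of 950 respondents liked the new feature.
moe = margin_of_error(0.33, 950)
print(f"33% +/- {moe:.1%}")   # about 33% +/- 3.0%
```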
Baseline conversion rate
The current conversion rate for the element you’re testing. This value is expressed as a percentage and is easy to find in tools like Amplitude’s Funnel Analysis. A lower baseline conversion rate generally means a larger required sample size: conversions are rarer, so you need more traffic to detect the same relative change.
Minimum detectable effect (MDE)
The smallest change your test can reliably detect. This value is expressed as a percentage of your baseline conversion rate. If your baseline conversion rate is 10%, an MDE of 2% would mean your test could detect a change of 0.2 percentage points (10% × 2%)—that is, any result of 10.2% (10% + 0.2%) or higher, or an equivalent decrease. The smaller your target MDE, the bigger the sample size necessary to detect a change.
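In code, the relationship between a relative MDE and the absolute change it implies looks like this (the 10% baseline and 2% MDE are the example figures from above):

```python
baseline = 0.10        # 10% baseline conversion rate
relative_mde = 0.02    # 2% MDE, expressed relative to the baseline

absolute_change = baseline * relative_mde       # 0.002, i.e. 0.2 percentage points
detectable_rate = baseline + absolute_change    # 0.102, i.e. 10.2%
print(f"Smallest detectable lift: {absolute_change * 100:.1f} percentage points "
      f"(to {detectable_rate:.1%})")
```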
Population size
The total size of the population you wish to learn more about. This value is expressed as an integer. In A/B testing, your population size is the size of your user base.
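For readers who like to see the math, the sketch below shows one common textbook approximation for the per-variant sample size in a two-proportion A/B test, combining the inputs above. It assumes a two-sided test and 80% statistical power, and it is not necessarily the exact formula Amplitude’s calculator uses.

```python
import math
from scipy.stats import norm

def sample_size_per_variant(baseline: float, relative_mde: float,
                            confidence: float = 0.95, power: float = 0.80) -> int:
    """Approximate users needed per variant for a two-proportion A/B test.

    baseline:     baseline conversion rate (0.10 for 10%)
    relative_mde: minimum detectable effect relative to baseline (0.02 for 2%)
    confidence:   desired confidence level (1 - significance level)
    power:        probability of detecting a true effect of the MDE size
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)            # smallest lift worth detecting
    z_alpha = norm.ppf(1 - (1 - confidence) / 2)  # two-sided critical value
    z_beta = norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

# Example: 10% baseline, 2% relative MDE, 95% confidence, 80% power
print(sample_size_per_variant(0.10, 0.02))  # several hundred thousand users per variant
```

Notice how quickly the number grows as the MDE shrinks or the baseline falls—that’s the tradeoff the sections above describe.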
How to use the sample size calculator
It’s easy to use Amplitude’s sample size calculator. The calculator asks for your desired confidence level, baseline conversion rate, and minimum detectable effect to determine the necessary sample size. The sample size our calculator returns will ensure your experiments are adequately powered to yield statistically significant results, providing reliable insights for decision-making.
Step one: Set your benchmarks
First, find your baseline conversion rate for the feature you’ll be testing. This number is easy to find in your Amplitude funnel analysis.
Step two: Determine the effect size you want to spot
From there, you’ll want to calculate your MDE. This is an easy calculation to make: divide the target change in conversion rate by your baseline conversion rate and multiply that number by 100. For example, say you typically get 15% of your users to click the “free trial” button on a landing page, and to justify changing the page, you’d like to see 16% click it instead. That means your target change in conversion rate would be 1 percentage point (16% - 15%).
The MDE calculation would be: 1% (target change in conversion rate) / 15% (baseline conversion rate) * 100% = 6.67%.
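As a quick sanity check, here’s the same arithmetic in code, using the hypothetical landing-page numbers above:

```python
baseline_rate = 0.15   # 15% of users currently click "free trial"
target_rate = 0.16     # 16% would justify changing the page

relative_mde = (target_rate - baseline_rate) / baseline_rate * 100
print(f"MDE: {relative_mde:.2f}%")   # MDE: 6.67%
```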
Step three: Select your desired confidence level
Finally, choose your confidence level. 95% (a p-value of 0.05) is standard for an A/B test. If you want to be extremely confident in your results, opt for 99% instead. On the other hand, if the sample size required for a 95% confidence level is too high for your timeline, opt for 90% instead. We don’t recommend a confidence level below 90% if you want to trust that your changes will have the desired effect.
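If you’d like to see how much that choice costs you in sample size, power-analysis libraries such as statsmodels can solve for the required sample per variant at each confidence level. The sketch below reuses the hypothetical 15% to 16% example and assumes 80% power; treat it as a rough cross-check rather than a substitute for the calculator.

```python
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline, target, power = 0.15, 0.16, 0.80
effect = proportion_effectsize(target, baseline)   # Cohen's h for the two rates

solver = NormalIndPower()
for confidence in (0.90, 0.95, 0.99):
    n = solver.solve_power(effect_size=effect, alpha=1 - confidence,
                           power=power, alternative="two-sided")
    print(f"{confidence:.0%} confidence: about {round(n):,} users per variant")
```

The required sample grows noticeably with each step up in confidence, which is exactly the timeline tradeoff described above.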
When to use a sample size calculator
Sample size calculators are helpful for tests that require a high level of confidence in quantitative data. They’re great when you’re testing questions like, “Are users trying this new feature?” that have yes/no answers.
They’re less valuable when the question is, “How does our new feature help your business?” This requires a deep understanding of how the feature may impact users’ workflows or business operations.
Sample size calculators will always be helpful if you’re running A/B tests. For more casual surveys or those that rely on qualitative rather than quantitative data, you won’t need the kind of numbers this calculator returns.
Want to know more about A/B testing? Our A/B testing guide will help you build, execute, and interpret an effective test.