The Experiment Analysis view

The Experiment Analysis view shows the details of your experiment. Select your experiment and then go to Activity > Analysis to view high-level statistical measurements. These measurements help you determine whether your experiment succeeded.

This article describes each column in the Analysis table and how each column relates to your experiment.

Analysis filters and chart options

You can filter the information in the Analysis view by time, user, or any other available property.

Select Select property to filter the Analysis view by any available property.

To turn your experiment analysis into a chart, select Open in Chart. Amplitude converts the experiment information into the most likely charts and opens them in a new tab. You can then modify the charts.

Metric name, Control, and On

The Metric name column contains the names of the selected metrics. The top metric is the recommendation metric. All other metrics are secondary metrics. Hover over a metric's name to read its definition.

The Control column contains information about the control group for your experiment.

The On column contains information about experiment activity while the experiment runs. If the experiment is complete or rolled back, the On column may show other relevant information such as treatment or property.

To examine segments of users, use the filter card at the top of the Analysis tab.

Relative performance

Relative performance measures the relative difference between variant performance and control performance. This is also known as the relative lift. To cross-check this value, expand a single metric's section and divide the absolute lift for a variant by the absolute value of the control for that metric.
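
The cross-check described above looks like this in Python (all numbers here are made up for illustration):

```python
# Hypothetical example values: cross-check the relative lift shown in the
# Analysis view by dividing the absolute lift by the control's absolute value.
control_value = 0.20   # e.g., a 20% conversion rate for control
variant_value = 0.25   # e.g., a 25% conversion rate for the variant

absolute_lift = variant_value - control_value   # 0.05
relative_lift = absolute_lift / control_value   # 0.25, shown as +25% lift

print(f"Relative lift: {relative_lift:+.0%}")
```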

Confidence interval

The confidence interval is a range of values that's likely to contain the parameter you're trying to measure. In this case, the parameter is the difference in means between the variant and the control. The confidence interval isn't a probability. Interpret it as follows: if you ran this experiment 100 times with the confidence level set to 95%, you can expect the computed interval to contain the true parameter value in about 95 of those runs.

The confidence interval reveals characteristics of what the experiment has observed so far:

  • Confidence Interval contains 0: There isn't enough evidence yet to determine whether a difference exists between control and treatment.
  • Confidence Interval greater than 0: Both the lower and upper confidence bounds are above zero. Amplitude Experiment has accumulated enough observations to reach statistical significance, and you can conclude that the variant has a positive effect compared to control. When you look at lift, a variant with a confidence interval greater than zero performs better than the control.
  • Confidence Interval less than 0: Both confidence bounds are below zero. Amplitude Experiment has accumulated enough observations to reach statistical significance, and you can conclude that the variant has a negative effect compared to control. When you look at lift, a variant with a confidence interval less than zero performs worse than the control.
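
The three cases above can be sketched with a simple normal-approximation interval. This is an illustrative calculation, not Amplitude's exact method, and all numbers are made up:

```python
import math

# 95% confidence interval for the difference in means between a variant
# and control, using a normal approximation (illustrative only).
def ci_diff_means(mean_v, se_v, mean_c, se_c, z=1.96):
    diff = mean_v - mean_c
    se = math.sqrt(se_v**2 + se_c**2)   # standard error of the difference
    return diff - z * se, diff + z * se

lower, upper = ci_diff_means(mean_v=2.7, se_v=0.1, mean_c=2.5, se_c=0.1)

# This interval contains 0, so with these made-up numbers the evidence
# is inconclusive; an interval entirely above (or below) 0 would indicate
# a positive (or negative) effect.
print(f"95% CI for the difference: ({lower:.3f}, {upper:.3f})")
```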

If you have multiple variants, select the one you want to view in the confidence interval chart from the drop-down above the chart.

To access the confidence interval, hover over the control or the experiment metric.

Significance

Significance is the likelihood that the performance shown for each test variant differs from zero rather than reflecting random fluctuations in the data. The higher this value, the more confident you can be in your results. More formally, significance is 1 - p-value.
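
As an illustration of the 1 - p-value relationship, using a normal approximation and a made-up z-score rather than Amplitude's exact test:

```python
import math

# Two-sided p-value for a standard normal test statistic: P(|Z| >= |z|).
def two_sided_p_value(z):
    return math.erfc(abs(z) / math.sqrt(2))

z = 2.1                         # observed standardized lift (made up)
p_value = two_sided_p_value(z)  # ≈ 0.036
significance = 1 - p_value      # ≈ 0.964, shown as 96.4%

print(f"p-value: {p_value:.3f}, significance: {significance:.1%}")
```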

Absolute value

The specific meaning of the absolute value depends on the metric type. For unique conversions, Experiment expresses the value as the percentage of exposed users who converted for each variant. The numerator (Conversions) and denominator (Exposures) appear below the percentage.

Otherwise, the value indicates the aggregate (total events, sum of property values, or average of property values) per exposed user. The denominator is the total number of exposures. For example, 10 total events divided by 4 exposures means an exposed user had, on average, 2.5 conversion events.
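
Both calculations are simple ratios. A sketch with made-up counts:

```python
# Unique conversions: percentage of exposed users who converted.
conversions, exposures = 30, 120
conversion_rate = conversions / exposures   # 0.25, shown as 25%

# Other metric types: aggregate value per exposed user.
total_events = 10
avg_per_user = total_events / 4             # 2.5 events per exposed user

print(f"{conversion_rate:.0%} converted; {avg_per_user} events per user")
```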

To access the absolute value, hover over the control or metric name.

Winsorization statistics

When you enable winsorization for your experiment, the Analysis view shows additional information about how winsorization affected your data:

  • Number of winsorized users: The count of users whose values winsorization adjusted.
  • Percentage of winsorized users: The proportion of total users that winsorization affected.

This information helps you understand the impact of outlier handling on your experiment results. If winsorization affects a high percentage of users, investigate the underlying data distribution or adjust your winsorization settings.
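
As an illustration of what winsorization does, here's a simple percentile cap (not Amplitude's exact implementation; the values are made up):

```python
# Cap each user's value at the 95th percentile, then report how many
# users were adjusted.
def winsorize(values, upper_pct=95):
    cutoff = sorted(values)[int(len(values) * upper_pct / 100) - 1]
    capped = [min(v, cutoff) for v in values]
    affected = sum(1 for v, c in zip(values, capped) if v != c)
    return capped, affected

values = [1, 2, 2, 3, 3, 4, 4, 5, 6, 250]   # one extreme outlier
capped, affected = winsorize(values)

print(f"{affected} of {len(values)} users winsorized "
      f"({affected / len(values):.0%})")
```

A high winsorized percentage in this kind of summary is the signal, noted above, to look more closely at the underlying distribution.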

Common questions

Why is my graph displaying an error state?

A common mistake is to generate a chart using only one variant. The Experiment Results chart must include two or more variants to display comparison results. If you don't include both the control and at least one variant, your chart doesn't display anything.

Why is reaching significance taking longer than it should?

When you use a T-test, you must wait until your experiment reaches the specified sample size before Experiment Results runs the p-value and confidence-interval computations.

With sequential testing, even with a large MDE, reaching statistical significance can take time if your experiment's lift is small. A T-test generally requires fewer samples to detect the same lift.

How is the Retention metric calculated?

Amplitude uses three parameters to calculate Return On for the Retention metric:

  • The starting event: the event that occurs after the exposure event. The starting event marks the beginning of the retention window.
  • The return event: the event you want the user to perform after the starting event. The user counts as retained if they trigger the return event.
  • Return on the nth day/week/month: the number of days, weeks, or months between the user performing the starting event and the return event. Amplitude calculates this parameter in 24-hour increments and doesn't use calendar dates.

For example, suppose a user performs an exposure event, a starting event, and a return event:

  • Exposure event: Page Viewed
  • Starting event: Sign Up
  • Return event: Add to cart

With a return-on nth day value of seven, the user counts as retained if Add to cart triggers between seven and eight days (168 to 192 hours) after Sign Up.

If the return-on value is an nth week value of one (one week, or seven days), the user counts as retained if they perform Add to cart any time between days seven (week 1) and 14 (week 2) after Sign Up.

For a return-on nth month value of one (30 days), the user counts as retained if they perform Add to cart any time between days 30 (month 1) and 60 (month 2) after Sign Up.
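
The window logic above can be sketched in Python. The `retained` helper is hypothetical, not Amplitude's implementation; note that the windows count 24-hour increments from the starting event, not calendar dates:

```python
from datetime import datetime, timedelta

# True if the return event falls within the nth day/week/month window,
# where each unit is a fixed number of 24-hour days.
def retained(start_time, return_time, n, unit_days):
    window_start = start_time + timedelta(days=n * unit_days)
    window_end = window_start + timedelta(days=unit_days)
    return window_start <= return_time < window_end

sign_up = datetime(2024, 1, 1, 9, 0)                 # starting event: Sign Up
add_to_cart = sign_up + timedelta(days=7, hours=3)   # return event

# Return on 7th day: window is [7 days, 8 days) after Sign Up.
print(retained(sign_up, add_to_cart, n=7, unit_days=1))   # True
# Return on 1st week: window is [7 days, 14 days) after Sign Up.
print(retained(sign_up, add_to_cart, n=1, unit_days=7))   # True
```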
