Group-bys: How Amplitude prunes and orders chart results
In its basic form, Amplitude's group-by feature is a tool for categorizing events for aggregation.
For example, when you want to count the number of events by country, use a group-by.
Group-by result limits
For performance, Amplitude sets a maximum limit on the number of groups a query result can return. When a query exceeds the limit, Amplitude keeps the top groups and prunes the remainder from the query result.
Amplitude may impose the following limits:
- Single group-by: 100 results.
- Double group-by: 500 results.
For more information, refer to Limits.
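The pruning step can be pictured with a short sketch. This is an illustration only, not Amplitude's implementation: the `prune_groups` helper, the sample events, and the alphabetical tie-break are all assumptions made for the example.

```python
from collections import Counter

def prune_groups(counts, limit):
    """Keep the `limit` highest-ranked groups; report how many were pruned.

    Ties are broken alphabetically here, which is an assumption of this
    sketch rather than documented Amplitude behavior.
    """
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    kept = dict(ranked[:limit])
    pruned = len(ranked) - len(kept)
    return kept, pruned

# Hypothetical events grouped by country.
events = [
    {"event": "Play Song", "country": "US"},
    {"event": "Play Song", "country": "US"},
    {"event": "Play Song", "country": "US"},
    {"event": "Play Song", "country": "Spain"},
    {"event": "Play Song", "country": "Spain"},
    {"event": "Play Song", "country": "Japan"},
]

totals = Counter(e["country"] for e in events)

# A limit of 2 stands in for the real limit (for example, 100) to keep the data small.
kept, pruned = prune_groups(totals, limit=2)
print(kept)    # {'US': 3, 'Spain': 2}
print(pruned)  # 1
```

With a limit of 2, the lowest-ranked group (Japan) drops out of the result, just as groups beyond the limit drop out of a chart.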
Group ordering
This table shows how Amplitude prioritizes group-bys for display in the Breakdown Table.
| Metric | Order |
|---|---|
| Uniques | Number of unique users |
| Totals | Total number of events |
| % Active | Number of unique users |
| Average | Total number of events |
| Frequency | Number of unique users |
| Distribution of Property Value | Total number of events |
| Sum of Property Value | Sum of property values |
| Average of Property Value | Sum of property values |
| Distinct Property Values Per User | Total number of (user, property value) pairs |
| Formulas: Percentile | Total number of events |
| Formulas: Frequency Percentile | Total number of events |
| Formulas: Property Count | Number of unique properties |
| Formulas: Property Count Average | Total number of (user, property) pairs |
| Formulas: Default | Number of unique users |
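Because the ordering statistic depends on the metric, the same data can rank groups differently. The sketch below covers only the Uniques and Totals rows of the table; the `rank_groups` helper and the sample data are hypothetical.

```python
from collections import defaultdict

def rank_groups(events, metric):
    """Rank groups by the ordering statistic the table lists for `metric`."""
    users = defaultdict(set)
    totals = defaultdict(int)
    for user_id, group in events:
        users[group].add(user_id)
        totals[group] += 1

    if metric == "Uniques":
        score = {group: len(ids) for group, ids in users.items()}
    elif metric == "Totals":
        score = dict(totals)
    else:
        raise ValueError(f"metric not covered by this sketch: {metric}")

    # Highest score first; alphabetical tie-break is an assumption.
    return sorted(score, key=lambda group: (-score[group], group))

# (user_id, country) pairs: one heavy user in the US, three light users in Spain.
events = [
    ("u1", "US"), ("u1", "US"), ("u1", "US"), ("u1", "US"),
    ("u2", "Spain"), ("u3", "Spain"), ("u4", "Spain"),
]

print(rank_groups(events, "Uniques"))  # ['Spain', 'US']
print(rank_groups(events, "Totals"))   # ['US', 'Spain']
```

Spain ranks first by unique users, while the US ranks first by total events, so the metric you choose can change which groups survive pruning.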
Formulas with group ordering
For formulas without a group-by, Amplitude uses the ordering from the table above only if every metric in the formula uses the same ordering. Otherwise, Amplitude uses the default formulas ordering.
For formulas with a group-by, Amplitude ranks the groups by the largest overall values per group, summed across all formulas in a single expression.
If group-by pruning occurs with multiple formula terms combined with operators, formulas may take longer to load. This is because Amplitude runs extra queries to ensure all formula terms query the same groups.
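One way to read the ranking rule for formulas with a group-by is sketched below: sum each group's value across the formula terms, keep the top groups, then restrict every term to that shared set. The `shared_top_groups` helper, the term names `A` and `B`, and the values are assumptions for illustration, not Amplitude's query logic.

```python
def shared_top_groups(term_values, limit):
    """Rank groups by their value summed across all formula terms."""
    combined = {}
    for values in term_values.values():
        for group, value in values.items():
            combined[group] = combined.get(group, 0) + value
    # Highest combined value first; alphabetical tie-break is an assumption.
    ranked = sorted(combined, key=lambda group: (-combined[group], group))
    return ranked[:limit]

# Hypothetical per-group values for two formula terms.
term_values = {
    "A": {"US": 50, "Spain": 10, "Japan": 30},
    "B": {"US": 5, "Spain": 40, "Japan": 5},
}

top = shared_top_groups(term_values, limit=2)

# Every term is restricted to the same groups so the formula compares like with like.
aligned = {
    term: {group: values.get(group, 0) for group in top}
    for term, values in term_values.items()
}

print(top)      # ['US', 'Spain']
print(aligned)  # {'A': {'US': 50, 'Spain': 10}, 'B': {'US': 5, 'Spain': 40}}
```

Ranked on term `A` alone, Japan would beat Spain. Aligning both terms to one shared set of groups is the kind of extra work that can make pruned formulas slower to load.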
Group-by in Experiment
Beta
This feature is in Beta and may continue to evolve. This documentation may not yet reflect the latest updates.
Group-bys in Experiment charts may result in slower query performance. For more information, refer to Limitations.
In both end-to-end experiments and Experiment Results, Amplitude limits the number of groups a group-by returns to 10 per metric. The group-by applies to the exposure event, and Amplitude sorts the rows by the sum of exposures across all variants. Some rows may show (none), which means the property is missing. For more information, refer to FAQ: Unexpected values in user counts.
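The row ordering described above can be sketched as follows. The `top_experiment_rows` helper, the variant names, and the exposure counts are hypothetical; only the limit of 10 and the sort by summed exposures come from the text.

```python
EXPERIMENT_GROUP_LIMIT = 10  # per metric, as stated above

def top_experiment_rows(exposures, limit=EXPERIMENT_GROUP_LIMIT):
    """Order group-by rows by exposures summed across variants, then truncate."""
    ranked = sorted(
        exposures,
        key=lambda group: (-sum(exposures[group].values()), group),
    )
    return ranked[:limit]

# Hypothetical unique exposures per group-by value and variant.
# "(none)" stands for users whose exposure event lacks the property.
exposures = {
    "US": {"control": 400, "treatment": 410},
    "Spain": {"control": 120, "treatment": 130},
    "(none)": {"control": 300, "treatment": 290},
}

print(top_experiment_rows(exposures, limit=2))  # ['US', '(none)']
```

In this sketch the (none) row is ranked like any other group, so it can outrank named values when many exposure events are missing the property.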
Amplitude's multiple hypothesis testing correction doesn't account for group-bys, because Amplitude doesn't know how many hypothesis tests you plan to run in the analysis. You could look at one group-by value or 10 group-by values.
Group-bys provide a more exploratory analysis. Adjusting for multiple hypothesis testing makes statistical significance harder to reach. If you think you have a false positive, split your experiment into two date ranges. Conduct all the hypothesis testing you want on the first date range, then try to reproduce those results on the second. Think of this like a train-test split, where you train a machine learning model and tune its hyperparameters on the training set, then evaluate the model on the unseen test set to get an unbiased error estimate.
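A minimal sketch of the date-range split, assuming you have exported one row per exposed user. The field names (`date`, `country`, `converted`), the cutoff date, and both helpers are hypothetical.

```python
from datetime import date

def split_by_date(rows, cutoff):
    """Return (exploration, confirmation) rows split at `cutoff`."""
    exploration = [r for r in rows if r["date"] < cutoff]
    confirmation = [r for r in rows if r["date"] >= cutoff]
    return exploration, confirmation

def conversion_rate(rows, country):
    """Share of rows in `country` that converted; None if there are no rows."""
    in_group = [r for r in rows if r["country"] == country]
    if not in_group:
        return None
    return sum(r["converted"] for r in in_group) / len(in_group)

# Hypothetical export: one row per exposed user.
rows = [
    {"date": date(2024, 1, 3), "country": "Spain", "converted": True},
    {"date": date(2024, 1, 5), "country": "Spain", "converted": False},
    {"date": date(2024, 1, 18), "country": "Spain", "converted": True},
    {"date": date(2024, 1, 20), "country": "Spain", "converted": True},
]

exploration, confirmation = split_by_date(rows, cutoff=date(2024, 1, 15))

# Explore freely on the first range, then check one chosen hypothesis on the second.
print(conversion_rate(exploration, "Spain"))   # 0.5
print(conversion_rate(confirmation, "Spain"))  # 1.0
```

The point of the split is discipline rather than code: choose the group-by value to test from the first range only, then run a single pre-chosen test on the second.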
Limitations
Amplitude doesn't support group-by and CUPED together. If you select these options, the non-group-by value has CUPED applied but the group-by values don't.
If the value of the property you group by changes between the exposure event and the metric event, you may notice a conversion rate higher than 100%. For example, if you group by Country and view the row Country = Spain, the denominator is the number of unique users whose exposure event occurred in Spain, and the numerator is the number of users who did the metric event in Spain.
Someone can do a metric event in Spain without ever doing the exposure event in Spain. That user counts toward the numerator but not the denominator, which can push the rate above 100%.
The opposite is also true: the numerator can be undercounted instead of overcounted. For example, if you group by Platform and look at the Platform = Web row, users whose exposure event has Platform = Web but whose metric event has Platform != Web don't count in the numerator.
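A small worked example makes both effects concrete. The users and events are made up, and the `conversion_rate` helper is a simplified model of the numerator and denominator described above, not Amplitude's actual computation.

```python
# (user_id, country at the time of the event); all three users are exposed.
exposures = [("u1", "Spain"), ("u2", "France"), ("u3", "France")]
metric_events = [("u1", "Spain"), ("u2", "Spain"), ("u3", "Spain")]

def conversion_rate(group):
    """Unique users with a metric event in `group` over unique exposures in `group`."""
    denominator = {user for user, value in exposures if value == group}
    numerator = {user for user, value in metric_events if value == group}
    if not denominator:
        return None
    return len(numerator) / len(denominator)

print(conversion_rate("Spain"))   # 3.0, which is 300%
print(conversion_rate("France"))  # 0.0
```

Users u2 and u3 were exposed in France but converted in Spain, so the Spain row is overcounted at 300% and the France row is undercounted at 0%, even though every exposed user converted.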