Pruning and ordering of data in Amplitude Analytics
In data analytics, data pruning refers to the process of removing or reducing the size of a dataset by eliminating irrelevant, redundant, or low-value data. The goal of data pruning is to streamline the dataset and make it more manageable, efficient, and meaningful for analysis.
Ordering, also known as sorting, refers to the process of arranging data in a particular sequence based on one or more criteria. Ordered data makes patterns, trends, and relationships within the dataset easier to identify.
This article explains how Amplitude Analytics performs pruning and ordering in its charts.
Why Amplitude Analytics prunes and orders data
Amplitude Analytics prunes and orders a chart's data whenever a group-by clause returns an excessive number of values. Imagine a chart that groups by user ID. Because every user ID is distinct, this query returns one value for each user in the segment you're analyzing. Showing all of these values on a single chart would make it difficult to read and almost impossible to draw conclusions from.
To maintain chart performance in these cases, Amplitude prunes values from the chart. For charts with one group-by clause, you can view a maximum of 100 values. For charts with two group-by clauses, you can view a maximum of 500 values.
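As a rough illustration of the limits above, the following sketch keeps only the most frequent group-by values. The limits of 100 and 500 come from this article; the function name `prune_top_values`, the sample data, and ranking by raw event count are assumptions made for the example, not a description of Amplitude's internal implementation.

```python
from collections import Counter

# Limits quoted in this article, keyed by the number of group-by clauses.
MAX_VALUES = {1: 100, 2: 500}

def prune_top_values(events, group_by_count=1):
    """Hypothetical helper: keep the most frequent group-by values up to the limit."""
    limit = MAX_VALUES[group_by_count]
    counts = Counter(events)
    # most_common returns (value, count) pairs sorted by count, highest first.
    return counts.most_common(limit)

# Made-up data: 250 distinct user IDs, where user_i triggers the event i + 1 times.
events = [f"user_{i}" for i in range(250) for _ in range(i + 1)]
kept = prune_top_values(events)

print(len(kept))  # 100
print(kept[0])    # ('user_249', 250)
```

In this sketch, the 150 least active user IDs never reach the chart, which is the effect pruning has on a high-cardinality group-by.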
Read more about the group-by and how it affects pruning and ordering.
Amplitude Analytics also orders chart results, displaying only the top values. To determine which values are the top values, it compares user activity within the chart's time frame.
For example, imagine you have one chart looking at users over the last 30 days, and another looking at the last seven days. By definition, a top active user has triggered the event more often than other users. So for each of these time frames, the top users are likely different, since a top active user in the seven-day time frame might have been less active (either overall or relative to others in that larger user population) over the 30-day time frame.
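The time-frame effect described above can be sketched with a few made-up events. The users `ana` and `ben`, the dates, and the `top_users` helper are all hypothetical; the point is only that the same ranking rule can pick a different top user for a different window.

```python
from collections import Counter
from datetime import date, timedelta

def top_users(events, end, days, n=1):
    """Rank users by event count within the last `days` days ending at `end`."""
    start = end - timedelta(days=days - 1)
    counts = Counter(user for user, day in events if start <= day <= end)
    return [user for user, _ in counts.most_common(n)]

today = date(2024, 1, 30)

# (user, event_date) pairs: "ana" was busy early in the month, "ben" only this week.
events = (
    [("ana", today - timedelta(days=20))] * 40
    + [("ana", today - timedelta(days=2))] * 3
    + [("ben", today - timedelta(days=1))] * 10
)

top_7 = top_users(events, today, days=7)    # ['ben']
top_30 = top_users(events, today, days=30)  # ['ana']
```

Here `ben` leads the seven-day window with 10 events against 3, while `ana` leads the 30-day window with 43 events against 10.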
Amplitude Analytics prunes results before applying any other filters on the chart. This includes cohorts.
View your pruned results
There are a few ways to view the query results pruned from a chart:
- Apply more filters: This narrows down the pool of results and surfaces more of the values you want to view.
- Export your results: Use the .CSV download (maximum 10,000 values) or the Dashboard REST API (maximum 20,000 values).
- Use a single column in Data Tables: When grouping by a property in a Data Table, using one group-by column (rather than multiple) gives you the fullest result set. With a single column, Amplitude can rank using the actual metric instead of an approximation.
Chart-specific considerations
Event Segmentation
- When viewing the Uniques tab in an Event Segmentation chart, each user appears once and only once, so there are no top users to surface.
Funnel Analysis
- If Amplitude Analytics has pruned users on your Funnel Analysis chart, the conversion rate might seem higher than expected. This is because it's based on fewer users.
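One way this can happen is shown in the small worked example below. It assumes that pruning keeps the most active users and that those users convert more often than the rest. Both assumptions, along with all of the numbers, are made up for illustration.

```python
def conversion_rate(users):
    """Share of users who entered the funnel and also completed it."""
    entered = len(users)
    converted = sum(1 for u in users if u["converted"])
    return converted / entered

# Hypothetical data: 10 highly active users (8 convert), 90 less active users (18 convert).
active = [{"events": 50, "converted": i < 8} for i in range(10)]
casual = [{"events": 2, "converted": i < 18} for i in range(90)]
all_users = active + casual

# Assumed pruning rule for this sketch: keep the 10 users with the most events.
kept = sorted(all_users, key=lambda u: u["events"], reverse=True)[:10]

full_rate = conversion_rate(all_users)  # 26 converted out of 100 users
pruned_rate = conversion_rate(kept)     # 8 converted out of 10 users
```

Under these assumptions, the rate over all 100 users is 26%, but the rate over the 10 users that survive pruning is 80%, because the smaller denominator leaves out most of the users who did not convert.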
Data Tables with persisted properties or attribution
When your Data Table includes persisted properties or attribution metrics (like last-touch), the ranking logic works differently from standard group-bys, which can make results feel off.
How standard ranking works: For a plain group-by with a standard metric (like Uniques or Totals), Amplitude ranks groups using the same metric it displays. The top 100 rows are genuinely the top 100 by the metric you care about.
How ranking works for persisted properties and attribution: Running the actual computation—assigning last-touch credit, or resolving a persisted property across a user's full event history—is expensive. Amplitude can't afford to run it just to decide which values to keep. Instead, Amplitude substitutes a cheaper proxy. It ranks groups using a plain event segmentation on "Any Event" with totals, asking "which property values appear most often across all events?" rather than "which values rank highest in my actual metric?"
This means the values Amplitude keeps may not be the ones you'd expect. Amplitude may exclude property values that rank highly in your actual metric because they're less common across all events, while including common-but-low-value groups.
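The gap between the proxy ranking and the actual metric can be sketched as follows. The property values, the numbers, and the limit of 2 are invented to make the effect visible; only the idea of ranking by "Any Event" totals instead of the displayed metric comes from this article.

```python
from collections import Counter

def top_values(scores, n):
    """Return the n values with the highest score."""
    return [value for value, _ in Counter(scores).most_common(n)]

# Hypothetical per-value numbers for a property such as a campaign name.
any_event_totals = {"newsletter": 9000, "organic": 7000, "partner": 400, "webinar": 300}
attributed_revenue = {"newsletter": 1200, "organic": 500, "partner": 8000, "webinar": 6500}

limit = 2  # tiny limit, chosen only so the difference shows up in four rows

kept_by_proxy = top_values(any_event_totals, limit)     # ranked by "Any Event" totals
kept_by_metric = top_values(attributed_revenue, limit)  # ranked by the displayed metric

# Revenue still visible in the table under each ranking.
visible_proxy_total = sum(attributed_revenue[v] for v in kept_by_proxy)    # 1700
visible_metric_total = sum(attributed_revenue[v] for v in kept_by_metric)  # 14500
```

In this sketch the proxy keeps `newsletter` and `organic`, which are common across all events, and drops `partner` and `webinar`, which carry most of the attributed revenue.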
Practical implications:
- Totals may appear lower than expected if Amplitude pruned high-value rows from the result set.
- Reordering columns (sorting by a different column) triggers a recomputation and can surface different rows. This is expected behavior.
- You can't view exactly what Amplitude pruned. Identifying pruned values requires running the full computation that pruning avoids.
Get the most complete results
Use a single column (one group-by) in your Data Table when working with persisted properties or attribution metrics. With a single column, Amplitude can use the actual metric for ranking instead of a proxy, giving you the most accurate and complete result set.