# Build a prediction

Source: https://amplitude.com/docs/data/audiences/predictions-build

---


Predictions allow you to segment your users based on their likelihood to perform specific events or actions in the future.

For more prediction context, read [Predictions: Use Amplitude's AI to help maximize lift](/docs/data/audiences/predictions) and [Use prediction-based cohorts in your campaigns](/docs/data/audiences/predictions-use).

## Build a prediction

To build a prediction in Amplitude Activation, follow these steps:

1. Navigate to *Users > Predictions*. Then click *+ Create Prediction*.
2. Click *Start with All Users* to apply this prediction to all users active in the last 90 days, or define your own starting cohort.
3. If you choose to define your own starting cohort, select the users to include in the cohort. Under *Define starting cohort*, select the events, properties, or statuses that users in your cohort share.
4. Specify the action you want the starting cohort to take. Under *Define future outcome*, specify the events you want users to fire or avoid firing, the properties you want them to have after taking an action, or some combination of all three. Specify the time frame for the future action.

You can also think about a prediction as a **cohort transition**: you’re predicting the relative likelihood of a user moving from Cohort A, the starting cohort, to Cohort B, the future outcome, within the time frame you specified.

5. To add optional settings, use *Advanced Model Configuration*. This section lets you further define your starting cohort by including or excluding specific user properties. Click *Add Feature* under *Include* or *Exclude* to search for user properties. Click *Save*.
6. Give your prediction a name and add a brief description. Then click *Save*. It takes about an hour for Amplitude Activation to build your prediction. Amplitude sends you an email when the prediction is ready.

## Analyze your prediction

After Amplitude Activation finishes building your prediction, review the results. Depending on the results, save the prediction as a cohort or start over with a new prediction.

1. To view the results of your prediction, navigate to *Users > Predictions*. This shows you a list of all the predictions created so far.

2. Find your prediction and click it to open the prediction explorer's *Audience analysis* tab. This tab shows the distribution of all users in your starting cohort:

   - The Y-axis shows the likelihood that a user converts, meaning that they arrive at the future outcome you specified earlier.
   - The X-axis shows the percentile of users.

You can select a range of users by percentile and review how many users fall in the range, the predicted conversion rate of users in that range, and the likelihood of conversion for those users relative to the average.

Percentile and probability aren't the same thing. If you select the 80% - 100% percentile range, this doesn't mean the users in it have an 80% - 100% probability of converting. Instead, it means they’re in the top 20% of users, as ranked by probability of converting.
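The distinction between percentile and probability can be illustrated with a minimal sketch. The user IDs, scores, and function names below are hypothetical, and the ranking logic is a simplified assumption rather than Amplitude's actual implementation.

```python
def percentile_ranks(probabilities):
    """Map each user to a percentile (0-100) based on rank by predicted probability."""
    # Sort user IDs from least likely to most likely to convert.
    ordered = sorted(probabilities, key=probabilities.get)
    n = len(ordered)
    return {user: 100.0 * (i + 1) / n for i, user in enumerate(ordered)}


def users_in_range(probabilities, low, high):
    """Return users whose percentile is above `low` and at most `high`."""
    ranks = percentile_ranks(probabilities)
    return sorted(user for user, pct in ranks.items() if low < pct <= high)


# Hypothetical predicted probabilities: every user is far below 80%.
scores = {"u1": 0.01, "u2": 0.02, "u3": 0.03, "u4": 0.05, "u5": 0.12}

top_20 = users_in_range(scores, 80, 100)
for user in top_20:
    print(user, scores[user])
```

In this made-up data, the user in the 80% - 100% percentile range has a predicted probability of only 0.12. The percentile describes rank, not likelihood.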

### Feature importance

Predictions need context to be useful. Amplitude ranks the events and user properties that are most important to your predictive model in the *Feature Importance* section below the *Audience analysis* chart.

The *Events* tab in *Feature Importance* houses a table with the following columns and insights:

- The *Ratio* column ranks events or properties by importance to the model. Amplitude calculates this ranking by comparing the percentage of users in the selected percentile range who fire an event to users outside that range.
- The *Event* column lists the ranked events.
- The *% in Range* column specifies the percentage of users in the selected percentile range who fired the respective event. Sort by this column to rank events according to total level of engagement.
- The *% not in range* column shows the percentage of users outside the selected percentile range who fired the respective event.
- The *Frequency* column displays the average number of times a user in the selected range fires an event. Sort by this column to rank events according to total level of engagement.
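The columns above can be approximated with a short sketch. This assumes the *Ratio* is *% in range* divided by *% not in range*, and that *Frequency* averages over every user in the selected range. The source doesn't state the exact formulas, so treat `event_stats` and its inputs as hypothetical.

```python
def event_stats(in_range, out_of_range, event):
    """Summarize one event for users inside and outside a percentile range.

    Each user is a dict that maps an event name to the number of times
    that user fired the event.
    """
    def pct_firing(users):
        if not users:
            return 0.0
        fired = sum(1 for user in users if user.get(event, 0) > 0)
        return 100.0 * fired / len(users)

    pct_in = pct_firing(in_range)
    pct_out = pct_firing(out_of_range)
    # Assumed formula: how much more common the event is inside the range.
    ratio = pct_in / pct_out if pct_out else float("inf")
    # Assumed formula: average count per user in the selected range.
    total = sum(user.get(event, 0) for user in in_range)
    frequency = total / len(in_range) if in_range else 0.0
    return {
        "ratio": ratio,
        "pct_in_range": pct_in,
        "pct_not_in_range": pct_out,
        "frequency": frequency,
    }


# Hypothetical event counts for users in and out of the selected range.
selected = [{"Add to Cart": 3}, {"Add to Cart": 1}, {}, {"Add to Cart": 2}]
others = [{"Add to Cart": 1}, {}, {}, {}]

stats = event_stats(selected, others, "Add to Cart")
print(stats)
```

A ratio well above 1 suggests that the event is much more common among users in the selected range than among everyone else.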

Click *Showing Significant Events* to narrow down your list of ranked events and modify which events you see in the table, or click the calendar icon to change the specified time frame and offset. The modified time frame applies to both the *Events* and *User Properties* tabs.

The *User Properties* tab shows rankings of user properties and their importance to the prediction's model. Click *Select Property* to choose a user property for review. Like the Events table, the *User Properties* table ranks property values by *Ratio*, *% in range*, and *% not in range*. To include hidden property values in the table, select **Show Hidden Property Values**.

### Model performance

At this point, you can evaluate the accuracy of your prediction. Amplitude provides metrics for you to do this in the *Model performance* tab:

- **True Positive Rate**: The share of users who converted that the model predicted would convert.
- **False Positive Rate**: The share of users who didn't convert that the model predicted would convert.
- **AUC**, or Accuracy: The area under the curve that plots the true positive rate against the false positive rate, a measure that weighs both rates.
- **Log loss**, or predicted vs actuals: The difference between predicted conversion rates and observed historical conversion rates, in percentage terms.

Generally speaking, a good model has an accuracy (AUC) of at least 70%. A model with an accuracy of 50% or less predicts no better than a coin flip.
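For intuition about these metrics, here is a minimal sketch that uses the standard textbook definitions. The values Amplitude displays may be calculated or scaled differently, and the labels, scores, and 0.5 threshold below are hypothetical.

```python
import math


def rates(labels, scores, threshold=0.5):
    """Return (true positive rate, false positive rate) at a score threshold."""
    tp = fp = fn = tn = 0
    for converted, score in zip(labels, scores):
        predicted = score >= threshold
        if converted and predicted:
            tp += 1
        elif converted:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def auc(labels, scores):
    """Chance that a random converter outscores a random non-converter."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one converter and one non-converter")
    # Ties between a converter and a non-converter count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def log_loss(labels, scores, eps=1e-15):
    """Average negative log-likelihood of the observed outcomes."""
    total = 0.0
    for y, s in zip(labels, scores):
        s = min(max(s, eps), 1 - eps)  # Avoid taking log(0).
        total -= y * math.log(s) + (1 - y) * math.log(1 - s)
    return total / len(labels)


# Hypothetical outcomes (1 = converted) and predicted probabilities.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1]

print(rates(labels, scores), auc(labels, scores), log_loss(labels, scores))
```

An AUC of 0.5 means the scores rank converters no better than chance, which matches the coin-flip comparison above. A lower log loss means the predicted probabilities sit closer to the observed outcomes.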

### Tips for a good predictive model

- **Outcome event has at least 50 unique actions per day**: Ensure that the outcome event has at least 50, and ideally 100, unique user actions every day. If conversion numbers fall below this threshold, the model doesn't detect signals and recommends the same content for all users.
- **Targeted user cohort for Audiences has between 1K and 10M users**: If the user population is under 1K, Audiences can't use users at a 1:1 level. If the population is greater than 10M users, Amplitude filters the cohort.
- **Large enough dataset**: More data improves recommendation accuracy. Amplitude requires a minimum of 10,000 events per user to build a prediction.
- **Relevant events in Amplitude related to the target outcome**: Cross-reference the target outcome and confirm that you already instrumented the appropriate events. These events are typically `Purchase` or `Transaction completed`.

## Build a cohort from your prediction

After you have a useful prediction, you can save the prediction as a cohort. This lets you return to the cohort and use it repeatedly in targeting campaigns.

To save your prediction as a cohort, follow these steps:

1. Use the slider to select the percentile range on the chart. Then click *Save as predictive cohort*.
2. Give your cohort a name and click *Save*.

Avoid splitting the starting cohort into only two sections, such as top 20% vs bottom 80% or top 50% vs bottom 50%. Other approaches can give you more useful results:

- **Probability inflection**: Find the spot where the distribution graph spikes exponentially, and split users along the spikes. This groups users into broadly similar buckets of predicted conversion rates.
- **Sample size**: If you know how many users you want to target in a growth campaign, select that percentage on the right side of the chart. For example, if you want to target 2,000 users and you have 20,000 users in the starting cohort, select the top 10%.
- **Minimum detectable lift**: If you plan to target the selected users in a growth campaign, make sure the sample size is large enough to detect incremental lift. For example, if the top 20% of a prediction is 20,000 users, but the predicted conversion rate is 1%, you won’t be able to detect lift at statistically significant levels. Instead, you must increase the sample size to top 45% of users at 45,000 users.
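The sample size and minimum detectable lift approaches can be sketched as follows. This uses a common two-proportion sample size approximation with an assumed 5% significance level and 80% power. Amplitude doesn't document the method behind the figures in the example above, so this sketch isn't expected to reproduce them, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def top_percent_for_target(target_users, cohort_size):
    """Percentage of the starting cohort needed to reach a target user count."""
    return 100.0 * target_users / cohort_size


def users_per_group(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate users needed in each of two groups to detect a lift.

    Assumes a two-sided test that compares a control group converting at
    `baseline_rate` with a treatment group converting at
    baseline_rate * (1 + relative_lift).
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# The sample size example from the list above: 2,000 of 20,000 users.
print(top_percent_for_target(2_000, 20_000))

# Hypothetical comparison: the same 25% relative lift at two baseline rates.
print(users_per_group(0.01, 0.25))
print(users_per_group(0.10, 0.25))
```

Under these assumptions, a lower predicted conversion rate needs far more users to detect the same relative lift, which is why a low-converting range may need to be widened.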

When a user’s probabilities change, Amplitude Activation automatically adjusts their cohort membership if they move into or out of the selected percentile range.

## Analyze your predictive cohort

After you save a prediction as a cohort, you can use it for analysis in any Amplitude Analytics chart. Try these analyses with prediction-derived cohorts:

- **Create top 20% and bottom 80% cohorts** to compare the best and worst users. Set them as different segments in the right module of any chart.
- **Event Segmentation**: Analyze the historical behavioral trends of best users vs worst users before converting.
- **Pathfinder**: Identify the different action sequences users take when they have a high likelihood vs low likelihood to convert.
- **Composition**: Break down cohort property values to compare user properties, such as which countries the best users and worst users are in.
- **Engagement Matrix**: Compare the events fired by the best users vs the worst users, based on the balance of frequency and percentage of users.
- **Funnel**: Compare relative conversion rates for any sequence of actions between the best users and worst users.

