# Personas FAQ

Frequently asked questions about Personas charts.

Source: https://amplitude.com/docs/analytics/charts/personas/faq

---

This page answers common questions about the [Personas chart](/docs/analytics/charts/personas/personas-clustering).

## How does the Personas chart calculate clusters?

Amplitude previously relied on the [K-Means](https://en.wikipedia.org/wiki/K-means_clustering) algorithm to generate clusters for Personas charts. That approach has two important limitations:

- It doesn't handle outliers well, so behaviors with large frequency ranges could skew the clusters toward representing unusual patterns of engagement. As a result, the clusters could fail to capture the nuance in more typical rates of behavior.
- It doesn't handle "high-dimensional" data well, so when customers have many different event types, the clusters could fail to represent groups of users who were truly similar in behavior.
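
The first limitation can be illustrated with a small sketch. It is not Amplitude's former implementation: `kmeans_1d` is a hypothetical helper that runs plain Lloyd's algorithm on one-dimensional data, and the weekly event counts are invented for illustration.

```python
def kmeans_1d(values, centroids, iterations=20):
    """Plain Lloyd's algorithm on 1-D data with caller-supplied starting centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical weekly counts of one event: three light users, three moderate
# users, and one extreme outlier.
weekly_counts = [1, 2, 3, 10, 11, 12, 1000]
labels, centroids = kmeans_1d(
    weekly_counts, centroids=[min(weekly_counts), max(weekly_counts)]
)
print(labels)     # cluster index per user
print(centroids)  # final cluster centers
```

With two clusters, the outlier takes one cluster for itself (`labels` is `[0, 0, 0, 0, 0, 0, 1]`), and the light and moderate users are merged around a centroid of 6.5. The distinction between the two typical groups is lost.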

For these reasons, Amplitude explored how to better use clustering to help customers identify meaningful patterns of user behavior in their data. Through that work, Amplitude replaced K-Means with [Non-Negative Matrix Factorization](https://en.wikipedia.org/wiki/Non-negative_matrix_factorization).

## What is Non-Negative Matrix Factorization (NMF)?

Amplitude uses NMF to calculate clusters. Given a data set, clustering algorithms look for ways to partition the set that maximize similarity within each partition while minimizing similarity between different partitions.

To escape the [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) in the original "event space," NMF explicitly carries out a mathematical [dimensionality reduction](https://en.wikipedia.org/wiki/Dimensionality_reduction) to arrive at a more comprehensible "behavior space." The method also reduces outlier effects by weighting events based on their frequencies and by normalizing each user's event counts. Once projected to the simpler behavior space, users who are similar along certain behavioral dimensions cluster together easily.
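
In general terms, NMF can be written as a constrained approximation problem. The symbols here are notation chosen for this page, not Amplitude's: $X$ is the preprocessed user-by-event matrix with $n$ users and $m$ event types, and $k$ is the number of behavioral dimensions. The squared-error objective shown is one common choice; the source doesn't state which objective Amplitude uses.

$$
X \approx W H, \qquad
\min_{W \ge 0,\; H \ge 0} \; \lVert X - W H \rVert_F^2, \qquad
X \in \mathbb{R}_{\ge 0}^{n \times m},\;
W \in \mathbb{R}_{\ge 0}^{n \times k},\;
H \in \mathbb{R}_{\ge 0}^{k \times m}
$$

Each row of $H$ describes one behavior as a non-negative mix of events, and each row of $W$ gives one user's coordinates in the $k$-dimensional behavior space.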

The number of dimensions in the behavior space matches the number of clusters specified. Because of this built-in connection, NMF clusters tend to be hierarchical.
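
The following is a minimal sketch of the ideas above, not Amplitude's implementation. It assumes an IDF-style event weight as a stand-in for frequency-based weighting, Lee-Seung multiplicative updates for the factorization, a deterministic farthest-row seeding so the result is reproducible, and assignment of each user to the behavior dimension with the largest coordinate. The event counts and all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def preprocess(counts):
    """Turn each user's counts into shares, then apply an assumed IDF-style weight."""
    n_users, n_events = len(counts), len(counts[0])
    # Number of users who performed each event at least once.
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(n_events)]
    weights = [math.log(1 + n_users / df) if df else 0.0 for df in doc_freq]
    # Row normalization: a heavy user and a light user with the same mix look alike.
    shares = [[c / (sum(row) or 1) for c in row] for row in counts]
    return [[s * w for s, w in zip(row, weights)] for row in shares]


def nmf(x, k, iterations=200, eps=1e-9):
    """Factor x (users x events) into non-negative w (users x k) and h (k x events)."""
    # Deterministic seeding (a convenience for this sketch): start from user 0,
    # then repeatedly add the user row farthest from the seeds chosen so far.
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(
            range(len(x)),
            key=lambda i: min(
                sum((a - b) ** 2 for a, b in zip(x[i], x[s])) for s in seeds
            ),
        ))
    h = [[v + 0.01 for v in x[s]] for s in seeds]
    w = [[0.5] * k for _ in x]
    for _ in range(iterations):
        # w <- w * (x h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(len(w))]
        # h <- h * (w^T x) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(h[0]))]
             for i in range(k)]
    return w, h


# Hypothetical weekly counts; columns are search, browse, post, comment.
counts = [
    [8, 2, 0, 0],
    [45, 10, 0, 0],
    [400, 90, 0, 0],  # heavy user with the same mix as the two users above
    [0, 0, 3, 7],
    [0, 0, 30, 65],
    [0, 0, 2, 5],
]
x = preprocess(counts)
w, h = nmf(x, k=2)  # k is both the number of dimensions and the number of clusters
personas = [row.index(max(row)) for row in w]
print(personas)
```

Here `personas` comes out as `[0, 0, 0, 1, 1, 1]`. The heavy user in the third row lands with the other users who share its mix of events, because row normalization removes the effect of raw volume before the factorization runs.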

To learn more about how NMF works, refer to the [NMF clustering paper](https://arxiv.org/pdf/1507.03194.pdf).

