This article covers some frequently asked questions about the Personas chart.
Previously, Amplitude relied on the K-Means algorithm to generate clusters for the Personas chart. However, that approach had two important limitations. Because of them, we began exploring how clustering could better help customers identify meaningful patterns of user behavior in their data. Through that work, we ultimately decided to replace K-Means with Non-Negative Matrix Factorization (NMF).
How does the Personas chart calculate clusters?
Amplitude uses NMF to calculate clusters. Given a data set, a clustering algorithm looks for a way to partition the set that maximizes the similarity within each partition while minimizing the similarity between different partitions. To escape the curse of dimensionality in the original “event space,” NMF explicitly performs a mathematical dimension reduction to arrive at a more comprehensible “behavior space.” The method also reduces the effect of outliers by weighting events based on their frequencies and by normalizing each user’s event counts. Once projected into the simpler behavior space, users who are similar along certain behavioral dimensions readily group together. Note that the number of dimensions in the behavior space is exactly the number of clusters you specify; because of this built-in connection, NMF clusters tend to be very hierarchical. If you are interested in learning more about how NMF works, please see this article.
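To make the idea concrete, here is a minimal Python sketch of NMF-based clustering on a tiny table of per-user event counts. It is an illustration only, not Amplitude's implementation: the sample data, the function names, the log-based event weighting, and the choice of multiplicative updates to fit the factorization are all assumptions made for this example. Each user is assigned to the behavior dimension on which they score highest, so the number of dimensions equals the number of clusters.

```python
import math
import random

def normalize_and_weight(counts):
    """Normalize each user's event counts, then weight events by how common they are.

    The log weighting is an assumed, TF-IDF-style stand-in; the exact
    weighting used in the product is not specified here.
    """
    n_users, n_events = len(counts), len(counts[0])
    # Number of users who performed each event at least once.
    users_per_event = [sum(1 for row in counts if row[j] > 0) for j in range(n_events)]
    weights = [math.log(1.0 + n_users / max(df, 1)) for df in users_per_event]
    result = []
    for row in counts:
        total = sum(row) or 1.0  # avoid dividing by zero for inactive users
        result.append([weights[j] * row[j] / total for j in range(n_events)])
    return result

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def nmf(v, k, iterations=500, seed=0, eps=1e-9):
    """Approximate v (users x events) as w (users x k) times h (k x events).

    Uses multiplicative updates, which keep every entry non-negative.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h

def assign_clusters(w):
    """Assign each user to the behavior dimension with the largest coefficient."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]

# Hypothetical data: rows are users, columns are event counts.
# The first three users favor events 0 and 1; the last three favor events 2 and 3.
counts = [
    [9, 3, 0, 0],
    [30, 12, 0, 0],
    [4, 1, 0, 0],
    [0, 0, 5, 6],
    [0, 0, 40, 35],
    [0, 0, 2, 3],
]

v = normalize_and_weight(counts)
w, h = nmf(v, k=2)
labels = assign_clusters(w)
print(labels)
```

Because each user's counts are normalized first, the very active user in each group (30 and 12 events, or 40 and 35) lands with the lighter users who show the same mix of behavior rather than being treated as an outlier. The rows of `h` describe each behavior dimension in terms of the original events, and the rows of `w` describe each user in terms of those dimensions.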
What is Non-Negative Matrix Factorization (NMF)?
October 29th, 2024