Amplitude has processed more than 31 trillion user events for more than 40,000 digital products.
So, you could say we’ve seen our fair share of strategies for managing incoming event data.
Over years of working with customers large and small, it’s become clear that there are three primary strategies for managing data in product analytics, and that they align with a company’s maturity level:
- Strategy #1: Cleanup Workflow (Beginner Maturity in Data Management)
- Strategy #2: Approval Workflow (Intermediate Maturity in Data Management)
- Strategy #3: Event Planning Workflow (Advanced Maturity in Data Management)
Although companies generally move from one step to the next as they advance in maturity, these data management strategies are not mutually exclusive, and companies may use them with some overlap.
Generally, however, we’ve found that teams that are newer to product analytics rely mostly on retroactive correction of data that is already in the system, while more mature organizations lean toward up-front planning that ensures high data functionality and reduces the amount of correction required.
Data Management Strategy #1: Cleanup Workflow
Maturity Level: Beginner
In many cases, organizations that are new to product analytics are eager to get events, any events at all, into their systems. The organization doesn’t have much experience with deciding which events to send, and therefore may just allow employees to add whichever events they choose. Event data is seen as inherently valuable, and every time an engineer is willing to instrument an event, it is seen as a win. Teams at this level typically don’t have a dedicated analytics function or a formal data governor. It is possible that some product managers are carefully thinking about how to instrument their new features, but it is also possible that engineers are simply told to send in events, or in some cases they may even implement a system that automatically tracks clicks and injects relevant properties (similar to auto-tracking, which creates more problems than it purports to solve).
The main characteristic of this approach is that most of the data management occurs after the events are already live within a data analytics platform like Amplitude. Events often need to be renamed or combined, as they aren’t easily usable in the format they are ingested in. For example, “click header menu” might turn into “open settings,” or something that better describes the in-product action it represents. This makes it easier for someone unfamiliar with the data to understand which event to choose.
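As a rough illustration of this kind of after-the-fact cleanup, the sketch below applies a rename map to raw event names. The map, the `display_name` helper, and every event name other than “click header menu” and “open settings” are hypothetical; this is not how any particular analytics platform implements renaming.

```python
# Hypothetical mapping from raw ingested names to cleaned-up display names.
# Two raw names that map to the same display name are effectively merged.
RENAMES = {
    "click header menu": "open settings",
    "tap header menu": "open settings",
}

def display_name(raw_name: str) -> str:
    """Return the cleaned-up name, or the raw name if no rename is defined."""
    return RENAMES.get(raw_name, raw_name)

raw_events = ["click header menu", "tap header menu", "view pricing page"]
cleaned = [display_name(name) for name in raw_events]
print(cleaned)
```

Keeping the raw names intact and layering renames on top means a mistaken rename can be undone without re-instrumenting anything.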
In addition, this sort of system typically has a lot of events that aren’t especially useful. For example, an engineer might instrument every action they can think of, only to find that they have inserted duplicate events. In some cases a company may hit its monthly event quota, only to find that a few events that provide little value are costing a lot of money, since analytics platforms often charge by the amount of data ingested.
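One simple way to spot events that consume a disproportionate share of a quota is to count volume by event name. The following is a minimal sketch that assumes you can export a list of event names; the `volume_share` helper and the sample data are made up for illustration.

```python
from collections import Counter

def volume_share(event_names):
    """Return (name, count, share_of_total) tuples, highest volume first."""
    counts = Counter(event_names)
    total = sum(counts.values())
    return [(name, n, n / total) for name, n in counts.most_common()]

# Hypothetical sample: one noisy event dominates the volume.
sample = ["scroll tick"] * 6 + ["open settings"] * 3 + ["complete purchase"]
for name, n, share in volume_share(sample):
    print(f"{name}: {n} events ({share:.0%})")
```

A high-volume event is not necessarily a low-value one, so a report like this is only a starting point for deciding what to block or drop.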
Cleanup is also often triggered by a lack of clarity. Although an organization at the beginner level of maturity may not be sending many events, it can be difficult to determine the right event to instrument with the analytics platform. Over time, this necessitates renaming events or adding descriptions as the organization learns more about which events they want to track.
In most cases, the cleanup work is performed by one or more people who have a passion for analytics or data-driven culture, often as an after-hours function. They typically don’t have much time for this work, but the event volume is low enough to clean on an occasional basis.
Data Management Strategy #2: Approval Workflow
Maturity Level: Intermediate
At some point, the volume of inbound events becomes too great, and it is no longer possible to manage events only retroactively. In many cases, there is a motivating event; maybe there is a bug that creates thousands of unnecessary event types and takes hours to clean up, or possibly the company starts to realize the value of analytics while also seeing that their implementation is deeply flawed. The organization realizes that cleanup and correction, while still necessary, probably aren’t going to get them to the desired state. They are willing to invest more time into doing things “the right way,” although the resources are often still fairly limited. This is the point where an organization typically begins to transition into an “event approval” workflow.
At this point, someone decides to take on the mantle of data governor, or maybe is even drafted into the role by leadership. The data governor is tasked with keeping track of incoming events, and approving them on a case-by-case basis. At this stage, the organization often has some sort of data management style guide, which gives some ground rules for how to appropriately instrument events.
In some cases, the data governor approves events as-is, but they also have an opportunity to transform events or even to reject them outright. As mentioned earlier, data management strategies overlap. So while the organization may have advanced in maturity, they may still need to do a significant amount of cleanup/correction work (strategy #1). The big difference between this workflow and the cleanup/correction workflow is that the cleanup/correction is often done when the event first comes in, and not at some later point when it becomes obvious that there is a problem.
When we talk to data governors at organizations in this stage, they often tell us that they are hesitant to reject events outright, because it is unlikely that the engineers will take the time to re-instrument. In general, events are still seen as “precious,” and the organization does whatever it can to salvage an event. While it is likely that the team has set some ground rules, such as a style guide for events, adherence to these rules may be inconsistent.
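The approve, transform, or reject decision can be sketched as a small review function that tries to salvage a name before rejecting it. The style rule used here (lowercase words separated by single spaces) and the `review` helper are assumptions for illustration; a real style guide would have its own rules.

```python
import re

# Hypothetical style rule: lowercase words or numbers separated by single spaces.
NAME_PATTERN = re.compile(r"[a-z0-9]+( [a-z0-9]+)*")

def review(name: str):
    """Return (decision, final_name) for one incoming event name."""
    if NAME_PATTERN.fullmatch(name):
        return ("approve", name)
    # Try to salvage the event: split camelCase, normalize separators, lowercase.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    fixed = re.sub(r"[\s_\-]+", " ", spaced).strip().lower()
    if NAME_PATTERN.fullmatch(fixed):
        return ("transform", fixed)
    # Nothing usable is left after normalization.
    return ("reject", None)

for incoming in ["open settings", "openSettings", "Open_Settings", "!!!"]:
    print(incoming, "->", review(incoming))
```

Because the transform step runs before rejection, most events are salvaged rather than sent back to engineering, which mirrors the reluctance to reject that data governors describe.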
The hallmark of this intermediate level is that there typically is a single data governor, versus no data governor in the earlier stage or multiple data governors in the later stages. This data governor is often a part-timer whose primary role typically benefits from effective data management. For example, they could be the head of data science or a product manager who has worked with well-run analytics systems in the past. No matter the title, though, their time is limited. Each week the data governor does a little data maintenance work based on inbound requests, but does not have time to plan and execute cross-team instrumentation.
Data Management Strategy #3: Event Planning Workflow
Maturity Level: Advanced
Effective data governance requires a fairly intimate knowledge of the product, and as a company’s product offerings broaden, it is likely that a single data governor cannot be familiar with every aspect. At this point, the organization has advanced to the final maturity level for data management. The company invests in bringing a team together—often in the data science and/or analytics function—that is dedicated to thinking about product analytics and related problems.
This is the point when a company is ready to transition to an event planning workflow. Approval has gotten the team far enough, but it is obvious that for product analytics to really yield its maximum value, it must become a discipline. As such, the analytics team needs to carefully consider every event and property that is sent in. Each member of the analytics team is responsible for working with one or more product teams, and collaborates carefully with that team throughout the event lifecycle. When product managers and designers spec out product features, the analytics team works with them to determine exactly what to measure. They also carefully document the process. Then, analysts work closely with the engineering team during implementation, even going so far as to read pull requests to make sure that instrumentation is performed correctly. They then plan the events in a system such as Amplitude Govern, which ensures that only correct events are ingested. If an event does not match these specifications, it is typically rejected, and engineers will need to make changes. Far from seeing events as precious, these organizations view events as a usable tool that helps them learn more about user behavior.
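To make the idea of planned events concrete, here is a minimal sketch of validating incoming events against a tracking plan. The plan structure, the `validate` helper, and all event and property names are hypothetical; this is not the API or behavior of Amplitude Govern or any other product.

```python
# Hypothetical tracking plan: event name -> required properties and their types.
TRACKING_PLAN = {
    "open settings": {"source": str},
    "complete purchase": {"order_id": str, "total_cents": int},
}

def validate(event: dict) -> list:
    """Return a list of problems; an empty list means the event matches the plan."""
    name = event.get("name")
    if name not in TRACKING_PLAN:
        return [f"unplanned event: {name!r}"]
    problems = []
    props = event.get("properties", {})
    for prop, expected in TRACKING_PLAN[name].items():
        if prop not in props:
            problems.append(f"missing property: {prop}")
        elif not isinstance(props[prop], expected):
            problems.append(f"wrong type for {prop}: expected {expected.__name__}")
    return problems

good = {"name": "open settings", "properties": {"source": "header menu"}}
bad = {"name": "complete purchase", "properties": {"order_id": "A1", "total_cents": "1999"}}
print(validate(good))
print(validate(bad))
```

Returning the list of problems, rather than a bare pass or fail, gives engineers something actionable when an event is rejected.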
These more mature organizations still do some level of cleanup/correction, although as in the case of the approval stage, this fulfills a somewhat different purpose. Since these organizations have a lot of events, they typically need to cull them. Events that were useful in the past may no longer be needed, or in the interest of reducing costs, the team may need to block some events that turned out to be less useful in the long term. They may still merge or rename events, but this is typically done as taxonomy standards evolve rather than in response to specific mistakes. Likewise, the analytics team may also do a limited amount of event approval, although it is likely that all events will be planned.
Overall, these three data management workflows are appropriate for teams at different levels of maturity. One interesting thing to think about is that the data management dysfunctions (anarchy and dictatorship) don’t purely come from using a particular strategy, but from using that strategy at the wrong time. Anarchy typically results from using a cleanup/correction strategy (or even an approval strategy) too late in the process, while dictatorship can come from using planning too early (typically before an organization has seen the value from product analytics).
In the prior posts of this three-part series, we discussed the four truths of data management and three keys to data functionality. Now, with this maturity model, you have a fuller understanding of how to plan events and make data more usable, accurate, and comprehensive for your entire organization.
The next step is to implement these strategies, and start getting more value from your data.