A Tale of Two Data Teams

A closer look at two ways that companies handle data governance—and why the optimal solution lies somewhere in the middle.

August 8, 2019
Dana Levine
Former Product Manager, Amplitude

As a product manager, I end up talking to a lot of our customers and learning about how they go about the process of product development.

My goal is to make it easier for companies to get up and running on Amplitude, so I spend a lot of time thinking about how our customers get data into our product and how they keep data quality high enough for people to run analyses. Along the way, I have heard a lot about different styles of data governance, which vary greatly from company to company.

Here is a story of how two different teams do data governance in vastly different ways, and how they can learn to more effectively govern data.

Company A: Free-for-all Analytics

Company A, a large enterprise with a complex set of products, wants as many people as possible to get product data into Amplitude. As their Director of Product (who is also the one who purchased Amplitude) told me, “It’s pretty much the wild west here. We want to throw open the gates and get adoption.” So they allow people to log any events and properties that they want with very little oversight, and a lot of teams have taken advantage of that.

There are a number of upsides to this approach. First of all, each team is able to log the events and properties that they care about in Amplitude, which means that the data in there will be relevant to them. Because they can set whatever user properties they want, it is also easy to run lots of A/B tests with different conditions.

However, this sort of approach also has some significant downsides. They have an almost unmanageable number of events, which makes it difficult for anyone to find anything. There is a wide variety of conventions in event names, and some events seem to follow no conventions at all (there is even one event called “test”).
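As a rough illustration of how a team might catch names like “test” before they pile up, here is a minimal sketch of a naming check. The snake_case convention, the pattern, and the sample event names are all assumptions made up for this example; they are not Company A’s actual conventions or anything built into Amplitude.

```python
import re

# Hypothetical convention: lowercase snake_case with at least two words,
# for example "song_played". Any real team would pick its own rule.
EVENT_NAME_PATTERN = re.compile(r"^[a-z]+(_[a-z]+)+$")


def find_nonconforming(event_names):
    """Return the event names that do not match the assumed convention."""
    return [name for name in event_names if not EVENT_NAME_PATTERN.fullmatch(name)]


# Made-up sample names, in the order they would be checked.
events = ["song_played", "Checkout Completed", "test", "playlist_created"]
print(find_nonconforming(events))
```

A check like this could run in code review or CI, so that the feedback reaches a developer before the event ever reaches the analytics project.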

Company A also has a lot of trouble staying below their quotas. They are currently over the limit on the number of event properties that they can send in. A number of times, users have sent in events that threatened to take them over their monthly quota, and they have had to scramble to figure out what caused the problem. The Director of Product, who was the original champion, also serves as the data governor, and he frequently ends up having to clean things up, which is stressful for him.

Company A is in a state of what we call “Information Anarchy.” Their project is kind of a free-for-all, and although there is a lot of value in there, there are a lot of challenges to actually realizing that value.

Company B: Highly Controlled Analytics

On the other hand, we have Company B, which is also a large enterprise. Instead of being a free-for-all, they have very tightly controlled everything that goes into their analytics system. Every time that someone wants to add any new event or property, it needs to be approved at the weekly meeting of the company’s data governance board before it can be implemented. As a result, they have a clean and small taxonomy: only 70 different event types (versus more than 1,500 for Company A).

When looking through Company B’s data, it is pretty easy to find the event you are looking for. There aren’t that many of them, and they are well named with clear conventions. There is also an extremely limited number of event properties, so figuring out what to segment by is fairly straightforward. For newer users, it is a lot easier to get oriented with Company B’s data.

However, there are a number of drawbacks. A lot of effort goes into the curation process. At a small scale, there aren’t that many events to review, but as more teams use Amplitude, the work grows. Additionally, the strict oversight discourages teams from adding new events and properties, and makes it a lot harder to run a quick test. As a result, there are a lot of things that Company B wants to measure that aren’t instrumented, and the tool isn’t as useful as it could be for them.

Company B is what we describe as a “Data Dictatorship.” They are ruled with an iron fist, and while that encourages an ordered taxonomy, it also discourages creativity and risk-taking, and potentially limits some of the benefits of product analytics.

Data Democracy Lies Somewhere in Between Data Anarchy and Dictatorship

Company A and Company B represent the extremes on either end of the spectrum, so what is in the middle? In the middle, we have a kind of data democracy. Typically when we talk about data democracy with Amplitude, we refer to everyone being able to access product data and use it in decision making. But this second type of data democracy involves enabling people to put data into the product analytics system. There are rules, but everyone is empowered to put in the data that they need to be able to understand their users and understand the results of their work.

So how does this work in reality? After talking to a lot of our customers, we have found a few who do a great job of data governance, and manage to empower employees to instrument and analyze while also providing ample guidance and support. Let’s look at a fictionalized version that we call Company C. Company C has a well-organized data science team that acts as the data governors for their company.

Every time that a product manager is thinking about building a new feature (yes, that’s before any code has been written), they talk to the data team, who helps them to come up with appropriate metrics to measure the success of that feature. From those metrics, the data team helps the PM and developers to figure out what needs to be measured, and can help to write up an instrumentation spec, which can then be built by the developers.
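To make the idea of an instrumentation spec concrete, here is a minimal sketch of one written as data, plus a check that a logged event matches it. The `EventSpec` class, the `validate_event` helper, and the `playlist_shared` event are hypothetical names invented for this example; the article does not describe Company C’s actual spec format.

```python
from dataclasses import dataclass, field


@dataclass
class EventSpec:
    """One entry in a hypothetical instrumentation spec."""
    name: str
    description: str
    owner: str
    # Maps each planned property name to the Python type it should carry.
    properties: dict = field(default_factory=dict)


def validate_event(spec, event_name, event_properties):
    """Return a list of human-readable problems; an empty list means the event matches."""
    problems = []
    if event_name != spec.name:
        problems.append(f"unexpected event name: {event_name!r}")
    # Every planned property must be present with the planned type.
    for prop, expected_type in spec.properties.items():
        if prop not in event_properties:
            problems.append(f"missing property: {prop}")
        elif not isinstance(event_properties[prop], expected_type):
            problems.append(f"wrong type for {prop}: expected {expected_type.__name__}")
    # Anything not in the spec is flagged rather than silently accepted.
    for prop in event_properties:
        if prop not in spec.properties:
            problems.append(f"unplanned property: {prop}")
    return problems


spec = EventSpec(
    name="playlist_shared",
    description="User shares a playlist from the playlist screen",
    owner="growth-team",
    properties={"playlist_id": str, "share_channel": str, "track_count": int},
)

logged = {"playlist_id": "p1", "share_channel": "email", "track_count": "12", "debug": True}
for problem in validate_event(spec, "playlist_shared", logged):
    print(problem)
```

Writing the spec down as data means the same document can serve the PM (what will be measured), the developers (what to build), and the data team (what to check in review).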

The developers then implement the instrumentation, staying in close contact with the data team, who may even read the code to make sure that everything is instrumented according to plan. The data team can then help the product teams to analyze the data they have put into their product analytics system. As time goes on, the data team will look through existing events and properties to figure out which ones aren’t being used and can be hidden or deleted.
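The periodic cleanup described above could be sketched as a simple staleness audit. This assumes the data team can export, for each event, the date it was last used in a chart or query; the `find_stale_events` function, the 90-day threshold, and the sample data are illustrative assumptions, not an Amplitude feature or API.

```python
from datetime import date, timedelta


def find_stale_events(last_queried, today, max_idle_days=90):
    """Return events never queried, or not queried within max_idle_days, sorted by name."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(
        name
        for name, last in last_queried.items()
        if last is None or last < cutoff
    )


# Made-up usage export: event name -> date it was last queried (None if never).
usage = {
    "song_played": date(2019, 8, 1),
    "test": None,
    "legacy_banner_clicked": date(2019, 1, 15),
}

print(find_stale_events(usage, today=date(2019, 8, 8)))
```

The output is a list of candidates for review, not an automatic delete list; someone who knows the product should still confirm that an event is safe to hide or remove.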

The Advantage of Data Democracy

So how is Company C different from a Data Dictatorship or Information Anarchy? Company C encourages employees to log events into their analytics system, which differs from the data dictatorship, where all decisions are made by a central committee that decides exactly what will be logged. Unlike in Information Anarchy, those employees get guidance: the data governors at Company C are typically embedded in a particular product team, or at least are in very close communication with the product team via Slack and code review comments. Finally, the goal of the data governors at Company C isn’t to dictate policy, but rather to empower product teams to better learn from their feature launches and experiments.

When I look at companies facing either Data Dictatorship or Information Anarchy, I see that with some time and experience, they could eventually move towards the middle of the spectrum and become a lot more like Company C. But that requires them to undergo a fairly sizeable mental shift, and to truly embrace the value of a data governor. A lot of companies think that data governance adds a lot of unnecessary weight to the instrumentation process. If done properly, the added overhead should be fairly minimal, and will be outweighed by the benefits of having much better product analytics data.

About the Author
Dana Levine
Former Product Manager, Amplitude
Dana is a former Product Manager at Amplitude. He enjoys building products that make people’s lives easier, and using data to answer interesting questions. He has started two companies, worked at a wide variety of startups and larger corporations, and studied Neuroscience and Bioengineering at MIT.