Who Should Own Data Governance?
Stefania Olafsdottir is CEO at Avo, a collaborative team platform for navigating event tracking and guaranteeing accurate analytics. She kindly joined us to tackle the question of who should own data governance within a business and to outline what ownership can look like across different teams.
“Tools can support cultural shifts, but it's really important that we don't think about tools as a magic solution to cultural shifts.”
A data-led career
Stefania has worked in Data Science for over a decade, starting at a genetics company as a data engineer. There she managed distributed computations and correlation analysis of physical traits with DNA mutations and discoveries.
Her PhD trajectory took her to QuizzApp, which reached one million users in its first five days, record-breaking growth for the App Store at the time. While there, she had the opportunity to build the data science division, team, infrastructure, and culture. “I had to learn a lot about data reliability and quality, including who should own what,” says Stefania.
Stefania needed to build tools and processes to facilitate better accessibility of self-serve analytics and even self-serve governance.
A few years later, she used this experience to start Avo, a company that fixes data quality for product analytics. Avo has helped well-known companies like Fender, IKEA, and Condé Nast build better data cultures.
The stages of data governance
In my experience, organizations typically transition through four stages of data quality and data governance:
- Wing it
- Wild West
- Centralized Governance
- Self-serve Governance
We often start by winging it in the early stage, where there is no data because the product is just starting and there’s no real tracking element.
We then move into the ‘wild west’ stage, where it’s every team (or platform) for themselves. At this stage, we have a lot of data and many variants, but because the data isn’t owned by any one team, it’s generally bad data.
The next stage we go into is centralized governance. We typically have a person or team that owns data governance as the advocate for data quality. Usually, at this stage, the advocate will uncover the lack of data standards and take action. But soon the centralized team becomes a bottleneck because it’s a lot of work for them to make sure everything is high quality.
This leads us to the next step, which is self-serve governance. We begin to see global standards become owned by domain experts.
What makes data governance difficult?
Data governance is a cross-functional problem. We have three core role groups contributing to data quality and relevance:
- Product managers
- Developers and product engineers
- Data professionals
Product managers and engineers understand what goals to set for a product. They know the relevant metrics to measure success and the steps to take to achieve those goals.
Data professionals have insights into what data is available, what tools to use for metrics analysis, and what structure the data should take. They also know how new data can fit with existing data.
You have all these different stakeholders bringing these different things to the table, and they all need to be included. It should be a cultural aspect that all teams are involved in, but they typically have different problems they want to solve at different stages.
For the best outcomes, solve problems cross-functionally at the source, when events and data are created. This improves downstream data readability, quality, and reliability.
Avoiding the wild west of data governance
It’s possible to build quality data from the start, provided you have enough experience to know in advance what is and isn’t good.
If someone handling data has seen where it went wrong before, they can apply that knowledge to do things differently from the start.
You need to commit to a naming convention and create global standards, including for the concepts within your product. Processes are also important. They serve as a single source of truth for how teams should document schemas.
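As a sketch of what committing to a naming convention can look like in practice, here is a minimal check for a hypothetical “Object Action” Title Case convention (e.g. “Song Played”). The convention and function names are illustrative, not a specific Avo feature:

```python
import re

# Hypothetical convention: event names are "Object Action" in Title Case,
# e.g. "Song Played" or "Checkout Started".
EVENT_NAME_PATTERN = re.compile(r"^[A-Z][a-z]+( [A-Z][a-z]+)+$")

def check_event_name(name: str) -> bool:
    """Return True if the event name follows the Title Case convention."""
    return bool(EVENT_NAME_PATTERN.match(name))

# Names that follow the convention pass; snake_case or camelCase do not.
assert check_event_name("Song Played")
assert not check_event_name("song_played")
assert not check_event_name("playedSong")
```

A check like this can run in code review or CI, turning the documented convention into an enforced one rather than a wiki page people forget about.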
If you’re starting with a new product, your tracking plan will be different now from what it will be in three years. There’s so much product evolution needed before you can firmly know how you will measure your product. Use the early stages to determine the standards, and add some best practices, but accept that it’s something you’ll need to iterate on along the way.
“Collaborating cross-functionally has a flywheel effect—better breeds better.”
How to navigate cross-functional collaboration
If the product, engineering, and data teams care about data, everyone is invested in shipping quality data. Teams can then make better decisions and improve user experience. It works as a cycle that reinforces itself and becomes contagious.
I recommend starting with a small team. Find those data-curious developers and product managers to work with ahead of your next minor feature release and do a meeting on the purpose of that feature and what metrics you’re looking to drive.
That meeting will determine what your tracking needs are to assess the success of the product feature. For this meeting to have as much value as possible, it should occur before product development starts so developers don’t need to refactor code to bake in tracking.
Making tracking a first-class citizen in your product development process and involving everyone creates a more positive culture around data. It alleviates the feeling that data governance is a chore and rallies everyone around the goal of shipping better products.
Moving towards self-serve governance
To move towards a self-serve governance model, you need to combine two approaches: enforcing the behavior you need from teams and inspiring them to adopt it.
If you remember the flywheel approach above, I suggest using teams almost as case studies for how easy it can be to manage analytics if you do it cross-functionally and proactively instead of reactively.
You also need to consider how to enforce standards computationally. Perhaps you introduce a JSON schema and train people to contribute suggestions to your company's single source of truth. Consider what global governance can happen at a computational level and what standards to propose so you can inspire the culture you want.
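To illustrate what enforcing standards computationally can mean, here is a minimal sketch of a JSON-schema-style check for tracked events. The schema shape and property names are assumptions for illustration; in practice you would likely use a full JSON Schema validator library rather than hand-rolling one:

```python
# Hypothetical spec for a tracked event: which properties are required
# and what type each one should have.
EVENT_SCHEMA = {
    "required": ["event_name", "user_id", "timestamp"],
    "types": {"event_name": str, "user_id": str, "timestamp": float},
}

def validate_event(event: dict, schema: dict) -> list:
    """Return a list of violations; an empty list means the event passes."""
    errors = [f"missing property: {key}"
              for key in schema["required"] if key not in event]
    for key, expected in schema["types"].items():
        if key in event and not isinstance(event[key], expected):
            errors.append(f"{key} should be {expected.__name__}")
    return errors

# A complete event passes; an incomplete one reports what is missing.
ok = validate_event(
    {"event_name": "Song Played", "user_id": "u1", "timestamp": 1.0},
    EVENT_SCHEMA,
)
assert ok == []
```

Running a check like this at event-ingestion time (or in CI against your tracking plan) is one way the “global standards” become machine-enforced instead of relying on every team remembering them.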
Eventually, you will have enough case study examples of success, allowing you to make a case to senior leadership or data leads for why these examples should be implemented company-wide.
Audience Questions
Recommended event structures
When choosing between a more generic event structure and a more specific one, it’s tricky: the structure shouldn’t be too generic or too specific.
It ultimately boils down to how many objects can be created and what the most helpful way to segment them is. If it’s helpful to segment the object created by its type, having it as a single event might be valuable. Always consider how it will be used to build charts.
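As a concrete illustration of the trade-off, here are two hypothetical ways to track the same user action (the event and property names are invented for this example):

```python
# Specific: one event per object type. Each type is easy to chart on
# its own, but the number of events grows with every new object type.
specific_event = {"event_name": "Playlist Created"}

# Generic: a single event with the object type as a property. Easy to
# compare all types in one chart by segmenting on object_type, but
# individual actions are less visible in a flat event list.
generic_event = {"event_name": "Object Created", "object_type": "playlist"}
```

If segmenting by type is how you expect to build charts, the generic shape tends to pay off; if each object has its own distinct funnel, the specific shape can be clearer.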
Remember, your tracking and product will look different in three years from how it does today, so consider how you can design something future-proof without overthinking it upfront.
Creating branches in projects
We introduced branches to our analytics release workflow in 2018. That was a “wow” moment for people.
You should tie your analytics branches and technical management to development branches. If you’re on a branch and using any validation in code generation or observability, you get validation for what’s happening in development on that branch. When you merge to main, you get validation for what’s happening in production.
Join the community!
Connect with Amplitude team members and other practitioners in our community. The Cohort Community exists to help people become better analysts. We focus on actionable programs, sharing best practices, and connecting our members with peers and mentors who share in the same struggles.