What many leaders don’t realize when making the shift to a data-driven model is that it’s about so much more than gathering customer data. A strong data governance program is at the heart of any mature data operation that returns useful insights.
As VP of Enterprise Data and Insights at Amplitude, I think a lot about getting the most out of our data. My job is to provide the right information at the right time so leaders can make strategic business decisions. It’s my responsibility to surface relevant data and ensure it’s complete and trustworthy. That means I spend a lot of time thinking about the best ways to govern our data.
Even if your data program isn’t as mature as ours, these issues remain relevant. Data governance is foundational to any data-driven initiative. It’s how companies get ahead and stay ahead—whether by building trust with customers or avoiding the cost of violating regulations.
Here’s how I think about data governance and some points to consider when evaluating your governance framework.
- Data governance increases the value of a company’s data while helping manage risk.
- Amplitude’s data governance framework has four pillars: data security, compliance, data quality, and transparency.
- A framework built on these (or similar) pillars, plus market awareness, helps companies stay proactive in their data management practices.
- Amplitude comes with tools to help customers boost their own governance practices.
Understanding data governance
Data governance is central to the push to become data-driven because complete, validated data is a company asset. If you can’t trust your data, if your practices are opening the company up to risk, or if it’s too hard to make sense of the information, you won’t be able to use your data strategically.
Your governance efforts help you make sense of the information you have by standardizing how your organization collects and treats data. They outline risk-management policies and define individual employees’ responsibilities for data. Governance also includes monitoring data quality and addressing problems within your data lake.
Governance builds ROI by enabling individuals to make data-driven decisions. And then, of course, there’s the flip side: Data compliance issues can be costly in terms of financial penalties and customer opinion. Challenges like implementing the GDPR’s right to be forgotten or dealing with the outcome of a phishing attack are easier to navigate if you have rules around data use, storage, and access.
Amplitude’s approach to data governance
At Amplitude, we consider four pillars foundational to effective data governance: data security, compliance, data quality, and transparency. These components help us achieve efficiency and effective risk management in our data-driven efforts.
Pillar 1: Data security
Data security is a big part of compliance efforts, so it cannot be an afterthought. Of course, you will want basic security practices like encryption in place. But on top of that, Amplitude follows a philosophy of “least-privilege access.” No one gets access to data unless they can prove they need it. That protection helps if you have a phishing problem, for instance, or if a staff member commits an ethical lapse.
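The deny-by-default idea behind least-privilege access can be sketched in a few lines of Python. This is a hypothetical illustration, not Amplitude’s implementation: the `AccessPolicy` class, the dataset names, and the rule that every grant must carry a written justification are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical deny-by-default policy: access exists only where granted."""

    # (user, dataset) -> documented reason the user needs the data
    grants: dict[tuple[str, str], str] = field(default_factory=dict)

    def grant(self, user: str, dataset: str, justification: str) -> None:
        # Refuse any grant that does not state a need for the data.
        if not justification.strip():
            raise ValueError("a business justification is required")
        self.grants[(user, dataset)] = justification

    def revoke(self, user: str, dataset: str) -> None:
        # Removing a grant immediately restores the default (no access).
        self.grants.pop((user, dataset), None)

    def can_read(self, user: str, dataset: str) -> bool:
        # Anything not explicitly granted is denied.
        return (user, dataset) in self.grants


policy = AccessPolicy()
policy.grant("analyst_a", "orders_aggregated", "weekly revenue dashboard")
print(policy.can_read("analyst_a", "orders_aggregated"))  # True
print(policy.can_read("analyst_a", "customers_pii"))      # False
```

Because the default answer is “no,” a compromised account (say, after a phishing attack) can reach only the datasets it was explicitly granted.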
Determining what data is available at which access levels can take a lot of work, especially when you have a lot of information. We rely heavily on automation to help us. We apply machine learning to data stored in Snowflake to determine whether it contains personally identifiable information (PII) or sensitive personal information (SPI), because those types of data are more restricted.
Sometimes the automation produces false positives, so we need a human in the loop as well. But automation handles most of the work, and once everything is correctly classified, assigning access is easy.
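The classify-then-review workflow might look roughly like this. It is a sketch under loud assumptions: Amplitude describes using machine learning, whereas this example swaps in simple regular expressions as a stand-in classifier, and the thresholds, labels, and role names are all hypothetical. Confident results are labeled automatically, ambiguous ones go to a person, and the final label maps directly to who may read the column.

```python
import re

# Hypothetical rules: regexes stand in for a trained ML classifier here.
PATTERNS = {
    "SPI": [re.compile(r"\d{3}-\d{2}-\d{4}")],  # US SSN format
    "PII": [
        re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),  # email address
        re.compile(r"(?:\d{1,3}\.){3}\d{1,3}"),  # IPv4 address
    ],
}
AUTO_THRESHOLD = 0.9    # share of sampled values needed to label automatically
REVIEW_THRESHOLD = 0.3  # below AUTO but at least this: a person decides

# Hypothetical mapping from sensitivity label to roles allowed to read it.
ALLOWED_ROLES = {
    "SPI": {"privacy_officer"},
    "PII": {"privacy_officer", "support"},
    "general": {"privacy_officer", "support", "analyst"},
}


def score_column(samples: list[str]) -> tuple[str, float]:
    """Return the label whose pattern matches the largest share of values
    (ties favor SPI), plus that share; "general" if nothing matches."""
    values = [s.strip() for s in samples if s and s.strip()]
    best_label, best_rate = "general", 0.0
    if not values:
        return best_label, best_rate
    for label in ("SPI", "PII"):  # check the more restricted class first
        for pattern in PATTERNS[label]:
            rate = sum(1 for v in values if pattern.fullmatch(v)) / len(values)
            if rate > best_rate:
                best_label, best_rate = label, rate
    return best_label, best_rate


def triage(columns: dict[str, list[str]]) -> tuple[dict[str, str], list[str]]:
    """Auto-label confident results; queue ambiguous columns for human review."""
    labels: dict[str, str] = {}
    review_queue: list[str] = []
    for name, samples in columns.items():
        label, rate = score_column(samples)
        if rate >= AUTO_THRESHOLD:
            labels[name] = label
        elif rate >= REVIEW_THRESHOLD:
            review_queue.append(name)  # possible false positive: a person confirms
        else:
            labels[name] = "general"
    return labels, review_queue


labels, review_queue = triage({
    "contact": ["a@example.com", "b@example.org", "c@example.net"],
    "ssn": ["123-45-6789", "987-65-4321"],
    "notes": ["call back", "x@y.com", "n/a"],
    "country": ["US", "DE", "FR"],
})
print(labels)        # {'contact': 'PII', 'ssn': 'SPI', 'country': 'general'}
print(review_queue)  # ['notes']
print(sorted(ALLOWED_ROLES[labels["contact"]]))  # ['privacy_officer', 'support']
```

The two thresholds are the design choice that matters: they decide how much work lands on the human reviewer versus how many false positives slip through unreviewed.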
Pillar 2: Compliance
Now, let’s talk about compliance. Our starting point is knowing where all our data exists. At Amplitude, we like to talk about the “life journey” of our data: Where is it coming from, and where is it going as it moves through our systems?
Being able to follow and control data is crucial for regulatory compliance: if you don’t know where your data is, you cannot keep it secure or delete it when required. Data is often duplicated in secondary or tertiary systems, where it has been copied for analytics or similar purposes.
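Tracking where every copy lives is essentially a graph problem. The sketch below assumes a hypothetical lineage map (all system names are invented) in which each system lists the systems it copies data into; a breadth-first walk then returns every place a deletion request, such as a GDPR erasure request, would need to reach.

```python
from collections import deque

# Hypothetical lineage map: system -> systems it copies data into.
LINEAGE = {
    "app_db": ["event_stream", "crm"],
    "event_stream": ["warehouse"],
    "warehouse": ["analytics_sandbox", "ml_features"],
    "crm": [],
    "analytics_sandbox": [],
    "ml_features": [],
}


def downstream_systems(source: str, lineage: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk returning `source` plus every system reachable from it."""
    seen = {source}
    order = [source]
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # guard against cycles and diamond shapes
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


targets = downstream_systems("app_db", LINEAGE)
print(targets)
# ['app_db', 'event_stream', 'crm', 'warehouse', 'analytics_sandbox', 'ml_features']
```

The hard part, which the sketch takes for granted, is keeping the lineage map accurate as new copies are made.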
We also want to anonymize data when possible, so we don’t look at individuals when we do our analytics. We’re looking at aggregated data that we couldn’t unwind into anything identifying.
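One simple way to report on aggregates without exposing individuals is to return only counts and suppress groups too small to hide anyone. This is a hypothetical sketch: the minimum group size of 5 and the event shape are assumptions, and small-group suppression alone does not guarantee data can’t be re-identified (techniques such as k-anonymity or differential privacy address that more rigorously).

```python
from collections import defaultdict

MIN_GROUP_SIZE = 5  # hypothetical threshold; smaller groups are suppressed


def users_per_group(events: list[dict], group_key: str,
                    min_group_size: int = MIN_GROUP_SIZE) -> dict[str, int]:
    """Count distinct users per group, returning counts only (never IDs)."""
    users_by_group: dict[str, set[str]] = defaultdict(set)
    for event in events:
        users_by_group[event[group_key]].add(event["user_id"])
    # Drop groups so small that a count could point to specific people.
    return {
        group: len(users)
        for group, users in users_by_group.items()
        if len(users) >= min_group_size
    }


events = [{"user_id": f"u{i}", "country": "US"} for i in range(6)]
events.append({"user_id": "u0", "country": "US"})  # repeat visit, same user
events += [{"user_id": "u100", "country": "IS"}, {"user_id": "u101", "country": "IS"}]
print(users_per_group(events, "country"))  # {'US': 6}
```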
This is another proactive step we take, and it’s another one that takes a lot of time and energy. But is it worth it? When you can get fined a percentage of your revenue for violations, yes, it absolutely is.
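For a sense of scale, the ceiling for the most serious GDPR infringements (Article 83(5)) is €20 million or 4% of a company’s total worldwide annual turnover for the preceding financial year, whichever is higher. The function below does only that arithmetic; the turnover figures are made up, and it computes the legal maximum, since actual fines are set case by case by regulators.

```python
def gdpr_fine_ceiling_eur(worldwide_annual_turnover_eur: float) -> float:
    """Ceiling for the most serious GDPR infringements (Article 83(5)):
    EUR 20 million or 4% of total worldwide annual turnover for the
    preceding financial year, whichever is higher."""
    return float(max(20_000_000, worldwide_annual_turnover_eur * 4 / 100))


print(gdpr_fine_ceiling_eur(100_000_000))    # 20000000.0 (fixed amount is higher)
print(gdpr_fine_ceiling_eur(2_000_000_000))  # 80000000.0 (4% of turnover is higher)
```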
With these efforts, Amplitude complies with newer regulations like GDPR and CCPA, plus older ones like HIPAA. We meet SOC 2 Type 2 standards and are compliant with Privacy Shield. We’re ISO 27001, ISO 27017, and ISO 27018 certified.
Pillar 3: Data quality
Once you have the fundamentals of compliance in place, it’s time to think about the usability of your data. That’s where data quality comes in. It’s a day-to-day battle, and any data engineer will understand what I mean: humans can only do so much.
At Amplitude, a couple of processes focus on our critical data elements. The first is anomaly detection. We use and other tools to help us spot things like: Did we process the same number of records we started with? Are our checksums validated?
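The two questions above (matching record counts and validating checksums) can be expressed as a small reconciliation check. This is a hypothetical sketch, not the tooling Amplitude uses: it assumes records are JSON-serializable dicts and that the step being checked should pass every record through unchanged, and it computes an order-independent SHA-256 checksum so that reordering alone doesn’t trigger an alert.

```python
import hashlib
import json


def batch_checksum(records: list[dict]) -> str:
    """Order-independent SHA-256 over each record's canonical JSON form."""
    digests = sorted(
        hashlib.sha256(json.dumps(r, sort_keys=True).encode("utf-8")).hexdigest()
        for r in records
    )
    return hashlib.sha256("".join(digests).encode("utf-8")).hexdigest()


def reconcile(source: list[dict], processed: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the batch checks out."""
    issues = []
    if len(source) != len(processed):
        issues.append(f"record count mismatch: {len(source)} in, {len(processed)} out")
    if batch_checksum(source) != batch_checksum(processed):
        issues.append("checksum mismatch")
    return issues


source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
print(reconcile(source, source[::-1]))  # [] (same records, different order)
print(reconcile(source, source[:1]))
# ['record count mismatch: 2 in, 1 out', 'checksum mismatch']
```

The count check catches dropped or duplicated rows cheaply; the checksum also catches records that arrived but were silently altered.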
Second, we do data profiling as needed when we see significant changes in the volume of data. We use it to get an overview that helps us understand if we need to amend or complement our existing data quality rules.
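A volume-triggered profile could be sketched like this. The 30% tolerance, the choice of statistics (null rate and distinct-value count per column), and the sample rows are assumptions for illustration; the point is that a cheap volume check decides when the more expensive profile runs.

```python
from statistics import mean


def volume_changed(today: int, recent: list[int], tolerance: float = 0.3) -> bool:
    """True when today's row count differs from the recent average by more
    than `tolerance` (a hypothetical 30% by default)."""
    baseline = mean(recent)
    return abs(today - baseline) > tolerance * baseline


def profile(rows: list[dict]) -> dict[str, dict[str, float]]:
    """Per-column null rate and distinct-value count; a missing key counts as null."""
    columns = sorted({col for row in rows for col in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        report[col] = {
            "null_rate": (len(values) - len(non_null)) / len(values),
            "distinct": len(set(non_null)),
        }
    return report


rows = [{"user_id": "a", "plan": "free"}, {"user_id": "b", "plan": None}, {"user_id": "c"}]
if volume_changed(today=1500, recent=[1000, 980, 1020]):
    print(profile(rows))
```

Comparing a fresh profile against the previous one is what tells you whether existing quality rules still fit the data or need amending.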
Pillar 4: Transparency and control
We pride ourselves on being secure and trustworthy. Part of that trust comes from our work to comply with various standards and help customers meet their privacy needs. Another part is communicating these efforts, so customers know what we are promising and can hold us to it.
We have a to share this information freely and help our customers see every step we’ve taken to secure our data and systems. We are also open about our . When customers know their options and can easily take control of their data, you are doing well with your transparency efforts.
Continuous improvement and monitoring
There’s no start or finish to your data governance efforts. But if security and data governance are part of your DNA, as they are part of ours, you’re better able to protect yourself and adapt when rules change.
For instance, our internal policies are as stringent as possible without hampering our data-driven efforts. So when new laws and regulations come out, we don’t have to build new guardrails or protections from scratch. We just need to coordinate with our legal and compliance teams to set the appropriate rules and policies.
Outside of regulatory changes, we want to set new best practices before the law tells us to. I am always asking: Where is the market trending? And how is technology progressing to accommodate that? I’ve already discussed how machine-learning tools are central to our governance efforts. We like to look for ways to apply new tech to help us adapt as a company.
We have an internal data governance council and a monthly check-in with compliance teams to ensure we know what’s on the horizon and can plan accordingly. We must remember that it is always a journey, not a project.
Partnering with customers for better data governance
Since we help people become more data-driven, we are always thinking about how to help our customers with data governance concerns. What part of our culture can we export to make it easier for them?
Sharing governance best practices
We try to share our expertise wherever we can. For instance, we publish a lot about . Our team has covered issues like and . We have these discussions internally, so why not share our insights externally, too?
Training on Amplitude’s governance tools
We build essential governance tools straight into Amplitude. Our goal is to ensure every customer can quickly get started within our platform.
We also offer more advanced tools like , which were recently added to our . Last year, we launched Amplitude’s , which can automate some data governance practices.
Building new data management tools
One of our big efforts right now is our data cataloging tool. We want to connect the catalog to Amplitude’s visual layer, so when customers click on a metric in their visualizations, they can see that metric’s definition and owner in the data catalog. We also want to show the transformations applied along the way, so people can see how the data reached the form in which it’s being used.
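To make the idea concrete, here is a hypothetical sketch of the lookup a catalog could serve when someone clicks a metric: its definition, its owner, and the ordered transformations that produced it. The `CatalogEntry` fields, the metric, its definition, and the owner address are invented for illustration; this is not the schema of Amplitude’s catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record for one metric."""

    metric: str
    definition: str
    owner: str
    # Ordered steps from the raw source to the metric shown in a chart.
    lineage: tuple[str, ...] = ()


CATALOG = {
    "weekly_active_users": CatalogEntry(
        metric="weekly_active_users",
        definition="Distinct users with at least one qualifying event in a 7-day window",
        owner="data-platform@example.com",
        lineage=("raw_events", "dedupe_events", "filter_qualifying_events",
                 "count_distinct_users_7d"),
    ),
}


def describe(metric: str) -> str:
    """Text a UI could show when a user clicks on `metric`."""
    entry = CATALOG.get(metric)
    if entry is None:
        return f"{metric}: no catalog entry; ask the data team to register it"
    steps = " -> ".join(entry.lineage) or "(no transformations recorded)"
    return (f"{entry.metric}\n  definition: {entry.definition}\n"
            f"  owner: {entry.owner}\n  lineage: {steps}")


print(describe("weekly_active_users"))
```

Keeping the owner on the entry itself is what makes the next step, contacting that person with a question, a one-click action.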
That transparency enables people to check in on their own data usage. It’s easy to contact the data owner to learn how something is being calculated or ask about the data sets being used. And then we’re looking to integrate that into Jira, so it fits easily within existing workflows. It’s a big project. But that’s what transparency looks like today.
Data governance and you
I hope this post has helped you understand why data governance is important and given you some ideas on how to start or improve your own internal program. The need for data governance is not going away. As companies gather petabytes or even zettabytes of data, governance will keep that data organized and useful.
For us data governance professionals, the big question is how to make these processes easy enough to produce value while still protecting against security and compliance risks. There’s no universal solution. It’s a constant balancing act. But the more you dig in and see what you can do, the closer you will be to maximizing your company’s data potential.
Ready to take your data governance efforts to the next level? Download our , which will take you through step-by-step exercises to build a successful data governance framework.