Data Governance: Lessons From the Field
Four brave Ampliteers step forward and share their stories of data mishaps inside our Amplitude setup and across a range of customer use cases. Learn what they discovered and how they overcame these data governance woes. Plus, get bonus tips on how Amplitude’s AI features remove the need for repetitive event-data cleanup work.
“Data governance is a lot of things, but essentially it's a way of setting up rules and processes to label, organize, safeguard, and ensure high quality and useful data.”
Learn key elements for improved data governance
- The importance of data governance
- How Amplitude navigates data
- Why data governance is crucial for go-to-market teams
- What Project Champagne is
- Amplitude’s advanced Data Assistant
Why data governance?
The main goal of data governance is to break down data silos and build trust. People lose trust in data quickly, especially those who are less comfortable with data. Data governance is a way to ensure the most relevant and consistent data is produced for the benefit of every team across every metric.
Data governance can mean a lot of things. In simplest terms, it sets up rules and processes to label, organize, safeguard, and ensure high-quality and useful data.
Data governance issues pop up in a variety of ways, from similar data sources showing different outputs, to incorrect messages being sent to customers due to data targeting inaccuracies, or even just synonymous events that seem redundant or confusing.
It’s a continuous process, and should be practiced iteratively. While data governance is typically owned by one person or a central team, it’s something everyone can contribute to.
How we handle data governance at Amplitude
Every company handles data governance differently.
At Amplitude, we build data trust through real-time monitors, creating data accountability, driving data cleanup initiatives, and building template dashboards.
We leverage custom monitors to build awareness of data anomalies. When we see an unexpected spike or dip in product data, it impacts downstream reporting and must be addressed quickly.
As you’d imagine, go-to-market (GTM) teams at Amplitude make extensive use of data throughout their day-to-day workflows. The more stakeholders use data, the more accountable our product teams need to be in driving data consistency.
For example, the sales teams use data from our freemium starter product to identify active prospects. They search for those hitting paywalls and limits who are likely ready to explore a paid plan.
Customer success uses data to acknowledge customer pain points and adoption trends to support users in making data-backed decisions.
Marketing also uses data to build product usage cohorts for lifecycle marketing emails and in-app notifications. They also measure marketing channel attribution, conversions, and activation setup rates.
The more teams rely on production data, the more aligned everyone becomes in surfacing data discrepancies and feedback along the way.
To stay on top of this within our own Amplitude setup, we established an internal team through Project Champagne. This team focuses on the adoption of our own product by building template dashboards for our go-to-market teams. It also drives data cleanup initiatives and handles ongoing data governance and maintenance.
Monitoring data
At Amplitude, we make the most of custom monitors and the webhook integration with Slack. We use these to flag data ingestion, campaign, and people process anomalies, among other monitors we pay close attention to.
We have three primary categories of anomalies:
- Data ingestion anomalies
- Campaign anomalies
- People process anomalies
Data ingestion and campaign anomalies can be set up directly in Amplitude, depending on which product features you have access to. People process anomalies can be shared upon request by a CSM for growth and enterprise customers.
For example, a data ingestion anomaly could be if events are ingested with an invalid user ID. We’d use that flag to make sure we file a Jira bug for engineering to investigate the code.
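As a rough illustration (not Amplitude’s actual monitor logic), the kind of check behind an “invalid user ID” ingestion anomaly could be sketched like this; the placeholder values and event shape are assumptions:

```python
# Hypothetical sketch: count incoming events whose user_id is missing,
# empty, or an obvious placeholder, so a monitor can flag them.

def count_invalid_user_ids(events):
    """Return the events whose 'user_id' is absent, empty, or a placeholder."""
    placeholders = {"none", "null", "undefined", "nan"}
    invalid = []
    for event in events:
        user_id = str(event.get("user_id", "")).strip()
        if not user_id or user_id.lower() in placeholders:
            invalid.append(event)
    return invalid
```

A monitor built on a count like this could then trigger the Slack webhook and the Jira bug described above when the invalid share crosses a threshold.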
Campaign anomalies include cases our lifecycle marketing team flags, such as critical message deliveries dropping to zero send-outs. We’ll create an alert with a rolling window to help smooth out common weekend dips in data.
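To make the rolling-window idea concrete, here is a minimal sketch (an assumption for illustration, not Amplitude’s implementation) that only alerts when the smoothed send rate hits zero, so a normal two-day weekend dip does not fire:

```python
# Hypothetical sketch: smooth daily send counts with a trailing rolling
# mean so weekend dips don't trigger a "zero sends" alert on their own.

def rolling_mean(counts, window=7):
    """Trailing rolling mean over a list of daily counts."""
    means = []
    for i in range(len(counts)):
        start = max(0, i - window + 1)
        chunk = counts[start:i + 1]
        means.append(sum(chunk) / len(chunk))
    return means

def zero_send_alerts(counts, window=7):
    """Alert (True) only on days where the smoothed send rate is zero."""
    return [m == 0 for m in rolling_mean(counts, window)]
```

With a 7-day window, a weekend of zero sends surrounded by normal weekdays keeps the rolling mean above zero, while a full week of zeros raises the alert.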
People process anomalies can be issues like employees creating too many projects while testing something (an education issue), or someone being able to query sensitive data they shouldn’t due to incorrect permissions.
Including the whole team
We empower our teams to share data feedback and submit Jira bugs when there’s a gap in instrumentation. For instance, with a Zapier automation, we’ve created a simple self-serve customized Jira form that can be directly submitted through Slack.
Our Project Champagne team liaises between business requesters and product engineers to identify whether there’s a gap in data education or if it requires developers’ support. Often, a new instrumentation request comes through, but it can actually be derived from existing data.
Or if there's a low-lift, no-code solution we can use to service a request, then we'll use that before getting into code change requests.
This is where the AI-suggest feature can be incredibly powerful, too. For example, it can be used to create a new derived property. The more descriptive your prompt, the more accurate the output.
Why we created Project Champagne
One of our core values at Amplitude is to help companies build better products. We already have dedicated customer-facing teams supporting customers in implementing Amplitude, so we wanted to extend this to our own teams, too. In this scenario, we treat ourselves as a customer, taking the same learnings, onboarding activities, and best practices, and creating a small task force with members from our customer success team.
You may know this as dogfooding, but we’re a classy bunch, so we prefer to think of it as drinking our own champagne, hence Project Champagne.
The Project Champagne team communicates with every team at Amplitude and leads them through the onboarding stages, including discovery calls, interviewing the teams, and understanding their requirements.
We wanted to empower our customer-facing teams on the go-to-market side, including sales, implementation, and customer success teams, through the data they use.
We started by creating spaces for different teams. Then we use Amplitude’s Notebooks feature: one-pagers where you can embed charts, links, and other visuals. Notebooks serve as our main shareable knowledge documents. We also provide pre-built templates and dashboards for different teams, who each leverage these and create their own.
Speed vs. Issues
With teams having the freedom to move forward in their own way, we did encounter some issues, such as inconsistent data schemas: event or property names and values that come through inconsistently or unintuitively.
We also experienced data trust issues, with teams lacking confidence in what the data represents and its accuracy.
Our approach of giving teams more liberty to create and send data means that sometimes test data gets into the wrong places, resulting in more cleanup for the Project Champagne team.
Initially, this freedom helped us enable quick instrumentation. Going forward, though, we want to adjust this over time with stricter guidance and schema rules.
In-product tricks
We recommend starting with automation. The Amplitude UI has options to review taxonomy and gives suggestions to adjust it accordingly.
If some metadata isn’t immediately apparent, select the cog icon in the top right to add columns that give you a fuller picture of the events created and whether they’re being leveraged.
You can add parameters and filter events by volume, such as excluding those not seen within the last 30 days.
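Outside the UI, the same stale-event filter could be sketched over a taxonomy export; the row shape here (`event`, `last_seen`) is an assumption for illustration:

```python
from datetime import datetime, timedelta

# Hypothetical sketch: flag events in a taxonomy export that haven't been
# seen within a cutoff window (default 30 days), as cleanup candidates.

def stale_events(taxonomy, now=None, days=30):
    """taxonomy: list of {'event': name, 'last_seen': datetime} rows."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=days)
    return [row["event"] for row in taxonomy if row["last_seen"] < cutoff]
```

Running this on a regular cadence gives a short list of events to review for archiving or deletion.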
Regarding trusting data, code reviews should be a last resort. A better approach is to audit the data and the tracking to ensure actions occurring within the product are passing into Amplitude.
If your product is browser-based, use the Amplitude Chrome extension to view the live stream of events as you navigate the site to validate the event stream.
Data governance foundations
Through our experiences working with customers, we have found three pillars to be fundamental to data governance:
- Education
- Instrumentation
- Maintenance
Setting the foundation of education enables users to understand Amplitude and their own data, as well as their taxonomy and how things are set up. We have tools within Amplitude to help you build a data dictionary, which furthers education by serving as a reference point.
Instrumentation considers a standardized process of how you instrument your data in Amplitude. It’s important to enforce syntax, naming conventions, and the style of data that makes sense for your business. Consider what you’re trying to solve and use Amplitude to enforce actionable instrumentation.
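As one way to enforce a naming convention at instrumentation time, here is a minimal sketch; the lowercase snake_case convention shown is an example assumption, not a rule Amplitude prescribes:

```python
import re

# Hypothetical sketch: validate that event names follow a lowercase
# snake_case convention (e.g. "song_played") before they ship.

EVENT_NAME = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

def validate_event_name(name):
    """Return True if the event name matches the convention."""
    return bool(EVENT_NAME.fullmatch(name))
```

A check like this can run in code review or CI so inconsistent names are caught before they ever reach your taxonomy.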
The third pillar is maintenance. You must consider how to maintain your data to ensure it’s as clean as possible. Encourage some cadence of regular audits, such as monthly or quarterly, depending on your needs. Identify where things may have gone wrong and work to mitigate those with tracking and monitoring.
Data Assistant
Governance is a core part of Amplitude’s AI strategy, and we have a three-step plan for combining AI and product analytics.
Step one is to ensure data is usable by AI and that the Data Assistant can improve data quality. The second step is for the assistant to automatically analyze data and inform you when something needs attention. Step three is to improve your digital product automatically, which will become more possible in the future.
We built our Data Assistant to encode and simplify data governance workflows. It shows if things are improving or need enhancing, allowing you to prioritize recommendations. It also gives you clear overall scores to visualize performance better.
We’re currently working on releasing Auto Accept, which is designed to accept suggestions automatically when there is robust confidence in the AI.
Our work now is about learning how analysts analyze and pull information, backed by accurate charts and dashboards, and embedding that into governance. By understanding how we govern and analyze, data becomes more usable, and governance can be tailored to the analysis flow. We can then feed those learnings into AI so it can help you govern your data.
Meet the Speakers
Alex Simmons
Sr. Customer Success Manager
Jenn Rudin
Sr. Data Strategist
Joe Reeve
Software Engineer Manager
Rox Chang
Sr. Engagement Manager
Join the community!
Connect with Amplitude team members and other practitioners in our community. The Cohort Community exists to help people become better analysts. We focus on actionable programs, sharing best practices, and connecting our members with peers and mentors who share the same struggles.