Why We Didn’t Build Auto-tracking for Amplitude

Auto-tracking doesn’t eliminate work. It shifts work to a less scalable process.

August 3, 2020
Jeffrey Wang
Co-founder & Chief Architect

Implementing great product analytics requires product and engineering teams to work together. They align on target product outcomes, define an event taxonomy to measure those outcomes, and instrument tracking code. As a result, engineering tracks only the data that matters, and their work is tied directly to business results. More importantly, the shared source of truth between product and engineering teams lays a foundation for a data-driven culture.

New customers often ask us about an alternative approach: auto-tracking, the automated event collection process that some other analytics platforms hang their hat on. The pitch is simple: insert a code snippet at the top of your application, and the platform captures all of your customers’ events with no engineering lift.

If it sounds too good to be true, that’s because it is. We actually built our own prototype of auto-tracking, but scrapped the project once we learned that it led to more work for system admins, less meaningful analysis, and inevitable security risks. In this article we’ll tell you exactly why.

Auto-tracking Doesn’t Eliminate Work, It Shifts Work to a Less Scalable Process

Once installed, auto-tracking SDKs collect events from page views, clicks, and object interactions and assign them generic names like page viewed, button clicked, or form submitted. This bypasses tagging events in the codebase, but an admin still has to sort through every component that’s released, rename events that are relevant to their team, and block the rest.

This isn’t just a one-time task. It’s a consistent responsibility that grows in scope the more you iterate on your product. So while auto-tracking might save a few hours of initial work for engineers (though we’ll explain later why that’s often not the case), it creates an ever-expanding data cleanup project for your admin.
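The cleanup loop described above can be sketched in a few lines. This is a minimal illustration, not any vendor’s actual data model: the `RawEvent` and `Rule` shapes, the `triage` function, and the selectors are all hypothetical. It assumes the admin maintains a list of rename and block rules keyed by element selector, and that anything without a rule lands in a backlog.

```typescript
// Hypothetical shapes for illustration; no real SDK or admin UI is modeled here.
interface RawEvent {
  genericName: string; // e.g. "button clicked"
  selector: string;    // the element the SDK observed, e.g. "#signup-submit"
}

// The admin's decisions: give an event a meaningful name, or block it.
type Rule =
  | { action: "rename"; selector: string; name: string }
  | { action: "block"; selector: string };

interface TriageResult {
  named: string[];       // events that now carry a meaningful name
  blockedCount: number;  // events discarded as noise
  untriaged: RawEvent[]; // the admin's backlog: no rule exists yet
}

function findRule(selector: string, rules: Rule[]): Rule | undefined {
  for (const rule of rules) {
    if (rule.selector === selector) return rule;
  }
  return undefined;
}

function triage(events: RawEvent[], rules: Rule[]): TriageResult {
  const result: TriageResult = { named: [], blockedCount: 0, untriaged: [] };
  for (const event of events) {
    const rule = findRule(event.selector, rules);
    if (rule === undefined) result.untriaged.push(event);
    else if (rule.action === "rename") result.named.push(rule.name);
    else result.blockedCount += 1;
  }
  return result;
}

const rules: Rule[] = [
  { action: "rename", selector: "#signup-submit", name: "Signup Completed" },
  { action: "block", selector: "#footer-logo" },
];

const lastWeek: RawEvent[] = [
  { genericName: "button clicked", selector: "#signup-submit" },
  { genericName: "button clicked", selector: "#footer-logo" },
];

// A release ships two new components that no rule covers yet.
const thisWeek: RawEvent[] = [
  ...lastWeek,
  { genericName: "button clicked", selector: "#share-modal-send" },
  { genericName: "form submitted", selector: "#invite-form" },
];

console.log(triage(lastWeek, rules).untriaged.length);
console.log(triage(thisWeek, rules).untriaged.length);
```

Under these assumptions the first call logs 0 and the second logs 2: the rules did not change, but the product did, so the backlog grew on its own.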

To illustrate what this work looks like, here’s a note our team received from a B2B product manager about using an auto-tracking system:

“Every Friday around 4:00 p.m., I’d finish those last few important emails, grab a beer from the office kegerator, and spend the remainder of my day tagging pages. Every time a new product development rolled out or a user interacted with a new part of the product, new pages needed to be manually tagged so it could be recognized. One of the most daunting parts, though, couldn’t be helped by that beer I was sipping. It was knowing that I’d never finish. The most I could do was constantly chip away.”

Auto-tracking Misses the Full Story of What’s Happening in Your Product

Good product teams don’t just ask, what are my users doing? They ask, why? This requires context, captured through user and event properties like platform, plan type, and experiment version. A lot of the most valuable and relevant information actually exists in properties rather than events. Without this nuanced understanding, you wind up building one experience for the median user, rather than compelling experiences for each user informed by their behaviors and preferences.

Properties are not auto-tracked, so if you want to capture them, you’ll need the very engineering support you sought to avoid in the first place. On top of that, splitting tracking between auto-captured events and hand-coded properties adds even more complexity to the codebase. By contrast, in the most successful implementations we see, engineering creates a lightweight framework for analytics tracking that makes adding and changing events easy. Once you account for property tagging and weekly admin maintenance, the time-to-value argument for auto-tracking starts to fall apart.
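As a rough idea of what such a lightweight framework could look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the event names, the property names, and the `createTracker` helper are hypothetical, and the `send` function stands in for whatever call your analytics SDK exposes. The point is that the event taxonomy lives in one typed definition, and context such as platform is attached once rather than at every call site.

```typescript
// Hypothetical event taxonomy: each event name is paired with its required properties.
type AnalyticsEvent =
  | {
      name: "Signup Completed";
      properties: { planType: "free" | "paid"; experimentVersion: string };
    }
  | {
      name: "Order Completed";
      properties: { orderValueUsd: number; itemCount: number };
    };

// Context that should accompany every event, set once at startup.
interface UserContext {
  platform: "web" | "ios" | "android";
}

interface Payload {
  name: string;
  properties: Record<string, unknown>;
}

// The transport is injected. In a real app it would wrap the analytics SDK's
// send call; here it is left abstract on purpose.
type Transport = (payload: Payload) => void;

function createTracker(context: UserContext, send: Transport) {
  return (event: AnalyticsEvent): void => {
    // Merge shared context with the event's own properties.
    send({
      name: event.name,
      properties: { ...context, ...event.properties },
    });
  };
}

// Usage: collect payloads in memory instead of sending them anywhere.
const sent: Payload[] = [];
const track = createTracker({ platform: "web" }, (payload) => {
  sent.push(payload);
});

track({
  name: "Signup Completed",
  properties: { planType: "paid", experimentVersion: "onboarding-v2" },
});
```

With this shape, a misspelled event name or a missing property is a compile-time error, and adding a new event means adding one entry to the union type.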

Auto-tracking Breaks, Creating More Work, False Assumptions, and Data Distrust

Auto-tracking identifies elements by their CSS classes and IDs. For apps that run experiments or personalization, or that are built on dynamic libraries like React, those identifiers change often, so event tracking breakage is guaranteed unless your class and ID framework is built specifically for auto-tracking. If you thought event tagging was a hard sell to your engineering team, try pitching them the idea of the product team relying on the consistency of their CSS.
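To make the failure mode concrete, here is a simplified simulation. It assumes, hypothetically, that an auto-tracked event is defined by a CSS class name and that a later build starts emitting hashed class names. The types, function names, and class names are invented for the example and do not describe any particular vendor’s matching logic.

```typescript
// An auto-tracked event definition bound to a CSS class (hypothetical format).
interface SelectorDefinition {
  eventName: string;
  className: string;
}

// A rendered element reduced to what matters here: its class list and an
// optional explicit event name set by engineers in code.
interface RenderedElement {
  classes: string[];
  explicitEventName?: string;
}

// Selector-based matching: the event fires only if the class is still present.
function matchBySelector(
  element: RenderedElement,
  definitions: SelectorDefinition[],
): string | undefined {
  for (const definition of definitions) {
    if (element.classes.some((c) => c === definition.className)) {
      return definition.eventName;
    }
  }
  return undefined;
}

// Explicit instrumentation: the name travels with the code, not the styling.
function matchByExplicitName(element: RenderedElement): string | undefined {
  return element.explicitEventName;
}

const definitions: SelectorDefinition[] = [
  { eventName: "Checkout Started", className: "checkout-button" },
];

// The same button before and after a styling refactor that hashes class names.
const beforeRefactor: RenderedElement = {
  classes: ["btn", "checkout-button"],
  explicitEventName: "Checkout Started",
};
const afterRefactor: RenderedElement = {
  classes: ["btn", "Checkout_button__x7Kq2"],
  explicitEventName: "Checkout Started",
};

console.log(matchBySelector(beforeRefactor, definitions));
console.log(matchBySelector(afterRefactor, definitions));
console.log(matchByExplicitName(afterRefactor));
```

In this sketch the selector match returns the event name before the refactor and `undefined` after it, while the explicit name is unaffected. Nothing errors, which is why this kind of breakage can go unnoticed.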

The risks aren’t limited to React products. On anything beyond a simple, rarely updated web app, auto-tracking tends to break, because auto-tracking solutions are built for marketing websites, which are much more static than products. In fact, auto-tracking providers specifically recommend that you have engineers instrument all mission-critical events, like successful signups and completed order transactions, to make sure your metrics are accurate.

Many teams ignore these warnings and implement only auto-tracking. Eventually a code change breaks tracking, critical metrics drop off, and admins are tasked with combing through a heap of uncategorized events to pick up the trail. Even worse, the breakage may go unnoticed, and product decisions are then made on completely false premises.

When breakage, chart corrections, and event name changes become the norm, doubt creeps into the minds of business users: how can I follow my curiosity, move fast, and make product bets if the underlying data might be off? Once you’ve lost data trust, it’s incredibly difficult to earn back.

Auto-tracking Has Caused Security Incidents by Accidentally Capturing Sensitive Information

The security and privacy of customer data is of utmost importance to Amplitude, and it’s also becoming more and more important to end users. That’s why we invest heavily to develop industry best practices for implementation, and platform capabilities that make sure our customers ingest clean, actionable event data, exactly as they intended.

With auto-tracking, there’s no way to meet this standard for data security and privacy. Just google “autotrack passwords.”
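One way this kind of accidental capture can happen is sketched below. This is a simplified, hypothetical illustration and not a description of any specific product or incident: it assumes a capture-everything collector that tries to stay safe by skipping inputs whose type is `password`, and a form with a "show password" toggle that switches the input type to `text`. All names and values are invented for the example.

```typescript
// A form field reduced to its name, current input type, and value (hypothetical shape).
interface FormField {
  name: string;
  inputType: "text" | "email" | "password";
  value: string;
}

type Properties = Record<string, string>;

// Capture-everything with a type-based safety filter: only as good as the markup.
function captureAllExceptPasswordInputs(fields: FormField[]): Properties {
  const props: Properties = {};
  for (const field of fields) {
    if (field.inputType !== "password") props[field.name] = field.value;
  }
  return props;
}

// Explicit instrumentation: only properties that were deliberately listed are sent.
function captureExplicit(fields: FormField[], allowed: string[]): Properties {
  const props: Properties = {};
  for (const field of fields) {
    if (allowed.indexOf(field.name) !== -1) props[field.name] = field.value;
  }
  return props;
}

const signupForm: FormField[] = [
  { name: "email", inputType: "email", value: "pat@example.com" },
  // "Show password" is toggled on, so the markup now says type="text".
  { name: "password", inputType: "text", value: "hunter2" },
  { name: "plan", inputType: "text", value: "paid" },
];

console.log(Object.keys(captureAllExceptPasswordInputs(signupForm)));
console.log(Object.keys(captureExplicit(signupForm, ["plan"])));
```

In this sketch the filtered capture still includes the `password` field, because the filter depends on markup the collector does not control. The explicit version sends only `plan`, because nothing leaves the page unless someone chose to send it.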


Learn more about implementing Amplitude at your organization.

About the Author
Jeffrey Wang
Co-founder & Chief Architect
Jeffrey owns the infrastructure that enables us to scan billions of events every second. He studied Computer Science at Stanford and brings experience building infrastructure from Palantir and Sumo Logic.