Thinking About Building Your Own Analytics? Don’t


April 21, 2016
Alicia Shiu
Growth Product Manager

One big decision in the life of any company, especially tech companies with strong engineering teams, is whether to buy an analytics solution or build it themselves. In some cases the fervor to build it all yourself seems almost religious — but there are many real upsides to using a third-party analytics provider for your infrastructure.

Although it may not seem like a big deal, especially early on, the “build or buy” decision has an enormous impact on your company’s productivity and speed, and ultimately on your ability to drive growth.

I can understand the appeal of building your own analytics solution — it will be perfectly customized to fit your needs, and you’ll have complete control over how your data is handled. Some tech giants like Airbnb, Zynga, and Facebook have built impressive data infrastructures, so why shouldn’t you?

Unfortunately, many companies overlook the true costs of building it themselves (and I’m not just talking dollars). If you’re considering or are in the process of building out your own analytics, make sure you think about the ways this could potentially hurt your company.

Building analytics diverts resources from your core product

If you’re spending your time thinking about how to set up your event data pipeline, that means you’re not thinking about your user behavior data and how it can inform your product roadmap.

Companies often underestimate the time and resources that go into building analytics. They think they can throw all of their data into Hadoop, or maybe Redshift, and be done with it. However, if you want to ensure that anyone at your company can access the data they need (more in the next section on why this is so critical), it will take far more time, and often money spent on third-party visualization tools, to make that infrastructure usable.

On the other hand, paying for an analytics platform means that your data will automatically be self-service to anyone who needs it, allowing more time and resources to be dedicated to your actual product.

Fareed Mosavat, who leads the growth team at Instacart, faced this same build vs. buy decision when he joined the company. At the time, Instacart was using a hodgepodge of self-made tools for tracking and analysis, but it was difficult for data end-users (like the growth and marketing teams) to access this data themselves. In addition, they were reaching the point where their system wasn’t going to work for their growing data volume, and would require significantly more investment on their part to scale.

Fareed decided to go with Amplitude for their app analytics platform, saying, “I’m much more interested in solving the core product problems than building technical infrastructure for analytics.”

Without accessible data, your company won’t have a data-informed culture

In addition to taking away focus and resources from improving your product, building analytics makes it incredibly difficult to make data accessible to the rest of your company.

Accessibility means that anyone at your company (meaning yes, even someone who doesn’t know any SQL), can explore your data and answer their own questions, quickly. A big part of this is data visualization, but static dashboards are not enough — ideally, the end user should be able to slice this data in different ways and discover insights on the fly. Ben Porterfield, co-founder of Looker, puts it well: “The right analytics infrastructure is one that makes it just as easy to share insightful data visualizations (graphs, charts) as it is to dig down into the most granular details.”

You might be wondering why data accessibility is so important. If your company has an analyst or data science team who can write SQL for the product and marketing teams when they need answers, isn’t that enough?

Fareed from Instacart puts it perfectly: “If you say you’re data-driven but everything has to go through an analyst, you’re not actually data-driven.”
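To make “everything has to go through an analyst” concrete, consider what even a basic question, say daily active users broken down by city, costs when there is no self-service tool. Here is a minimal sketch in Python; the export file and column names are hypothetical, just to illustrate the kind of one-off work someone has to write and run by hand:

```python
# Minimal sketch: daily active users (DAU) by city from a raw event export.
# The file name and columns (user_id, event_time, city) are hypothetical.
import pandas as pd

events = pd.read_csv("events_export.csv", parse_dates=["event_time"])

dau_by_city = (
    events
    .assign(day=events["event_time"].dt.date)
    .groupby(["day", "city"])["user_id"]
    .nunique()                      # "active" = at least one event that day
    .rename("daily_active_users")
    .reset_index()
)

print(dau_by_city.head())
```

A product manager or marketer who doesn’t write code can’t do this themselves, so the request lands in someone else’s queue.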

If data is siloed within the analytics or data team, then everyone who needs an answer (even something simple, like your daily active users segmented by city) has to go through an analyst. This creates bottlenecks and longer turnaround times, meaning that (1) your data scientists are backlogged with requests and thus less productive, and (2) your entire company loses momentum. Which brings us to our next point…

Your company loses speed

Many high-growth companies operate on the principle that speed is the defining characteristic separating successful companies from the rest of the pack. If you want to operate as quickly as possible, self-service data access is a MUST.

Self-service data eliminates any bottleneck between the data and the end users, shortening the time to insight. This determines how quickly you can move through the cycle of product iterations and improvements, and ultimately how quickly your company can grow.

Now, that’s not to say that raw data access and writing SQL don’t have their place. It’s about enabling product and marketing to answer 90% of their questions themselves, directly in an analytics platform. The really exciting, complex stuff — the other 10% — is still incredibly important, and it’s where your data scientists can shine.

Your data scientists waste time on “janitor work”

Choosing to build analytics means that your data scientists and engineers do more grunt work, and less core data science work. As Porterfield says, “Data teams too often create bottlenecks for the rest of the company. IT shouldn’t be doing the work of librarians, retrieving and interpreting data for those requesting it.”


In addition, engineers will need to spend time building and maintaining the data pipeline infrastructure over time. All of that data “janitor work” is a big time suck; in fact, data munging takes up 50-80% of data scientists’ time.

Instead, imagine a scenario where the data pipeline and infrastructure are completely taken care of by a third party. No one has to spend time or money constantly maintaining it, and on top of that, everyone can answer their own questions. Think about how much time your data scientists now have to focus on complex insights and problems.

Still thinking about it?

Whether you’re evaluating an in-house or third-party platform, here are a few questions to ask yourself:

  1. Who in your company needs access to this data?
  2. For those people, does this system make it easy for them to explore the data ad-hoc, visualize the data, and develop their own insights?
  3. (This one is specifically for evaluating third-party systems.) Can you access the raw data, and in what format? Does getting it into a query-ready format require running custom scripts and cleanup yourself, or is there a managed data pipeline and warehousing option?

An alternative to building in-house

If you’re considering a third-party analytics solution, you probably already know that there are a ton to choose from. Many of you may be familiar with self-service analytics tools like Google Analytics and Mixpanel. While these meet the basic needs of non-technical end users, they don’t let you dig very deep, which means additional custom analysis will be required. However, actually getting the raw data out to do that analysis is a huge pain; you’ll either need to devote resources to extracting and transforming this data, or collect the same data separately into a Redshift cluster — neither of which is an ideal setup. You deserve better.

Unlike other analytics solutions, Amplitude combines a flexible, intuitive analytics platform with raw SQL access via Redshift. This gives you the flexibility to work with the data however you want. Your marketing and product teams can use the Amplitude interface to answer the vast majority of their questions and track their core product metrics. At the same time, you get instant access to your data in a dedicated Redshift cluster that we maintain – all you have to do is log in. You can connect any SQL editor or visualization tool on top of your cluster, like Wagon, Looker, or Mode.
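If you’d rather go straight at the cluster programmatically, it works like any Postgres-compatible database. Here is a minimal sketch using psycopg2; the connection details and the events table are placeholders, not Amplitude’s actual schema:

```python
# Minimal sketch: an ad-hoc query against a dedicated Redshift cluster.
# Redshift speaks the Postgres wire protocol, so a standard client like
# psycopg2 works. Host, credentials, and the `events` table are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="your-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",  # placeholder
    port=5439,
    dbname="analytics",
    user="readonly_user",
    password="your-password",
)

query = """
    SELECT event_type,
           COUNT(*) AS event_count
    FROM events
    WHERE event_time >= DATEADD(day, -7, GETDATE())
    GROUP BY event_type
    ORDER BY event_count DESC;
"""

with conn, conn.cursor() as cur:
    cur.execute(query)
    for event_type, event_count in cur.fetchall():
        print(event_type, event_count)
```

The same connection details work for whichever SQL editor or visualization tool your team already uses.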

Create a culture of data

I don’t think I need to convince anyone of the importance of making sound business decisions based on data insights. But is your company currently set up to do that? Does everyone have access to data? Can they understand what the data means for their own goals as well as the company’s core metrics?

Your choice of analytics platform, whether you’re building in-house or buying third-party, has huge ripple effects on how your company operates, and ultimately on your company’s success.

About the Author
Alicia Shiu
Growth Product Manager
Alicia is a Growth Product Manager at Amplitude, where she works on projects and experiments spanning top of funnel, website optimization, and the new user experience. Prior to Amplitude, she worked on biomedical & neuroscience research (running very different experiments) at Stanford.