In 2012, a data competition on Kaggle tasked participating teams with identifying what factors contributed to a car being a “lemon,” or a bad purchase.
The real surprise was the factor most strongly associated with reliability: color. Orange cars, the Kaggle teams found, were consistently less likely to have after-purchase problems. Kaggle’s founder stepped in to provide analysis: “Orange is an unusual color,” he said. “You’re probably going to be someone who really cares about the car [if you bought orange] and so you looked after it better than somebody who bought a silver car.”
It sounds absurd, but all of it came straight out of a massive data set analyzed by hundreds of scientists and coders.
There are all kinds of strange phenomena just like this out there. There are apps that soar in popularity for a day, then tumble back down into obscurity. There are sudden bursts when your home page converts at a rate of 20%, then goes back down to 15% with no explanation.
But the only way to learn from these phenomena, and to improve because of them, is to use analytics. When you witness something unexpected and you’re able to go back and understand it with your data, you’re bumping up against the limits of your idea of the world and pushing them a little further. You’re learning. You’re making progress.
At Amplitude, helping startups make progress and learn about the mechanics of their own apps is what drives us. We love hearing stories about how analytics helped uncover the unexpected cause of some strange phenomenon.
That’s why we recently asked some of our data-driven startup friends for their own personal “orange car” stories. We asked for the odd, the out-there, the realizations that turned businesses around, or the discoveries that overturned long-held beliefs.
Here they are: four plainly surprising stories from four startup data teams.
When should a young company start looking at its analytics? If Donald Trump is any indication, the answer is early and often.
Well into 2016, virtually the entire media establishment agreed on the model it was using to understand the GOP race. In that model, Donald Trump was a non-starter. He had no endorsements and no support from the traditional players in the Republican party. Even 538’s Nate Silver, the onetime whiz kid who correctly called 50 out of 50 states in the 2012 general election, wrote him off.
What went wrong was a classic Bayesian misstep. Instead of updating their priors to account for new data, the pundits and commentators clung to them. They saw the contrary evidence; they just chose not to give it weight. What’s more surprising is that not even Nate Silver, the modern-day king of Bayes, was immune.
“Numbers have no way of speaking for themselves,” as he warned in his book The Signal and the Noise, “Data-driven predictions can succeed—and they can fail. It is when we deny our role in the process that the odds of failure rise.”
Analytics is not about having all the data. It’s not even about building the right models, or choosing the right priors, because you’re not going to guess correctly every time. It’s about being flexible with your priors in the face of new evidence.
If we can learn anything from Trump’s rise, it’s that you’re better off getting a grip on your data early, being fluid about what it means, and revising your priors before they have a chance to dig in.
Otherwise, you’ll get blindsided—just like America did.
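To make “revising your priors” concrete, here is a minimal sketch of a Bayesian update in Python. It is illustrative only: the starting belief and the likelihoods are hypothetical numbers, not figures from any poll or forecast, and `update_belief` is a name chosen for this example.

```python
def update_belief(prior: float, p_if_true: float, p_if_false: float) -> float:
    """Return the posterior P(hypothesis | evidence) using Bayes' rule.

    prior      -- belief in the hypothesis before seeing the evidence
    p_if_true  -- probability of the evidence if the hypothesis is true
    p_if_false -- probability of the evidence if the hypothesis is false
    """
    numerator = p_if_true * prior
    denominator = numerator + p_if_false * (1.0 - prior)
    return numerator / denominator


# Hypothetical numbers: start out very skeptical (2%), then observe five
# pieces of evidence, each three times likelier if the hypothesis is true.
belief = 0.02
history = [belief]
for _ in range(5):
    belief = update_belief(belief, p_if_true=0.6, p_if_false=0.2)
    history.append(belief)

for step, value in enumerate(history):
    print(f"after {step} observations: {value:.1%}")
```

Under these assumed numbers, even a 2% prior climbs past 80% after five observations. Clinging to the prior amounts to skipping the loop.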
One big decision in the life of any company, especially a tech company with a strong engineering team, is whether to buy an analytics solution or build one in-house. In some cases the fervor to build it all yourself seems almost religious, but there are many upsides to using a third-party analytics provider for your infrastructure.
Although it may not seem like a big deal, especially early on, the “build or buy” decision has an enormous impact on your company’s productivity and speed, and ultimately on your ability to drive growth.
I can understand the appeal of building your own analytics solution — it will be perfectly customized to fit your needs, and you’ll have complete control over how your data is handled. Some tech giants like Airbnb, Zynga, and Facebook have built impressive data infrastructures, so why shouldn’t you?
Unfortunately, many companies overlook the true costs of building it themselves (and I’m not just talking dollars). If you’re considering or are in the process of building out your own analytics, make sure you think about the ways this could potentially hurt your company.
Instacart is an on-demand service that delivers groceries from local stores in as little as an hour. Users can place orders on the website or through the iOS or Android app. With these different user experiences, and many users moving between platforms, Instacart needed cross-platform tracking to measure and understand user behavior across web, mobile web, and native mobile apps.
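As a rough illustration of what cross-platform tracking involves, the sketch below groups events from several platforms under one shared user ID and orders them by time. This is not Instacart’s or Amplitude’s implementation: the event fields (`user_id`, `platform`, `name`, `ts`) and the function `stitch_by_user` are hypothetical, and it assumes every platform already reports the same user ID for a logged-in user.

```python
from collections import defaultdict


def stitch_by_user(events):
    """Group events from all platforms by user ID, sorted by timestamp."""
    timelines = defaultdict(list)
    for event in events:
        timelines[event["user_id"]].append(event)
    for timeline in timelines.values():
        timeline.sort(key=lambda e: e["ts"])
    return dict(timelines)


# Hypothetical events; "ts" is a plain integer timestamp for simplicity.
events = [
    {"user_id": "u1", "platform": "ios", "name": "add_to_cart", "ts": 20},
    {"user_id": "u1", "platform": "web", "name": "search", "ts": 10},
    {"user_id": "u2", "platform": "android", "name": "search", "ts": 15},
    {"user_id": "u1", "platform": "web", "name": "checkout", "ts": 30},
]

timelines = stitch_by_user(events)
platforms_u1 = [e["platform"] for e in timelines["u1"]]
print(platforms_u1)
```

With one timeline per user, a session that starts on the web, moves to the phone, and ends back on the web shows up as a single journey rather than three unrelated ones.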
Ask your average startup team whether data is exciting, and you’re likely to hear some enthusiastic “Yes!” votes.
But ask that same team whether data is creative, and you’re likely to get a bunch of confused looks.
This is the conventional view of data: capable of exposing us to exciting insights, maybe, but not actually creative—not generative.
But that view couldn’t be further from the truth. Metrics are like a sculptor’s marble when put in the hands of a truly data-driven team: they’re raw materials from which something new can be created.
One of the crowning achievements of modern television, in fact, was the brainchild of one of the most heralded analytics teams in the world: Netflix.
When we talk about data accessibility, most of the time it’s in the “all for one, one for all” context: everyone who wants to draw insights from the data can get to it easily.
But we can also look at data accessibility in another way: accessibility in the sense of collecting all the data you want and having your whole company run analyses in real time.
Leading growth product managers agree on one thing: set up your analytics as soon as you have users. But if you’re at an early-stage company, spending valuable engineering resources on a robust in-house solution that gives everyone easy data access may not be feasible. In the short term, you can make do with a minimum viable analytics stack, gathering data through Google Analytics reports or by running scripts against a MySQL database.
But what happens at that inflection point when your company experiences hyper-growth? When your active user count begins to grow exponentially?
More likely than not, you’ll reach the limits of your homegrown analytics infrastructure. Things will start to break, and you’ll spend all your time putting out fires instead of building your product.
To create a data-informed culture early in your company’s life, it’s well worth investing the time and money in a self-service analytics solution that gives everyone access to data. That doesn’t just mean a platform with an intuitive UI that everyone can use; it also means evaluating a solution through the lens of scalability.
Imagine you’re a member of the growth team of a promising (as all Silicon Valley startups are) social media app. It’s a late evening in SoMa. You have your email open. You keep hitting ‘Refresh,’ waiting for the one piece of information you need to finish your presentation to your CEO tomorrow. You want to know something seemingly straightforward: How many daily active users did the app have over the past 6 months in San Francisco versus San Jose? You asked the data science team for this information 3 days ago; they promised to get it to you by end of day. When the data finally does come in (indeed, minutes before midnight), it’s rows and rows of numbers in a huge Excel spreadsheet, which you’ll spend the next several hours turning into graphs just to make sense of them.
This, despite popular belief, is not how data-informed companies function. Fareed Mosavat, Consumer Growth Product Manager of the popular grocery delivery app Instacart, puts it best: “If you say you’re data-driven but everything has to go through an analyst, you’re not actually data-driven.”
We’ve mentioned before that data accessibility is one of the core tenets of being data-informed. In a company with a pervasive data culture, everyone has direct access to user data and can answer their own analytics questions. No more back and forth or bottlenecks to deal with. Instead of bugging the data science team, you’d be able to get the information you need yourself, in seconds.
In that sense, data accessibility truly unlocks your org’s potential. It allows you to:
- Move quickly.
- Focus on the product.
- Build a data-informed culture.
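The daily-active-users question from the scenario above is the kind of thing a self-service setup should answer in seconds. As a minimal sketch, assuming each event is stored as a hypothetical `(user_id, city, day)` tuple, counting distinct users per day and city might look like this (the sample data and the function name `daily_active_users` are invented for illustration):

```python
from collections import defaultdict
from datetime import date


def daily_active_users(events):
    """Count distinct users for each (day, city) pair."""
    seen = defaultdict(set)
    for user_id, city, day in events:
        seen[(day, city)].add(user_id)
    return {key: len(users) for key, users in seen.items()}


# Invented sample events: (user_id, city, day).
events = [
    ("u1", "San Francisco", date(2016, 5, 1)),
    ("u1", "San Francisco", date(2016, 5, 1)),  # repeat visit, counted once
    ("u2", "San Francisco", date(2016, 5, 1)),
    ("u3", "San Jose", date(2016, 5, 1)),
    ("u1", "San Francisco", date(2016, 5, 2)),
]

dau = daily_active_users(events)
for (day, city), count in sorted(dau.items()):
    print(day.isoformat(), city, count)
```

A real analytics platform would run the equivalent query over its own event store; the point is only that the question reduces to a distinct count grouped by day and city.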