Stop Treating Your Data Scientist Like a Janitor

A recent survey revealed that data scientists spend 60% of their time data munging.

July 19, 2016
Archana Madhavan
Instructional Designer

So you just hired someone to fill an opening for the sexiest job of 2016: data scientist at your up-and-coming Silicon Valley startup. What if, a few months down the line, you realize your data scientist is actually spending most of their time working on something decidedly… unsexy?

Data cleanups aren’t fun

A recent survey by CrowdFlower revealed that data scientists spend 60% of their time on data munging, an unglamorous term for the process of cleaning up data and preparing it for analysis.

Unfortunately for data scientists, real-world data is messy. “At best it’s inconsistently delimited or packed into an unnecessarily complex XML schema. At worst, it’s a series of scraped HTML pages or a thoroughly undocumented fixed-width format,” writes Michael Driscoll, a data scientist who popularized the term ‘data munging’ as one of the three “sexy” skills of data geeks.

While data munging is an extremely useful and critical skill, most data scientists, many of whom have earned PhDs and spent years learning complex math and machine learning, do not want to spend their time as data janitors.

In fact, 57% of data scientists find data cleanup to be the least enjoyable part of their job (source: CrowdFlower).


What happens when your data team spends the majority of its time doing its least favorite thing? You have a bunch of highly educated, extremely disgruntled employees. This also means you’re not getting value out of some of the smartest people at your company and, by extension, you’re not getting the full value out of your data either.

Data requests are tedious

A former data scientist at a popular mobile gaming company confided in me once, “Data scientists do too much analyst work, like counting things. I hated being asked to pull lists of emails, especially when I had other things to do. The phrase ‘SQL monkey’ comes to mind.”

At companies where data is siloed within the analytics or data team, anyone who has a question about user data is forced to go through a data scientist. This causes bottlenecks: your data scientists have to deal with a backlog of requests and, again, are prevented from doing their actual job. More significantly, your entire company slows down.

Accessible data for all

Bottom line: don’t treat your data scientists like janitors or monkeys.

End-to-end, out-of-the-box analytics solutions (like Amplitude) collect, process, and store data, so your company has one central source of truth. The data cleansing and organization happen behind the scenes. Non-technical end users and analysts can query the data, track their own metrics, and discover insights using a user-friendly interface. Ultimately, this means your data scientists have the time and resources to do what they really love (and the job that you hired them for): actual data science.

About the Author
Archana is an Instructional Designer on the Customer Education team at Amplitude. She develops educational content and courses to help Amplitude users better analyze their customer data to build better products.