Stop Treating Your Data Scientist Like a Janitor

A recent survey revealed that data scientists spend 60% of their time data munging.

July 19, 2016
Instructional Designer

So you just hired someone to fill an opening for a data scientist at your up-and-coming Silicon Valley startup. What if, a few months down the line, you realize your data scientists are actually spending the majority of their time working on something decidedly… unsexy?

Data cleanups aren’t fun

[Figure: what data scientists spend the most time doing]

A recent survey revealed that data scientists spend 60% of their time data munging, a very unglamorous term that describes the process of cleaning up data and preparing it for analysis.

Unfortunately for data scientists, real-world data is messy. “At best it’s inconsistently delimited or packed into an unnecessarily complex XML schema. At worst, it’s a series of scraped HTML pages or a thoroughly undocumented fixed-width format,” writes Michael Driscoll, a data scientist who popularized the term ‘data munging’.
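To make the “inconsistently delimited” case concrete, here is a minimal Python sketch of what a small piece of munging can look like. The sample rows, the field names, and the `munge` helper are all hypothetical, invented for illustration; real-world cleanup is usually far messier than this.

```python
import re

# Hypothetical raw export: the same three fields in every row,
# but the delimiter (and stray whitespace) varies from row to row.
RAW_LINES = [
    "alice@example.com,2016-07-01,42",
    "bob@example.com;2016-07-02;17",
    "carol@example.com\t2016-07-03\t8",
    "  Dave@Example.com | 2016-07-04 | 23  ",
]

# Match a comma, semicolon, tab, or pipe, plus any whitespace around it.
DELIMITER = re.compile(r"\s*[,;\t|]\s*")


def munge(lines):
    """Turn inconsistently delimited lines into uniform records."""
    rows = []
    for line in lines:
        fields = DELIMITER.split(line.strip())
        if len(fields) != 3:
            continue  # skip rows that don't have the expected three fields
        email, date, count = fields
        rows.append({"email": email.lower(), "date": date, "events": int(count)})
    return rows


CLEAN_ROWS = munge(RAW_LINES)
```

Even in this toy version, most of the code deals with format quirks rather than analysis, which is the point of the complaint.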

While data munging is an extremely useful and critical skill, most data scientists, many of whom have earned PhDs and spent years learning complex math and machine learning, do not want to spend their time as janitors.

In fact, 57% find data cleanup to be the least enjoyable part of their job.

[Figure: least enjoyable part of data science]

What happens when your data team spends the majority of their time doing their least favorite thing? You have a bunch of highly educated, extremely disgruntled employees. This also means you’re not getting value out of some of the smartest people at your company. You’re not, by extension, really getting the full value out of your data either.

Data requests are tedious

A former data scientist at a popular mobile gaming company confided in me once, “Data scientists do too much analyst work, like counting things. I hated being asked to pull lists of emails, especially when I had other things to do. The phrase ‘SQL monkey’ comes to mind.”

At companies where data is siloed within the analytics or data team, anyone who has a question about user data is forced to go through a data scientist. This causes bottlenecks: your data scientists have to deal with a backlog of requests and, again, are prevented from doing their actual job. More significantly, your entire company slows down.

Accessible data for all

Bottom line: don’t treat your data scientists like janitors or monkeys.

End-to-end, out-of-the-box analytics solutions (like Amplitude) collect, process, and store your company’s data. The data cleansing and organization happens behind the scenes. Non-technical end users and analysts have the ability to query the data, track their own metrics, and discover insights using a user-friendly interface. And ultimately, this means your data scientists have the time and resources to do what they really love (and the job that you hired them for): actual data science.

About the Author
Instructional Designer
Archana is an Instructional Designer on the Customer Education team at Amplitude. She develops educational content and courses to help Amplitude users better analyze their customer data to build better products.