8 Data Science Skills That Every Employee Needs

When data scientists look to others for improving their skills, they can get a better understanding of the company that exists around the data.
Insights

Nov 16, 2016

14 min read

Archana Madhavan

Senior Learning Experience Designer, Amplitude

Analytics is about getting your team the data insights it needs to build better products and make the right decisions for your company.

But if your team can’t understand that data, then this is all for naught. Software like Amplitude can make your data easy to understand, but each member of your team still needs basic data skills to get the most value out of what they’re looking at.

These skills can help your team, regardless of whether that’s in product or marketing or sales, interpret the data as it comes in. It also gives them the skills to work with your data scientists to propose new ideas for your product, as well as the confidence to work alongside them to improve the business.

To do all of this, every employee should be able to…

1. Understand what correlation means

Correlations get a rough ride, but they are the backbone of data science. We always want to know how different variables change in relation to each other.

For instance, your two variables might be the number of people completing your onboarding flow and the number of people retained after a month. If your onboarding flow is valuable and helps new users get to their ‘a-ha’ moment quickly, then you may guess that the two numbers are positively correlated: when the first variable increases (people completing the onboarding flow), the second variable increases as well (retention after one month).

A correlation coefficient ranges between -1 and +1. A negative coefficient means the two variables move in opposite directions (as one goes up, the other goes down). A positive coefficient means they move in the same direction (as one goes up, the other goes up too). A coefficient of exactly 0 indicates no linear relationship between the two variables.
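To make the coefficient concrete, here is a minimal sketch of the Pearson correlation in plain Python. The weekly counts are invented for illustration, and the function name `pearson` is an assumption of this example rather than anything from Amplitude.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How the two variables move together, and how much each moves on its own.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly counts, invented for illustration.
onboarding_completions = [120, 150, 170, 200, 260]
retained_after_month = [30, 41, 44, 55, 70]

r = pearson(onboarding_completions, retained_after_month)
print(f"correlation: {r:.2f}")
```

A value close to +1 here only says the two series rise together in this made-up sample; it says nothing about why.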

We use correlations in Compass to see how well performing certain events predicts whether or not users are retained. We have grouped event correlations into four categories:

  • Highly Predictive (correlation >= 0.4)
  • Moderately Predictive (0.4 > correlation >= 0.3)
  • Slightly Predictive (0.3 > correlation >= 0.2)
  • Not Predictive (correlation < 0.2)
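The four categories above amount to a simple threshold lookup. The sketch below uses the cutoffs exactly as listed; the function name `predictiveness` is hypothetical and this is not Compass's actual code.

```python
def predictiveness(correlation):
    """Map a correlation value to the category names listed above."""
    if correlation >= 0.4:
        return "Highly Predictive"
    if correlation >= 0.3:
        return "Moderately Predictive"
    if correlation >= 0.2:
        return "Slightly Predictive"
    return "Not Predictive"


# Hypothetical correlation values, chosen to hit each category once.
for value in (0.55, 0.35, 0.25, 0.05):
    print(value, "->", predictiveness(value))
```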

A screenshot from Compass showing the correlation of users who added a friend in 7 days to being second week retained.

1-compass

It is 100% true that correlation doesn’t equal causation. Just because two variables move together doesn’t mean that one actually changes the other. For example, your revenue and the number of people completing your onboarding flow are probably positively correlated, but that doesn’t mean completing onboarding directly causes more spend; both metrics are probably bumped by an overall increase in current customers.

The “correlation doesn’t equal causation” argument is exactly why you need to run tests when you are trying to make improvements to your product.

2. Find the best sample size for your tests

Suppose your hypothesis is that the font of your signup page footer is what is holding back your conversions. Your designer has chosen Roboto, whereas the latest growth hacks tell you Comic Sans is a conversion winner. You start your A/B test, and then nothing happens. Literally. It’s not that you don’t get good results; it’s that you get no results.

Your sample size is too small.

Even if you get millions of views a month, only a small percentage of those will go to the signup page, of which only a small percentage will reach the footer, of which only a small percentage will click on the text. Split that small fraction in half for your control and test pages, and your eventual sample size is too small to show any significant change.

That might be an extreme example, but for A/B tests to work you need a large number of people in both your A and B conditions. These tiny change experiments can work for Facebook and Google because they have obscene traffic. But you likely don’t. Any employee who wants to set up an A/B test needs to understand the limitations that sample size will put on what they can test.

Optimizely has a sample size calculator which will help you understand the kind of numbers you need for statistical significance.

2-optimizely-screenshot

This allows you to set what your current conversion rate is along with what effect you want to see. Decide your significance level and it will spit out the sample size you need for each variation. Here you can see that if you want to see your conversion rate change from 3% to 3.3% and know that it is a real change, you need 57K people in each of your variations.
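For a rough feel of where numbers like this come from, the sketch below uses a standard textbook approximation for comparing two proportions. It is not Optimizely's method, so its result will not match the 57K figure exactly; the defaults (a two-sided 5% significance level and 80% power) and the function name are assumptions of this example.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variation(baseline, target, alpha=0.05, power=0.8):
    """Approximate visitors needed in EACH variation to detect baseline -> target.

    Uses the normal approximation for a two-sided test of two proportions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired power
    pooled = (baseline + target) / 2
    numerator = (
        z_alpha * sqrt(2 * pooled * (1 - pooled))
        + z_beta * sqrt(baseline * (1 - baseline) + target * (1 - target))
    ) ** 2
    return ceil(numerator / (target - baseline) ** 2)


# The conversion rates from the example above: 3% -> 3.3%.
n = sample_size_per_variation(0.03, 0.033)
print(f"about {n:,} visitors per variation")
```

Under these assumptions the answer is still in the tens of thousands per variation, which is the point: small effects on small conversion rates need a lot of traffic.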

If you don’t have that type of traffic, you need to rethink your test.

3. Know why PPV matters

PPV, or positive predictive value, is a measure of the precision of your tests. It allows you and your team to know whether the behaviors you are measuring are predictive of the metrics you are interested in, such as retention.

You calculate PPV by taking the number of true positive samples in your experiment and dividing by the combined number of true and false positive samples.

PPV = true positives / (true positives + false positives)

Let’s run through what this means with Facebook’s “7 friends in 10 days.” If Facebook were using Compass at the time of launch, they could have looked at this exact question: Did a user adding 7 friends or more in the first 10 days of use increase their chances of retention 2 months later? Compass would then compute this matrix for them:

4-matrix

This also includes the negative predictive value, NPV. What this tells you is that this question is a great way to separate out your cohorts. Adding 7+ friends in the first ten days is highly predictive of whether that person will be around in two months’ time. Conversely, not adding 7 friends is highly predictive of that person not being around in two months’ time.

With a simple question and a simple metric, you’ve been able to parse out your most important cohorts and find exactly what is driving them to stay.

This is only a small part of a larger confusion matrix. Just by setting out a simple binary question like this and tracking true and false positives and negatives, you can compute an array of statistics that help you understand your data better:

(Source: Wikipedia)

5-confusion-matrix
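As a sketch of how those statistics fall out of the four counts, the function below computes PPV, NPV, and a few other common measures. The counts are invented for illustration (they are not Facebook's or Compass's numbers), and the function name is hypothetical.

```python
def confusion_stats(tp, fp, fn, tn):
    """Common statistics derived from a 2x2 confusion matrix.

    tp: did the behavior and was retained     fp: did the behavior, not retained
    fn: skipped the behavior, was retained    tn: skipped the behavior, not retained
    Assumes every denominator is non-zero.
    """
    return {
        "ppv": tp / (tp + fp),          # positive predictive value (precision)
        "npv": tn / (tn + fn),          # negative predictive value
        "sensitivity": tp / (tp + fn),  # share of retained users who did the behavior
        "specificity": tn / (tn + fp),  # share of churned users who skipped it
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts, invented for illustration.
stats = confusion_stats(tp=900, fp=100, fn=200, tn=800)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```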

4. Think Like Bayes

Bayesian statistics differs from more orthodox “frequentist” statistics by treating the world as probabilistic. This means that instead of sharp decision boundaries (e.g. this hypothesis is true/false) you get probabilities on whether your hypothesis is true or false (e.g. there is a 90% chance it’s true).

Another fundamental difference is that Bayesian thinking allows you to use your knowledge of the world, called your prior, to build your initial model:

(Source: xkcd)

6-frequentists-vs-bayesians

These probabilities can then be updated as more data comes in. This means that as you are running experiments, you can update your thinking depending on the evidence. Bayes works like this:

  1. First, you take your hypothesis: Changing the button on the signup page will increase signups.
  2. Then, you assign a probability to it being true, which is your prior: Changing buttons on other pages has increased conversion rates, so I think changing this one will have an X% chance of success.
  3. Collect data and incorporate the implications of that data into your previous statement of the proposition’s probability. This is your posterior.
  4. Then, your posterior becomes your prior for the next iteration. Collect more data, repeat, and continue.

(Source: wikimedia commons)

7-bayes-theorem
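The four steps above can be sketched with Bayes' theorem applied to a single yes/no hypothesis. All the probabilities below are invented for illustration; in practice, estimating how likely each result is under each scenario is the hard part.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Return the posterior probability of a hypothesis after one observation.

    prior: P(hypothesis) before seeing the data
    likelihood_if_true: P(data | hypothesis is true)
    likelihood_if_false: P(data | hypothesis is false)
    """
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence


# Steps 1 and 2: the hypothesis "changing the button increases signups",
# with a hypothetical prior of 60%.
belief = 0.6

# Steps 3 and 4: each experiment result updates the belief, and the
# posterior becomes the prior for the next round.
# Each pair is (P(result | true), P(result | false)); values are hypothetical.
experiment_results = [(0.7, 0.4), (0.8, 0.5)]
for if_true, if_false in experiment_results:
    belief = bayes_update(belief, if_true, if_false)
    print(f"updated belief: {belief:.3f}")
```

Because each result here is more likely if the hypothesis is true than if it is false, the belief rises with every round; a result that was more likely under the "false" scenario would pull it back down.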

The best thing for non-data scientists about Bayesian thinking is that it’s intuitive—you already do this. You have an idea of how something works in your head, and as new evidence comes in, you update your internal model.

5. Know the limitations of machine learning

Machine learning is a catch-all term for a number of different algorithmic and statistical methods. This is the cool tool du jour, with almost every new startup having some AI component. It even makes pretty pictures:

(Source: The Guardian)

8-machine-learning-drawing

This image was created by the “dreams” of DeepDream, Google’s neural network image tool. You don’t have to understand the finer points of DeepDream, but it is a good idea to understand what machine learning is and what it is not, what it is capable of and what it is not. This can help you in two ways:

  1. If you have an idea about some data, you’ll be in a better position to know whether machine learning can help you understand that data better.
  2. If another company comes knocking on your door saying that their algorithms are the answer to all your problems, you’ll know whether they are immediately BS or not.

Machine learning is an iterative process. Through repeated training passes, these algorithms can learn to describe features in your dataset. They train a model, which could be as simple as a linear regression or as complicated as a deep convolutional neural network, on known data. Then that model can be used to classify unknown data.
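Here is a minimal sketch of that train-then-apply loop, using the simplest model mentioned above: a linear regression fitted by gradient descent. The data, learning rate, and step count are all assumptions chosen for illustration, and this toy predicts a number rather than a class label.

```python
def train(xs, ys, learning_rate=0.01, steps=5000):
    """Fit y = slope * x + intercept by repeatedly nudging both parameters."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # How far off is the current model on each known example?
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to each parameter.
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


# Known data (hypothetical): sessions in week one vs. days active in month one.
known_sessions = [1, 2, 3, 4, 5]
known_days_active = [3, 5, 7, 9, 11]

model = train(known_sessions, known_days_active)

# Unknown data: a new user with 6 sessions in their first week.
prediction = predict(model, 6)
print(f"predicted days active: {prediction:.1f}")
```

The model can only reproduce the pattern that is already in the training data, which is the limitation the next paragraph describes.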

What’s important to know is that ML is not the answer to all the world’s ills. While it can find relationships that other techniques will miss, it won’t give you answers that aren’t already in your data. It is still dependent on really good quality data. Which leads us neatly to…

6. Clean Up Your Data

If there is just one item on this list that your data scientist would want you to learn, it would be this. You will instantly become their best friend if you present them with a clean dataset for their analysis. When you do, they can get to work faster, and you get answers faster.

Learning to clean datasets makes everyone’s life better.
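What counts as “clean” depends on the dataset, but a few steps come up often: trimming stray whitespace, normalizing inconsistent capitalization, dropping rows that are missing required fields, and removing duplicates. The sketch below applies those steps to a made-up list of event records; the field names are hypothetical.

```python
def clean_events(rows):
    """Normalize text fields, drop incomplete rows, and remove duplicates."""
    cleaned = []
    seen = set()
    for row in rows:
        user_id = (row.get("user_id") or "").strip()
        event = (row.get("event") or "").strip().lower()
        timestamp = row.get("timestamp")
        if not user_id or not event:
            continue  # drop rows missing a required field
        key = (user_id, event, timestamp)
        if key in seen:
            continue  # drop exact duplicates after normalizing
        seen.add(key)
        cleaned.append({"user_id": user_id, "event": event, "timestamp": timestamp})
    return cleaned


# Hypothetical raw export, invented for illustration.
raw = [
    {"user_id": " u1 ", "event": "Sign Up", "timestamp": "2016-11-01"},
    {"user_id": "u1", "event": "sign up", "timestamp": "2016-11-01"},  # duplicate
    {"user_id": "", "event": "sign up", "timestamp": "2016-11-02"},    # no user
    {"user_id": "u2", "event": None, "timestamp": "2016-11-02"},       # no event
    {"user_id": "u3", "event": "Add Friend ", "timestamp": "2016-11-03"},
]

cleaned = clean_events(raw)
for row in cleaned:
    print(row)
```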

7. Write SQL

SQL stands for Structured Query Language, and is the language of almost all databases. By speaking SQL, you are speaking the language of your data and your database. You will have all of your data at your fingertips.

Sometimes you might want to just test out a theory before you bring in your data scientists. If you already understand SQL, then you can quickly run a few queries to see if your theories hold weight.
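As a sketch of what “a few queries” might look like, the example below builds a tiny in-memory SQLite database and asks how many users who completed onboarding also came back. The table, column names, and event names are hypothetical and do not reflect Amplitude's actual schema.

```python
import sqlite3

# An in-memory database with a hypothetical events table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event_type TEXT, event_date TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("u1", "complete_onboarding", "2016-11-01"),
        ("u1", "return_visit", "2016-12-05"),
        ("u2", "complete_onboarding", "2016-11-02"),
        ("u3", "sign_up", "2016-11-03"),
        ("u3", "return_visit", "2016-12-06"),
    ],
)

# Of the users who completed onboarding, how many also have a return visit?
query = """
SELECT
    COUNT(DISTINCT o.user_id) AS onboarded,
    COUNT(DISTINCT r.user_id) AS onboarded_and_returned
FROM events AS o
LEFT JOIN events AS r
    ON r.user_id = o.user_id AND r.event_type = 'return_visit'
WHERE o.event_type = 'complete_onboarding'
"""
onboarded, returned = conn.execute(query).fetchone()
print(f"{returned} of {onboarded} onboarded users returned")
conn.close()
```

The `LEFT JOIN` keeps onboarded users who never returned, so both counts come out of a single query.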

Now, Amplitude prides itself on providing accessible data insights for the whole company, regardless of whether a person knows SQL or not. But for the budding data scientist in you, we do also give you direct SQL access to your event data, giving you the flexibility to answer even the most complex questions about user behavior.

8. Tell a good story

This is another data science skill that stands out as non-obvious. But if neither your data scientists nor you have this skill, then all the other skills are useless.

What’s often forgotten about data scientists is that they are not all robots (yet). Data science isn’t all about the math. In fact, when it is, it’s next to useless for the business as a whole. Data scientists need to build a convincing story with your data. This is why they are so indispensable. They need storytelling abilities to convince the audience of what the data shows.

By developing their “light quant” skills and working as an analytical translator, your data scientists can explain your data to the company and the world in a more compelling fashion.

Every other employee needs these skills, both for their own work (telling you a compelling story of why you should hire a new CFO) and for when they want to discuss data. By building that story around data, not letting it sound stale, you are more likely to get people on board with any experiments you want to run.

3 Skills That Every Data Scientist Needs

Just as other employees need data science skills, so do data scientists need to take on skills from other areas of the business:

  • Without business acumen, they won’t know which questions and data are most vital to the success of the company.
  • Without creativity, they won’t think up the best questions and what possibilities there are with the data.
  • Without reasoning, they won’t arrive at the true answer to all their questions and data.

If your team understands some of the basic scientific concepts that underpin your data scientist’s role, then they can better prepare their data and have a more in-depth conversation with them. When data scientists look to others for improving their skills, they can get a better understanding of the company that exists around the data.

Then both analysts and others in your company can be stronger together.

About the author
Archana Madhavan

Senior Learning Experience Designer, Amplitude

Archana is a Senior Learning Experience Designer on the Customer Education team at Amplitude. She develops educational content and courses to help Amplitude users better analyze their customer data to build better products.

© 2025 Amplitude, Inc. All rights reserved. Amplitude is a registered trademark of Amplitude, Inc.