Building the Validation Stack for AI Product Development

The hardest part of shipping isn't building anymore. Amplitude and Statsig are building the validation layer for AI product development.
Company

May 14, 2026

7 min read

Eric Metelka
Director of Product Management, Experimentation, Amplitude


A lot has happened in the world of experimentation over the past year. A year ago, my company, Eppo, which offered warehouse-native experimentation, was bought by Datadog. A year later, my company, Amplitude, is welcoming Statsig, its customers, and its brand to its platform.

The team at Statsig built a strong product. They recognized early that engineers needed better tools to roll out features and to understand the value of what they were shipping. They developed a builder-first approach to feature flags, experiments, metrics, and rollout controls that clearly resonated in the market.

At Amplitude, we believe, just like Statsig does, that experimentation is core infrastructure and a foundational part of how products get built. This is even more important in an AI world. Partnering with Statsig is an opportunity to accelerate a shared vision for the future of product development.

How building products has changed

The bottleneck in product development has moved. It used to be writing code. But now, with a majority of developers using AI coding tools, code generation is only getting faster. PMs write code while designers build and ship full UX flows. The code barrier to getting something built has fully collapsed.

But the gap between shipping a new feature and knowing that it’s good for users has actually gotten wider. Teams are shipping faster than ever, and while the volume of changes going out the door has exploded, the infrastructure to validate those changes hasn't kept pace. Existing bottlenecks in the experimentation process compound when shipping velocity increases.

With non-deterministic systems like LLMs, it has become even harder to determine whether you’re shipping the right thing. Whether you’re working on a chatbot, a recommendation engine, or something else, non-deterministic outputs mean the same input can produce a different response every time. Unit tests can’t give you the confidence you need. Experimentation can.

Additionally, the people building these products aren't necessarily the same people who ran experiments five years ago. The number of people capable of writing code or shipping new features has exploded, but the number who deeply understand how to validate those features has not. Modern experimentation tooling needs to support a much broader range of AI builders.

Building the validation stack for AI product development

Internally, we’re thinking about what the “2.0” of experimentation needs to become.

Version 1.0 is a known loop: ship with feature flags, measure impact with experiments, understand usage with analytics. That loop still works. But teams building AI products need another layer of validation and rigor. You need offline evaluation, live experimentation, and continuous monitoring working together.

The starting point for Experimentation 2.0 is offline evals. Instead of manually checking a few outputs and hoping for the best, you run prompts and models through thousands of labeled test cases before anything reaches production. The goal is to catch regressions early and avoid surprises in production.

Say you’re running an AI support ticket classifier. You have a prompt that triages tickets to billing, technical support, or sales. You update the prompt to handle edge cases better. Is the new version actually better? Offline evals let you run both versions against a labeled dataset of a thousand tickets, score them against graders (including LLM-as-a-Judge for cases where string matching doesn’t work), and see exactly where the new version wins and where it regresses. You iterate on this loop rapidly before any user sees the change.
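As a rough sketch, that loop might look like the following in Python. The `classify` stub, the dataset shape, and the scoring are hypothetical stand-ins for your own model call and labeled tickets, not an Amplitude or Statsig API:

```python
from collections import Counter

LABELS = {"billing", "technical_support", "sales"}

def classify(prompt: str, ticket: str) -> str:
    """Apply the triage prompt to one ticket and parse a label (stub)."""
    raise NotImplementedError  # wire up your own model client here

def run_eval(prompt: str, dataset: list[dict]) -> dict:
    """Score one prompt version against labeled tickets, tracking misses per label."""
    correct, misses = 0, Counter()
    for row in dataset:  # each row: {"ticket": str, "label": str}
        if classify(prompt, row["ticket"]) == row["label"]:
            correct += 1
        else:
            misses[row["label"]] += 1
    return {"accuracy": correct / len(dataset), "misses_by_label": dict(misses)}

# Diff the two prompt versions on the same thousand labeled tickets to see
# exactly where the new version wins and where it regresses:
#   baseline  = run_eval(PROMPT_V1, labeled_tickets)
#   candidate = run_eval(PROMPT_V2, labeled_tickets)
```

An LLM-as-a-Judge grader would slot in wherever the exact-match comparison inside `run_eval` falls short, for example when outputs are free-form text rather than a fixed label.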

From there, you move to progressive rollout with gradual deployment and instant rollbacks, tied to service metrics, business KPIs, and LLM-specific observability signals. If latency spikes or error rates climb, the system responds before the issue spreads.
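As a sketch of that gate, a staged rollout loop might look like this; the `flag` client and the metric fetchers are assumptions standing in for whatever your deployment and observability stack exposes, not a specific Statsig API:

```python
import time

STAGES = [0.01, 0.05, 0.25, 1.0]  # fraction of traffic exposed at each step
MAX_P95_LATENCY_MS = 800          # illustrative guardrail thresholds
MAX_ERROR_RATE = 0.02

def progressive_rollout(flag, get_p95_latency_ms, get_error_rate, soak_s=600):
    """Ramp exposure stage by stage, rolling back the moment a guardrail trips."""
    for pct in STAGES:
        flag.set_exposure(pct)      # gradual deployment
        time.sleep(soak_s)          # let service metrics accumulate
        if get_p95_latency_ms() > MAX_P95_LATENCY_MS or get_error_rate() > MAX_ERROR_RATE:
            flag.set_exposure(0.0)  # instant rollback before the issue spreads
            return False
    return True                     # fully rolled out with guardrails green
```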

Then comes online experimentation: A/B tests on live traffic with statistical confidence, and shadow-mode evals that grade model output against production scenarios without exposing users to risk. Every rollout should measure impact, not just reduce risk.
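A shadow-mode eval can be as simple as forking each live request: serve the production output, grade the candidate’s output in the background, and log the comparison. All four helpers in this sketch are illustrative stubs, not a documented API:

```python
from concurrent.futures import ThreadPoolExecutor

def production_model(text: str) -> str: raise NotImplementedError  # stub
def candidate_model(text: str) -> str: raise NotImplementedError   # stub
def grade(candidate: str, reference: str) -> float: raise NotImplementedError  # stub
def log_event(name: str, payload: dict) -> None: print(name, payload)  # stub sink

_pool = ThreadPoolExecutor(max_workers=8)

def handle_request(user_input: str) -> str:
    live_answer = production_model(user_input)  # what the user actually sees

    def shadow() -> None:
        out = candidate_model(user_input)       # never exposed to the user
        log_event("shadow_eval", {
            "input": user_input,
            "candidate": out,
            "score": grade(out, reference=live_answer),
        })

    _pool.submit(shadow)  # graded off the request path, at zero user risk
    return live_answer
```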

Running through this entire 2.0 loop is LLM observability, which gives you real-time logging, monitoring, and anomaly alerting in a single view alongside business metrics and user engagement. When something goes wrong with your AI product, you shouldn’t need four dashboards to figure out where.
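In practice this can start with one structured event per model call, emitted into the same pipeline as your product analytics so latency, errors, and engagement sit side by side. The field names in this sketch are assumptions, not a documented Amplitude schema:

```python
import json
import time
import uuid

def observed_call(model_fn, user_id: str, prompt: str):
    """Wrap any model call so every invocation emits one structured event."""
    event = {"event": "llm_call", "call_id": str(uuid.uuid4()), "user_id": user_id}
    start = time.monotonic()
    output = None
    try:
        output = model_fn(prompt)
        event.update(status="ok", output_chars=len(output))
    except Exception as exc:          # surfaced for anomaly alerting
        event.update(status="error", error_type=type(exc).__name__)
    event["latency_ms"] = round((time.monotonic() - start) * 1000)
    print(json.dumps(event))          # stand-in for your logging/analytics sink
    return output
```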

Amplitude + Statsig will get there faster

Statsig and Amplitude were already building toward the same future: one where flags, experiments, and analytics aren’t separate products you have to stitch together, but layers in a single system that covers the full product development lifecycle.

This partnership accelerates that vision. Amplitude has been building out Agent Analytics to connect observability and evals with product analytics, while Statsig’s roadmap focuses on capabilities like AI Configs for controlling prompts and model parameters without redeploying, and an MCP server integration that embeds experimentation directly into AI coding workflows.

We’re continuing to invest in both platforms with a focus on maintaining the existing Statsig platform across cloud and warehouse deployments and supporting current customers through the transition. We’re also building a shared roadmap that moves both platforms forward together.

Experimentation at the speed of shipping

A year ago, no one knew how the evaluation loop needed to change for probabilistic products. Now we do. AI coding assistants generate more changes than any team can manually validate. LLM-powered products introduce non-deterministic behavior that demands continuous evaluation and validation. The cost of shipping a bad change keeps climbing as products get more complex.

The teams that will outperform with AI aren’t necessarily the ones shipping the most features, but the ones learning what worked and feeding that answer back into the next decision. This creates a feedback loop that accelerates product velocity.

Amplitude spent years making experimentation faster and more accessible. Statsig spent years making it more powerful and more developer-native. Together, we’re building the validation layer that closes the gap between shipping and understanding value.

Try Statsig

Explore the future of warehouse-native experimentation. Create a free Statsig account in minutes or get a live demo.

(PS Yes, this is a little odd for us too.)


About the author

Eric Metelka
Director of Product Management, Experimentation, Amplitude

Eric is Director of Product Management, Experimentation at Amplitude. Previously he was Head of Product at Eppo and created the experimentation practice at Cameo. He is focused on helping customers set up and scale their experimentation practices to increase their rate of learning and prove impact.
Topics: AI, Amplitude Feature Experimentation, Amplitude Web Experimentation, Experimentation