
What We Learned Building AI Products in 2025

Amplitude customers consumed billions of tokens last year. Here's what actually worked and what didn't.
Company

Jan 22, 2026 · 9 min read

Nirmal Utwani, Amplitude Director of Engineering, AI Analytics

Last year, we shipped three new AI products and over 20 new AI analytics features to over 4,500 enterprise customers, who consumed 13B tokens, up 100x from the previous year. But here's what those numbers don't tell you: we learned more from what broke than from what worked.

There's a lot of buzz around agents heading into 2026. Anthropic’s launch of Claude Cowork last week is igniting the debate over specialized vertical agents vs. general-purpose, long-horizon agents. Meanwhile, VCs are predicting AGI in 2026. Yet, Andrej Karpathy thinks we're looking at a 10-year timeline for functional, reliable AI agents.

After 10 years of building both ML and GenAI products that serve millions of users, here's what we learned building AI products for enterprise customers in 2025: specifically, what's working for vertical agents in our industry and what's broken.

The gap between general agent demos and production scale is real

Agents work best on tasks where outputs are easy to verify automatically. That's why coding agents and math solvers have taken off. But analytics? Analytics is a hard AI use case. Unlike code, which is structured and defined, an organization’s data is messy and ambiguous. On top of that, it’s hard to verify an insight when a question is open-ended or intentionally exploratory.

We learned this the hard way in analytics. Most organizations aren't ready for general-purpose agents in production. They have data silos and lack specialized context, tools, and observability infrastructure.

In this landscape, getting to full autonomy even for simple workflows requires higher quality, reliability, and trust than most people realize. Most folks overestimate how quickly agents will automate many forms of specialized work, such as analytics.

Users need different levels of autonomy from agents

For exploratory analysis, users don't want to wait 30 minutes to see results. They want fast responses with a logical thought process. Agents that run faster iterations and interact with the user more frequently outperform long-running autonomous background tasks in user satisfaction. People want to stay in the loop and understand what's happening and why.

When agents take smaller steps, there’s less room for error and faster feedback from the user to make sure the project is on the right track.

Just to be clear, we think there's certainly still a lot of room for long-running background agents with autonomy, especially as models continue to improve. They are a good fit for clearly defined work where the answers are known and the output can be verified at key checkpoints. Think data migration, taxonomy clean-ups, etc. These are set-and-forget tasks; they don't require the same level of collaboration between user and machine.

Distribution and flexibility matter more than features

New AI features struggle without existing user pathways. "Build it and they will come" doesn’t work.

Engagement spiked 10x as soon as we made Agents available in our product via Slack. MCP (Model Context Protocol) continues to be a big unlock for customer teams that are traditionally not power users of our product, like software engineering and product engineering.

Customers expect to flexibly pivot across tools and context sources mid-thread. Power users need control over tools, context sources, and prompts. Success requires integration into the workflows and tools people already use.

Teams should invert the build:eval ratio

Most teams spend 60–70% of their time building features and only 30–40% building evals. For some of our bets, we tried flipping that ratio. Looking back, those were the bets that actually added customer value.

Candidly, before our team could embrace eval-driven development, we also had to invest in strong LLM observability and analytics. We looked for an existing solution that could reliably tie agent performance to better product UX, but didn’t find anything that worked for us. We ended up building something to give us the kind of visibility and tight feedback loop we needed.
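We're not showing our internal observability system here, but as a rough sketch of the idea, the core loop is to emit one structured event per agent call (model, latency, token usage, outcome) and aggregate those events per model. Everything below — the `LLMEvent` fields, class names, and sample numbers — is hypothetical, not Amplitude's actual schema:

```python
from dataclasses import dataclass, field
from statistics import median

@dataclass
class LLMEvent:
    """One structured record per agent/LLM call (hypothetical schema)."""
    model: str
    latency_ms: float
    input_tokens: int
    output_tokens: int
    success: bool

@dataclass
class LLMTracker:
    """In-memory aggregator; a real system would ship events to analytics."""
    events: list[LLMEvent] = field(default_factory=list)

    def record(self, event: LLMEvent) -> None:
        self.events.append(event)

    def summary(self, model: str) -> dict:
        """Aggregate the recorded events for one model."""
        evs = [e for e in self.events if e.model == model]
        return {
            "calls": len(evs),
            "success_rate": sum(e.success for e in evs) / len(evs),
            "median_latency_ms": median(e.latency_ms for e in evs),
        }

# Illustrative events with made-up numbers.
tracker = LLMTracker()
tracker.record(LLMEvent("model-a", 820.0, 1200, 300, True))
tracker.record(LLMEvent("model-a", 1400.0, 1500, 450, False))
tracker.record(LLMEvent("model-a", 950.0, 1100, 280, True))

print(tracker.summary("model-a"))
```

The value is less in the aggregation itself than in having every agent call flow through one place, so quality regressions show up as data rather than anecdotes.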

Here's the playbook that worked for us:

  • Phase 1: Evals become the new PRDs. We started by using evaluations to define the core use cases and requirements that guide how AI agents are built, tuned, and improved. We then empowered the broadest set of SMEs and product teams to create and maintain evals. Next, we expanded and sharpened the eval suite. This is one of the highest-leverage ways PMs can spend their time.
  • Phase 2: Move fast, ship often—with high visibility. We started with manual reviews, then automated as we earned confidence, backed by strong observability (hello, Amplitude!). Don’t dismiss qualitative judgment (“vibe checks”). We made sure to capture it, then translate it into repeatable evals whenever possible.
  • Phase 3: Keep growing your eval bank as you learn. Now, every time we uncover a new failure mode, we add an eval for it. We use evals to prevent regressions, compare approaches, and consistently choose the best model for the task.

The playbook works when you repeatedly observe and iterate. The number of iterations matters. Re-analyze every few weeks. Your understanding of "good" evolves, and that's normal.
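To make the eval-bank idea concrete (this is an illustrative sketch, not our tooling — `EvalCase`, `run_evals`, and the toy agent are all hypothetical), an eval can be as simple as an input plus a programmatic check, scored as a pass rate over the suite:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    """One eval: a prompt plus a programmatic pass/fail check."""
    name: str
    prompt: str
    check: Callable[[str], bool]

def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run every case through the agent and return the pass rate."""
    passed, failures = 0, []
    for case in cases:
        if case.check(agent(case.prompt)):
            passed += 1
        else:
            failures.append(case.name)
    if failures:
        print("failed:", ", ".join(failures))
    return passed / len(cases)

# Toy agent, purely for illustration: it only "understands" retention.
def toy_agent(prompt: str) -> str:
    return "weekly retention chart" if "retention" in prompt else "unknown"

cases = [
    EvalCase("retention-question", "Show weekly retention",
             lambda out: "retention" in out),
    EvalCase("funnel-question", "Build a signup funnel",
             lambda out: "funnel" in out),
]

print(f"pass rate: {run_evals(toy_agent, cases):.0%}")
```

Each newly discovered failure mode becomes one more `EvalCase`, which is how the bank grows and regressions get caught.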

Foundation model improvements create huge product leverage

One advantage of our new agentic architecture is that we get leverage from foundation model improvements immediately. With eval tooling, we can test new models in a few hours.

In initial testing for Agents, our eval success rate was in the mid-40s. When we moved from Haiku to Claude Sonnet, eval accuracy (before any further optimizations) jumped from 47% to 65%. We tested Gemini 3.0 Pro at 62%, evaluating it within a few hours of its launch. In addition, we've found that other models are better suited for certain sub-agents, and we route those tasks accordingly to get the best performance.

Speed matters. When you can validate model improvements in hours instead of weeks, you can ride the wave of foundation model progress instead of being left behind.

Roadmaps need to update in real time to reflect development velocity

For AI product development, we've shifted from quarterly planning to a flexible model.

A tiger team of engineers, PMs, and designers meets with two to three customers per day, identifies broken experiences, and fixes them. Often, those fixes happen the same day customers flag them. Traditional planning cycles are too slow for how fast this space moves.

Non-engineers are shipping code across the organization

With tools like Cursor and Claude Code integrated into our codebase, along with devX improvements, almost everyone on our design and product team has started writing code and improving our product surface areas directly. We’ve seen a 300% growth in the number of PRs created by non-engineers spanning bug fixes, copy improvements, layout updates, new feature requests, and a lot more.

We’re not aiming to replace engineers. It's about unlocking velocity across the entire organization. When designers can fix button alignment themselves and PMs can adjust copy without a sprint ticket, everyone moves faster.

Demand for engineering is increasing

In 2025, the big question was whether the rise of these agents would lead to job losses, with engineering first on the chopping block. We've done the opposite: we doubled our new engineering headcount in 2025, and we plan to grow open headcount by 55% in 2026.

None of this is to imply that AI is not automating our work. But as we continue to ship faster, we’re also seeing that teams building AI products need more humans in the loop, not fewer, to handle the complexity of evals, integrations, and the constant iteration required to ship real value.

The bottom line

The hype cycle around AI agents is real, and while we’re seeing agents excel at verifiable tasks like coding, in many other fields we’re still seeing a gap between demos and enterprise readiness.

2025 taught us that customers want software they can use collaboratively and reliably. The software teams that are winning today aren't the ones chasing full autonomy (though that might be the right strategy for the frontier AI research labs). The best AI application and infra teams are the ones building robust evals, meeting users where they already work, and staying flexible enough to adapt as foundation models improve.

We're still early. But we're learning fast. And we’re excited to ship in 2026!

About the author

Nirmal Utwani
Amplitude Director of Engineering, AI Analytics

Nirmal is a founding engineer at Amplitude, part of the team from Day 1, building the company's analytics platform from the ground up. As Director of Engineering, he now leads teams responsible for Amplitude's AI products and core analytics capabilities, serving millions of users across thousands of enterprise customers. His expertise spans distributed systems, query engines, and applying LLMs to complex analytical workflows.
Topics: AI, Agents, LLM, Machine Learning, Product Strategy
