Agent Analytics

What’s the real impact of your AI agent?

Knowing your agent gave a good response is the easy part. Agent Analytics ties traces and evals to conversion, retention, and revenue, so you can connect agent quality to real business impact.

Sign up for early access
[Figure: Amplitude Agent Analytics session view, showing two Mira assistant chats that compare on-budget and over-budget hotel recommendations]

When traditional analytics falls short

Most analytics tools were built for clicks and page views, not for reasoning and tool calls. Agents can hallucinate, ignore instructions, and confidently go off-track. Yet in traditional engagement metrics, every user who spends time with your agent looks equally engaged. Who actually found it useful and came back?

Go beyond observability

Learn the product and revenue impact of your agents.

AI Quality

What the agent did

Product Outcomes

What the user did next

01 Observe

Inspect traces, prompts, tool calls, responses, latency, and cost. The raw record of what the agent actually did.

02 Evaluate

Score quality, intent, resolution, and failure modes. See where the agent helps, where it gets confused, and where it adds risk.

03 Decide

Tie those quality signals to conversion, retention, and revenue. Did the agent actually move the user forward?

04 Deploy

Tune prompts, run experiments, trigger guides, and personalize the next step from what you learned.

[Figure: A raw trace of spans next to the same turn decomposed into Amplitude events]

Trace turns automatically become Amplitude events

Each user message, tool call, and agent response is an Amplitude event with the same user_id as the rest of your product data. Unlike observability tools that stop at the trace, Agent Analytics decomposes these conversations into events, making them directly queryable in the same funnels, cohorts, and retention analyses you already use.
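As a rough illustration of that decomposition, the sketch below flattens one conversation turn into per-step events that all share the same user_id. The `Span` and `Turn` shapes, the event names, and the property names are hypothetical and chosen for the example; they are not the Agent Analytics SDK's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Span:
    """One step in an agent trace (hypothetical shape, for illustration)."""
    kind: str  # "user_message", "tool_call", or "agent_response"
    attributes: dict[str, Any] = field(default_factory=dict)


@dataclass
class Turn:
    """One conversation turn: the spans produced for a single user message."""
    user_id: str
    session_id: str
    spans: list[Span]


# Hypothetical event names; the product's real taxonomy may differ.
EVENT_NAMES = {
    "user_message": "agent_user_message",
    "tool_call": "agent_tool_call",
    "agent_response": "agent_response",
}


def turn_to_events(turn: Turn) -> list[dict[str, Any]]:
    """Flatten a turn into one event per span, each keyed by the same user_id."""
    events = []
    for position, span in enumerate(turn.spans):
        events.append({
            "event_type": EVENT_NAMES[span.kind],
            "user_id": turn.user_id,  # same identity as other product events
            "event_properties": {
                "session_id": turn.session_id,
                "position_in_turn": position,
                **span.attributes,
            },
        })
    return events


turn = Turn(
    user_id="user-123",
    session_id="session-abc",
    spans=[
        Span("user_message", {"char_count": 42}),
        Span("tool_call", {"tool_name": "search_hotels", "latency_ms": 310}),
        Span("agent_response", {"total_tokens": 256}),
    ],
)
events = turn_to_events(turn)
for event in events:
    print(event["event_type"], event["user_id"])
```

Because every event carries the user's identity rather than only a trace ID, it can sit in the same funnel or cohort definition as a sign-up or purchase event.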

The questions you can finally answer

01

Did our model upgrade lift sign-up conversion this week, or hurt it?

02

What is the conversion delta when the agent answers correctly versus hallucinates?

03

Which agent topics correlate with expansion intent, and which ones with churn risk?
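To make the second question concrete, here is a minimal sketch of a conversion delta computed over session records. The record fields (`hallucinated`, `converted`) and the sample sessions are invented for the example; in practice the quality label would come from your evals and the conversion flag from your product events.

```python
def conversion_rate(sessions: list[dict]) -> float:
    """Share of sessions that converted; 0.0 for an empty group."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if s["converted"]) / len(sessions)


def conversion_delta(sessions: list[dict]) -> float:
    """Conversion rate of correct sessions minus that of hallucinated ones."""
    correct = [s for s in sessions if not s["hallucinated"]]
    hallucinated = [s for s in sessions if s["hallucinated"]]
    return conversion_rate(correct) - conversion_rate(hallucinated)


# Toy data for illustration only: 4 correct sessions (3 converted)
# and 4 hallucinated sessions (1 converted).
sessions = [
    {"hallucinated": False, "converted": True},
    {"hallucinated": False, "converted": True},
    {"hallucinated": False, "converted": True},
    {"hallucinated": False, "converted": False},
    {"hallucinated": True, "converted": True},
    {"hallucinated": True, "converted": False},
    {"hallucinated": True, "converted": False},
    {"hallucinated": True, "converted": False},
]

delta = conversion_delta(sessions)
print(f"Conversion delta: {delta:+.0%}")  # prints "Conversion delta: +50%"
```

A raw delta like this shows correlation, not causation. Users who hit hallucinations may differ from other users in other ways, so an experiment is the stronger test.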

The Agent Analytics maturity model

Most observability tools stop at the lower levels of maturity. Agent Analytics takes you to the top by connecting AI quality to the user journey and revenue.

L4 Revenue Attribution

What is the AI worth in dollars?

L3 Behavioral Analytics

How does AI usage affect the user journey?

L2 Semantic Intelligence

What is the agent actually doing?

L1 Evaluations & Assertions

Did the agent do it correctly?

L0 Tracing & Telemetry

Can I see what happened?

Inside Agent Analytics

Production runs surprise teams with questions they never prepared the model for. Read the user prompt, the agent’s response, the tools it called, and the context it pulled, then jump straight to Session Replay to see what went wrong.

tool calls, prompt versions, context retrieval, jump to replay

Instrument any LLM provider

Native wrappers for the providers you actually use. An OpenTelemetry bridge for the rest. Manual capture when you want full control.

Python and Node, drop-in SDK, live in minutes.

Content-optional analytics

Purpose-built to let you control what leaves your environment.

Tier | What you send | What you get
Metadata Only | Tokens, cost, latency, behavioral signals | Cost analytics, retention curves, funnel drop-off. No conversation content leaves your environment.
Customer Enriched | Your classification labels | Full topic and quality analytics. You run your own classifiers and send us the structured labels.
Full | Conversation content | Automatic topic classification, quality scoring, and behavioral pattern detection.

Send full conversations, your own labels, or metadata only. Switch modes per agent or per event source.
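One way to picture these tiers is as an allow-list applied to each event before it leaves your environment. The sketch below is an assumption-heavy illustration, not the SDK's implementation: the field names are hypothetical, the mode identifiers simply mirror the three tiers, and it assumes the tiers are cumulative (each one also sends what the tier before it sends).

```python
from typing import Any

# Hypothetical field groupings, chosen for the example.
METADATA_FIELDS = {"tokens", "cost_usd", "latency_ms"}
LABEL_FIELDS = {"topic", "quality_score"}
CONTENT_FIELDS = {"prompt", "response"}

# Assumption: tiers are cumulative.
ALLOWED_FIELDS = {
    "metadata_only": METADATA_FIELDS,
    "customer_enriched": METADATA_FIELDS | LABEL_FIELDS,
    "full": METADATA_FIELDS | LABEL_FIELDS | CONTENT_FIELDS,
}


def filter_event(event: dict[str, Any], mode: str) -> dict[str, Any]:
    """Keep only the fields the given privacy mode permits."""
    if mode not in ALLOWED_FIELDS:
        raise ValueError(f"Unknown privacy mode: {mode!r}")
    allowed = ALLOWED_FIELDS[mode]
    return {key: value for key, value in event.items() if key in allowed}


# Modes can differ per agent (agent names are hypothetical).
MODE_BY_AGENT = {"support-bot": "metadata_only", "travel-planner": "full"}

event = {
    "tokens": 256,
    "cost_usd": 0.004,
    "latency_ms": 310,
    "topic": "hotel_search",
    "quality_score": 0.9,
    "prompt": "Find me a hotel under $200",
    "response": "Here are three options...",
}

for agent, mode in MODE_BY_AGENT.items():
    print(agent, sorted(filter_event(event, mode)))
```

Filtering with an allow-list rather than a block-list means a newly added field stays local until a mode explicitly permits it.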

Agent Analytics is in beta

Stop shipping on vibes

  • Find the failure modes that actually cost you users.
  • Measure conversion and retention by agent quality.
  • Connect any trace to session replay and experiments.
Sign up for early access


Frequently asked questions

Agent Analytics is the analytics layer between LLM observability and product analytics. Every user message, tool call, agent response, and session end becomes an Amplitude event tagged with topic, quality score, and behavioral pattern, so you can build cohorts, funnels, and retention curves on AI session quality.

Tracing tools answer “What did the agent do?” Agent Analytics answers “Did it work for the user?” You can keep your tracing tool: AmplitudeGenAIExporter adds Amplitude as a second OpenTelemetry destination in one span processor registration.

No, you do not have to send conversation content. The SDK has three privacy modes: metadata_only (tokens, latency, cost, and behavioral signals only), customer_enriched (your own labels, no raw text), and full (managed enrichment). Most teams start in metadata_only and upgrade as trust builds.

The SDK is available for Python on PyPI (pip install amplitude-ai) and for Node.js / TypeScript on npm (npm install @amplitude/ai). It includes native wrappers for OpenAI, Anthropic, Gemini, Bedrock, Mistral, and Azure OpenAI, plus framework integrations for LangChain, LlamaIndex, OpenAI Agents SDK, CrewAI, and the Claude Agent SDK. Anything emitting OpenTelemetry GenAI spans (OpenLIT, Traceloop, and OpenAI instrumentation) flows in via the bridge.

Every agent event carries a Session Replay ID. From any session in the explorer, View Replay opens at the moment the conversation started, so you watch the agent fail inside the actual product the user was using.

Each event carries the experiment variant as a property, so prompt A/B tests attribute correctly across multi-turn conversations. Quality scores and behavioral patterns flow into cohorts that Guides and Activation target in real time. Same workspace, same identity graph.

A new era of analytics

From a live agent overview to evals and datasets, every view ties what the agent did to what the user did next.