Agent Analytics Overview

Early Access

This feature is in Early Access. During this time, aspects of the functionality may still be developed, and this documentation may not always be up to date. If you have any questions, contact Amplitude Support.

Measure and improve the AI agents you ship to your users. Agent Analytics decomposes agent traces into events, runs configurable evaluators on every interaction, and links AI session data to product event data on shared user identity. Product managers connect agent behavior to business outcomes. Engineers catch regressions before they reach production.

Because AI session data shares Amplitude user identity with product event data, you can connect agent quality (topics, hallucinations, task completion, and tool errors) to retention, conversion, and revenue. Pure observability tools can't make that connection, and pure product analytics tools don't measure agent quality with enough rigor.

Use cases

A growth PM filters sessions where the support agent failed to answer a billing question, then builds a cohort. The PM runs a retention chart to compare that cohort against users with successful sessions.
An ML engineer ships a new system prompt, watches the task-completion evaluator drop 8 points overnight, and rolls back before the next standup.
A finance lead breaks down LLM spend by topic and finds that returns-related sessions cost 3 times more than recommendations sessions while scoring lower on satisfaction.
A platform team replaces weekly manual interaction sampling with automated evaluators that score every session for hallucinations, tool errors, and task completion.
A vendor management team evaluates a third-party AI provider against the same evaluators it uses for in-house agents to compare quality on equal terms.
A research team plugs in its own offline evaluators through customer_enriched mode to score sessions on custom dimensions that the default evaluators don't cover.

How Agent Analytics works

Agent Analytics models each agent interaction as a hierarchy of events. The same user identity flows through this hierarchy and into your existing Amplitude product data, so you can analyze agent sessions alongside everything else you track.

Session: One unit of work that a user hands the agent to complete from start to finish, such as a chatbot conversation, a resolved support ticket, or a completed coding task. Amplitude identifies a session with the [Agent] Session ID property, which you set from an ID you already track.
Turn: A single back-and-forth exchange within a session: a user message, the agent's tool calls, and the AI response.
Span: A sub-turn unit: a tool call, vector search, rerank, guardrail, or custom step.
Agent: A named orchestration unit. Agents can have child agents.

Every user message, AI response, and tool call lands as an independent Amplitude event. This per-turn event model is the structural difference from trace-centric observability tools. You can use agent events directly in funnels, cohorts, retention charts, and Session Replay links without decomposing a trace first.
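The hierarchy and the per-turn event model can be sketched in a few lines of Python. This is an illustration only, not the SDK's API: the class names, the event_type values, and every property name except [Agent] Session ID are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    kind: str  # for example "tool_call", "vector_search", "rerank", "guardrail"
    name: str


@dataclass
class Turn:
    user_message: str
    ai_response: str
    spans: list[Span] = field(default_factory=list)


@dataclass
class Session:
    session_id: str  # an ID you already track, such as a thread or ticket ID
    user_id: str     # the same identity your product events use
    agent: str
    turns: list[Turn] = field(default_factory=list)


def to_events(session: Session) -> list[dict]:
    """Flatten a session into one independent event per message, span, and response."""
    events: list[dict] = []

    def emit(event_type: str, turn_index: int, **extra: str) -> None:
        events.append({
            "user_id": session.user_id,
            "event_type": event_type,  # hypothetical event names
            "event_properties": {
                "[Agent] Session ID": session.session_id,
                "agent": session.agent,
                "turn_index": turn_index,
                **extra,
            },
        })

    for index, turn in enumerate(session.turns):
        emit("user_message", index, text=turn.user_message)
        for span in turn.spans:
            emit("span", index, span_kind=span.kind, span_name=span.name)
        emit("ai_response", index, text=turn.ai_response)
    return events


session = Session(
    session_id="ticket-123",
    user_id="user-42",
    agent="support_agent",
    turns=[
        Turn(
            user_message="Why was I charged twice?",
            ai_response="I found a duplicate charge on your last invoice.",
            spans=[Span(kind="tool_call", name="lookup_invoices")],
        )
    ],
)
events = to_events(session)
```

Each resulting event carries the user ID and the agent session ID on its own, which is what lets a funnel or cohort use it without reassembling the trace.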

Agent session vs. standard-analytics session

An agent session isn't the same as Amplitude's standard-analytics session.

The agent session, identified by [Agent] Session ID, is a single unit of work that the user hands to the agent. You set it from your own thread, ticket, call, or run ID. Amplitude's standard-analytics session, identified by $session_id, is the user's app or web visit that powers Session Replay and product reports. For example, when you open Amplitude, you start a standard-analytics session. When you start Global Agent from within Amplitude, you start an agent session. The two are independent: a standard-analytics session can contain several agent sessions, and a single agent session can span several standard-analytics sessions.
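A minimal sketch of that independence, assuming each event carries both identifiers. The dictionary shape and the ID values are made up for illustration; only the names $session_id and [Agent] Session ID come from this page.

```python
from collections import defaultdict

# Hypothetical events: two app visits (1001, 1002) and two agent sessions.
events = [
    {"$session_id": 1001, "[Agent] Session ID": "ticket-7"},
    {"$session_id": 1001, "[Agent] Session ID": "thread-9"},
    {"$session_id": 1002, "[Agent] Session ID": "ticket-7"},
]

agent_sessions_by_visit = defaultdict(set)
visits_by_agent_session = defaultdict(set)
for event in events:
    visit = event["$session_id"]
    agent_session = event["[Agent] Session ID"]
    agent_sessions_by_visit[visit].add(agent_session)
    visits_by_agent_session[agent_session].add(visit)

# Visit 1001 contains two agent sessions.
print(sorted(agent_sessions_by_visit[1001]))
# Agent session "ticket-7" spans two visits.
print(sorted(visits_by_agent_session["ticket-7"]))
```

Neither identifier nests inside the other, so group by the one that matches your question: [Agent] Session ID for a unit of agent work, $session_id for a visit.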

On top of the session, turn, span, and agent hierarchy, Amplitude measures quality in three ways:

Signals: Always-on enrichments that Amplitude runs on every session, including task completion, response quality, user friction, and safety. You don't configure them.
Evaluators: Scorers you define and calibrate in the Amplitude UI for product-specific criteria, using rules or LLM-as-judge. You can also bring your own evaluators through customer_enriched mode.
Scores: Explicit user feedback (thumbs up or down, optional comment) attached to a session or message. Scores capture reactions to the agents you build, similar to how AI Feedback captures reactions to Amplitude's built-in AI. You send scores from your app; Amplitude doesn't generate them. To learn how, refer to Send user feedback (scores).
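To make the shape of a score concrete, here is a hypothetical record for the feedback described above: a thumbs up or down, an optional comment, and a target that is either a whole session or a single message. The class and all property names other than [Agent] Session ID are assumptions for illustration. This isn't the SDK's payload format, which Send user feedback (scores) documents.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Score:
    session_id: str
    thumbs_up: bool
    message_id: Optional[str] = None  # None means the score targets the session
    comment: Optional[str] = None

    @property
    def target(self) -> str:
        return "message" if self.message_id is not None else "session"

    def to_properties(self) -> dict:
        """Build a flat property dict; all keys except the session ID are hypothetical."""
        props = {
            "[Agent] Session ID": self.session_id,
            "rating": "up" if self.thumbs_up else "down",
            "target": self.target,
        }
        if self.message_id is not None:
            props["message_id"] = self.message_id
        if self.comment:
            props["comment"] = self.comment
        return props


session_score = Score(session_id="ticket-123", thumbs_up=True)
message_score = Score(
    session_id="ticket-123",
    thumbs_up=False,
    message_id="msg-4",
    comment="The refund amount was wrong.",
)
```

The point of the sketch is the direction of the data: your app collects the reaction and sends it, while signals and evaluators run on Amplitude's side.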

Instrument your agent with the Node or Python SDK. For setup details, refer to Set up Agent Analytics.

Manage access with RBAC

Admins control who can use Agent Analytics through role-based access control (RBAC). Three permissions apply to Agent Analytics:

View Agent Analytics Objects: View sessions, evaluators, and related objects. A role needs this permission to use Agent Analytics at all.
Manage Inactive Evals and Runs: Create, update, and delete draft evaluators, and launch dry runs.
Activate Evals: Mark evaluators as active and archive live evaluators.
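One way to read the three permissions is as a mapping from actions to the permission each action needs, with the view permission as a prerequisite for everything. The sketch below encodes that reading. The action names are hypothetical, and this isn't Amplitude's enforcement logic.

```python
VIEW = "View Agent Analytics Objects"
MANAGE_INACTIVE = "Manage Inactive Evals and Runs"
ACTIVATE = "Activate Evals"

# Hypothetical action names, mapped to the permission described for each.
ACTION_PERMISSION = {
    "view_sessions": VIEW,
    "create_draft_evaluator": MANAGE_INACTIVE,
    "update_draft_evaluator": MANAGE_INACTIVE,
    "delete_draft_evaluator": MANAGE_INACTIVE,
    "launch_dry_run": MANAGE_INACTIVE,
    "activate_evaluator": ACTIVATE,
    "archive_live_evaluator": ACTIVATE,
}


def can(role_permissions: set, action: str) -> bool:
    """Return True if a role with these permissions may perform the action."""
    # Without the view permission, a role can't use Agent Analytics at all.
    if VIEW not in role_permissions:
        return False
    return ACTION_PERMISSION[action] in role_permissions


viewer = {VIEW}
author = {VIEW, MANAGE_INACTIVE}
publisher = {VIEW, MANAGE_INACTIVE, ACTIVATE}

print(can(viewer, "launch_dry_run"))         # False
print(can(author, "launch_dry_run"))         # True
print(can(author, "activate_evaluator"))     # False
print(can(publisher, "activate_evaluator"))  # True
```

Splitting draft work from activation lets a wider group author and dry-run evaluators while a smaller group decides which ones go live.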

Get started

Set up Agent Analytics covers choosing an instrumentation path, setting a privacy mode, and verifying your data.
Analyze agent results covers evaluators, scores, and connecting agent data to product cohorts and charts.
