Agent Analytics taxonomy
This feature is in Early Access. During this time, aspects of the functionality may still be developed, and this documentation may not always be up to date. If you have any questions, contact Amplitude Support.
This reference covers the [Agent] events, the properties on the enrichment events, and the default signals. To read these results in the UI, go to Analyze agent results. To emit the events from your code, go to the Agent Analytics SDK.
The taxonomy is configurable and still evolving during Open Beta. Treat the lists below as the default shape, and confirm the live set against your own event stream.
Data hierarchy
Agent Analytics models each agent interaction as a hierarchy:
- Session: One job a user hands the agent, from start to finish. Amplitude identifies a session with the [Agent] Session ID property. The agent session differs from Amplitude's standard-analytics session ($session_id), which is the user's app or web visit.
- Turn: A single back-and-forth exchange within a session: a user message, the agent's tool calls, and the AI response.
- Span: A sub-turn step, such as a tool call, vector search, rerank, or guardrail.
Every user message, AI response, and tool call lands as an independent Amplitude event, so you can use agent data in funnels, cohorts, and retention charts without decomposing a trace first.
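Because each event is independent, rebuilding a session is a matter of grouping on the session ID property. The sketch below is illustrative only: it assumes events arrive as dictionaries with `event_type` and `event_properties` keys, and the sample events are hypothetical.

```python
from collections import defaultdict

SESSION_ID = "[Agent] Session ID"


def group_by_agent_session(events):
    """Group event types by agent session ID, keeping input order within each session."""
    sessions = defaultdict(list)
    for event in events:
        # Assumed shape: event properties live under "event_properties".
        session_id = event.get("event_properties", {}).get(SESSION_ID)
        if session_id is not None:
            sessions[session_id].append(event["event_type"])
    return dict(sessions)


# Hypothetical sample events, for illustration only.
sample_events = [
    {"event_type": "[Agent] User Message", "event_properties": {SESSION_ID: "s-1"}},
    {"event_type": "[Agent] Tool Call", "event_properties": {SESSION_ID: "s-1"}},
    {"event_type": "[Agent] AI Response", "event_properties": {SESSION_ID: "s-1"}},
    {"event_type": "[Agent] User Message", "event_properties": {SESSION_ID: "s-2"}},
]

grouped = group_by_agent_session(sample_events)
print(grouped)
```

Events without an [Agent] Session ID, such as ordinary product events, are skipped rather than grouped under a placeholder key.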
Event inventory
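Agent Analytics produces these events. Your SDK instrumentation produces the first seven as the agent runs. The server enrichment pipeline produces [Agent] Session Record and [Agent] Evaluator Result after a session closes. [Agent] Score, the last row, carries user feedback rather than enrichment output.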
| Event | Producer | SDK method |
|---|---|---|
| [Agent] User Message | SDK | track_user_message() |
| [Agent] AI Response | SDK | track_ai_message() |
| [Agent] Tool Call | SDK | track_tool_call() |
| [Agent] Embedding | SDK | track_embedding() |
| [Agent] Span | SDK | track_span() |
| [Agent] Session End | SDK | track_session_end() |
| [Agent] Session Enrichment | SDK | track_session_enrichment() |
| [Agent] Session Record | Server enrichment pipeline | Internal |
| [Agent] Evaluator Result | Server enrichment pipeline | Internal |
| [Agent] Score | SDK, or server for feedback | score() |
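One way to confirm the live set against your own event stream is to compare observed event names with the default inventory. The sketch below copies the event names from the table into a set; the helper name `unknown_agent_events` and the sample input are hypothetical, and the check assumes every Agent Analytics event name starts with the `[Agent] ` prefix.

```python
# Default event names, copied from the inventory table.
DEFAULT_AGENT_EVENTS = {
    "[Agent] User Message",
    "[Agent] AI Response",
    "[Agent] Tool Call",
    "[Agent] Embedding",
    "[Agent] Span",
    "[Agent] Session End",
    "[Agent] Session Enrichment",
    "[Agent] Session Record",
    "[Agent] Evaluator Result",
    "[Agent] Score",
}


def unknown_agent_events(observed_types):
    """Return observed [Agent] event names that the default inventory does not list."""
    return sorted(
        name
        for name in set(observed_types)
        if name.startswith("[Agent] ") and name not in DEFAULT_AGENT_EVENTS
    )


# Hypothetical observed event names, for illustration only.
observed = ["[Agent] Tool Call", "[Agent] Topic Classification", "page_view"]
print(unknown_agent_events(observed))
```

A non-empty result points at events that are either project-specific or no longer part of the default shape.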
SDK events
Your instrumentation produces these events as the agent runs. The [Agent] AI Response event carries the per-response model, provider, token, latency, and cost properties.
- [Agent] User Message: a message the user sends to the agent.
- [Agent] AI Response: the agent's response, with model, provider, tokens, latency, and cost.
- [Agent] Tool Call: a function or tool the agent invokes.
- [Agent] Embedding: an embedding or vector-search step.
- [Agent] Span: any other pipeline step, such as a rerank or guardrail.
- [Agent] Session End: marks the end of a session.
- [Agent] Session Enrichment: your own session labels, sent in customer_enriched privacy mode.
Server enrichment events
After a session closes, the enrichment pipeline assesses it and writes two events back to your event stream.
Session Record
[Agent] Session Record lands once per session. It carries the session rollups, the always-on signal results, and quality flags.
| Property | Type | Description |
|---|---|---|
| [Agent] Turn Count | number | Number of turns in the session. |
| [Agent] Session Total Tokens | number | Total LLM tokens across all turns. |
| [Agent] Session Avg Latency Ms | number | Average AI response latency in milliseconds across the session. |
| [Agent] Models Used | string[] | The LLM models used in the session. |
| [Agent] Has Negative Feedback | boolean | Whether the user expressed dissatisfaction during the session. |
| [Agent] Has Technical Failure | boolean | Whether technical errors occurred, such as tool timeouts or API failures. |
| [Agent] Technical Error Count | number | Count of technical errors in the session. |
| [Agent] Has Data Quality Issues | boolean | Whether the AI output had data quality problems, such as wrong data or hallucinations. |
| [Agent] Root Agent Name | string | The entry-point agent in a multi-agent flow. |
| [Agent] Agent Chain Depth | number | Number of agents in the delegation chain. |
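The three boolean quality flags make it straightforward to pull out sessions worth a closer look. The following is a minimal sketch, assuming each record is a plain dictionary of Session Record properties and that the record also carries [Agent] Session ID (an assumption, since the table above does not list it). The sample records are hypothetical.

```python
# Boolean quality flags, copied from the Session Record property table.
QUALITY_FLAGS = (
    "[Agent] Has Negative Feedback",
    "[Agent] Has Technical Failure",
    "[Agent] Has Data Quality Issues",
)


def sessions_needing_review(session_records):
    """Return (session_id, raised_flags) pairs for records with any flag set to True."""
    flagged = []
    for props in session_records:
        raised = [flag for flag in QUALITY_FLAGS if props.get(flag) is True]
        if raised:
            # Assumption: the record exposes the session ID under this property.
            flagged.append((props.get("[Agent] Session ID"), raised))
    return flagged


# Hypothetical Session Record properties, for illustration only.
records = [
    {"[Agent] Session ID": "s-1", "[Agent] Has Technical Failure": True,
     "[Agent] Technical Error Count": 2},
    {"[Agent] Session ID": "s-2", "[Agent] Has Negative Feedback": False},
]

review_queue = sessions_needing_review(records)
print(review_queue)
```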
Evaluator Result
[Agent] Evaluator Result lands once per evaluator per session. It is the unified event for every server-side evaluation: signal detectors, topic classifiers, and rubric scorers.
| Property | Type | Description |
|---|---|---|
| [Agent] Evaluator Name | string | The evaluator that produced this result. |
| [Agent] Output Type | string | The result shape: binary, classification, or score. |
| [Agent] Detected | boolean | For binary evaluators, whether the condition was detected. |
| [Agent] Primary Label | string | For classification evaluators, the primary label assigned. |
| [Agent] Selection Mode | string | Whether the topic model assigns a single label (MECE) or multiple. |
| [Agent] Topic | string | Which topic model the classification is for, such as product_area. |
| [Agent] Rationale | string | The model's explanation for the result. |
| [Agent] Evidence | string | Supporting evidence the model cited. |
| [Agent] Evaluator Model | string | The LLM that ran the evaluation. |
| [Agent] Evaluation Source | string | Where the evaluation came from: ai, user, or reviewer. |
| [Agent] Taxonomy Version | string | The taxonomy config version that produced this result. |
| [Agent] Evaluated At | number | Epoch milliseconds when the result was computed. |
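Because one event type carries three result shapes, a consumer typically branches on [Agent] Output Type before reading the value. The sketch below shows one possible way to do that, assuming the properties arrive as a plain dictionary. The table above does not name a property for numeric scores, so the sketch leaves the value empty for that shape rather than guessing. The function name and sample values are hypothetical.

```python
from datetime import datetime, timezone


def summarize_evaluator_result(props):
    """Pick the value property that matches the result's output type."""
    output_type = props.get("[Agent] Output Type")
    if output_type == "binary":
        value = props.get("[Agent] Detected")
    elif output_type == "classification":
        value = props.get("[Agent] Primary Label")
    else:
        # The property table does not list a score value property, so none is read.
        value = None

    # [Agent] Evaluated At is epoch milliseconds; convert to an aware UTC datetime.
    evaluated_ms = props.get("[Agent] Evaluated At")
    evaluated_at = (
        datetime.fromtimestamp(evaluated_ms / 1000, tz=timezone.utc)
        if evaluated_ms is not None
        else None
    )
    return {
        "evaluator": props.get("[Agent] Evaluator Name"),
        "output_type": output_type,
        "value": value,
        "evaluated_at": evaluated_at,
    }


# Hypothetical evaluator result, for illustration only.
summary = summarize_evaluator_result({
    "[Agent] Evaluator Name": "task_completion",
    "[Agent] Output Type": "binary",
    "[Agent] Detected": True,
    "[Agent] Evaluated At": 1700000000000,
})
print(summary)
```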
User feedback scores
[Agent] Score records explicit user feedback, such as a thumbs up or down on a response. Scores come from your application through the SDK's score() method, not from the enrichment pipeline. To send scores, go to Send user feedback (scores).
Signals
Signals are the default, always-on evaluators that Amplitude runs on every closed session. They land as [Agent] Evaluator Result events. You don't configure them, and Amplitude refines them over time, so treat them as directional.
| Signal | Output | What it measures |
|---|---|---|
| Task completion | binary | Whether the agent completed the user's task. |
| Response quality | score (0.0 to 1.0) | Whether responses were accurate and well-formed. |
| User friction | binary | Whether the user expressed dissatisfaction. |
| User intent | classification | The user's intent for the session. |
| Session safety | classification | Classifies the session as normal, off_topic, prompt_injection, abuse, or probing. |
| Data quality | code-based check | Flags wrong data or hallucination patterns. Returns a clear result rather than a generated rationale. |
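Since signals are directional, they tend to be more useful as rates over many sessions than as verdicts on a single one. The following sketch computes a detection rate per binary evaluator from a list of [Agent] Evaluator Result property dictionaries. The evaluator name `user_friction` and the sample data are hypothetical; actual evaluator names come from your project's configuration.

```python
from collections import Counter


def detection_rates(results):
    """Return the share of results with Detected == True, per binary evaluator."""
    totals, detected = Counter(), Counter()
    for props in results:
        if props.get("[Agent] Output Type") != "binary":
            continue  # Classification and score results are not rates.
        name = props.get("[Agent] Evaluator Name")
        totals[name] += 1
        if props.get("[Agent] Detected") is True:
            detected[name] += 1
    return {name: detected[name] / totals[name] for name in totals}


def _binary(name, was_detected):
    """Build a hypothetical binary evaluator result for the example."""
    return {
        "[Agent] Evaluator Name": name,
        "[Agent] Output Type": "binary",
        "[Agent] Detected": was_detected,
    }


# Hypothetical results, for illustration only.
sample_results = [
    _binary("task_completion", True),
    _binary("task_completion", False),
    _binary("task_completion", True),
    _binary("task_completion", True),
    _binary("user_friction", True),
    _binary("user_friction", False),
    {"[Agent] Evaluator Name": "user_intent", "[Agent] Output Type": "classification"},
]

rates = detection_rates(sample_results)
print(rates)
```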
Topics and custom evaluators
Beyond the default signals, you define your own topic models and evaluators. The enrichment taxonomy is fully configurable: topic model names such as query_intent and product_area, and evaluator names such as task_completion, come from configuration and differ per project. To create and refine your own evaluators, go to Create and refine custom evaluators.
Deprecated events
[Agent] Topic Classification is deprecated. Topic classifications now land as [Agent] Evaluator Result events with an output type of classification. Rubric scores also moved off [Agent] Score onto [Agent] Evaluator Result with an output type of score. [Agent] Score now carries user feedback only.