Agent Analytics SDK

Early Access

This feature is in Early Access. During this time, aspects of the functionality may still be developed, and this documentation may not always be up to date. If you have any questions, contact Amplitude Support.

This page is the developer reference for the Amplitude AI SDK. For the product-level setup overview, refer to Set up Agent Analytics. For the product concepts and how Amplitude uses the data, refer to the Agent Analytics overview and Analyze agent results.

The example below shows one event that your instrumentation produces: an [Agent] AI Response emitted by the SDK.

Identity

[Agent] Session ID	4ddcc6b2-1041-432a-aa8c-ebe3eccac40b
[Agent] Agent ID	support-chatbot
[Agent] Trace ID	b4f63d43-d752-4b1f-8489-d234ddf586b2

Event-specific

$llm_message.text	I can help. Your subscription renews on Aug 15…
[Agent] Model	gpt-4o-mini
[Agent] Provider	openai
[Agent] Input Tokens	1245
[Agent] Output Tokens	87
[Agent] Latency Ms	3420
[Agent] Cost USD	0.0012

This event closes the turn. It carries the eight fields that the SDK doctor checks at setup: Session ID, Agent ID, Model, Provider, Latency Ms, Input Tokens, Output Tokens, and Cost USD. Either s.trackAiMessage(...) or a provider wrapper emits it.
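To sanity-check event properties locally, such as properties copied from a debug or dryRun log, you can use a small helper that lists which of the eight setup fields are missing. This is a minimal sketch. The property keys mirror the example event above. The `AgentEventProperties` type and the `missingDoctorFields` function are hypothetical and are not part of the SDK.

```typescript
// Hypothetical helper, not part of @amplitude/ai.
// Property keys mirror the example [Agent] AI Response event above.
type AgentEventProperties = Record<string, unknown>;

const DOCTOR_FIELDS = [
  "[Agent] Session ID",
  "[Agent] Agent ID",
  "[Agent] Model",
  "[Agent] Provider",
  "[Agent] Latency Ms",
  "[Agent] Input Tokens",
  "[Agent] Output Tokens",
  "[Agent] Cost USD",
] as const;

// Returns the setup fields that are absent (undefined or null) on an event.
function missingDoctorFields(props: AgentEventProperties): string[] {
  return DOCTOR_FIELDS.filter((key) => props[key] === undefined || props[key] === null);
}

const example: AgentEventProperties = {
  "[Agent] Session ID": "4ddcc6b2-1041-432a-aa8c-ebe3eccac40b",
  "[Agent] Agent ID": "support-chatbot",
  "[Agent] Trace ID": "b4f63d43-d752-4b1f-8489-d234ddf586b2",
  "[Agent] Model": "gpt-4o-mini",
  "[Agent] Provider": "openai",
  "[Agent] Input Tokens": 1245,
  "[Agent] Output Tokens": 87,
  "[Agent] Latency Ms": 3420,
  "[Agent] Cost USD": 0.0012,
};

console.log(missingDoctorFields(example).length); // 0
```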

What you set and what you get

Instrumentation is a ladder, not an all-or-nothing setup. Each identifier you add unlocks another tier of analysis, so you can start with a one-line patch() and add context as you go.

You set	Where it comes from	What you unlock
API key	Amplitude project settings	Events reach Amplitude
User ID	Your auth layer (JWT, session cookie, API token)	Per-user analytics, cohorts, retention
Agent ID	Your choice, such as `chat-handler`	Per-agent cost, latency, and quality dashboards
Session ID	Your thread, ticket, call, or run ID. Refer to Instrument an agent session.	Multi-turn analysis, session enrichment, quality scores
Description	Optional. Your choice, such as `Handles support queries via GPT-4o`.	Human-readable agent registry from event streams
Content mode and PII redaction	Automatic. The config defaults work.	Server enrichment and PII scrubbing
Model, tokens, cost	Automatic. The provider wrappers capture them.	Cost analytics, latency monitoring
Parent agent ID	Automatic through `child()` and `runAs()`	Multi-agent hierarchy
Environment, agent version, context	Your deploy pipeline	Segmentation, regression detection

The minimum viable setup is four fields: API key, user ID, agent ID, and session ID. Everything else is automatic or a progressive enhancement. If your user and session IDs are anonymous today, instrument anyway: the events still flow, and you can wire real identity later.

What you get at each level

The coding agent workflow defaults to full instrumentation, the top row below. The lower levels are fallbacks and verification steps, not recommended end states.

Level	Events you get	What it unlocks in Amplitude
Full (agents + sessions + wrappers)	User Message, AI Response, Tool Call, Session End, Score, and the server enrichment events	Per-user funnels, cohorts, retention, session replay linking, quality scoring
Wrappers only (no sessions)	AI Response with cost, tokens, and latency	Aggregate cost monitoring, model comparison
`patch()` only (no wrappers, no sessions)	AI Response (basic)	Aggregate call counts, useful for verification only

Prerequisites

An Amplitude project with Agent Analytics enabled.
The View Agent Analytics Objects permission. Admins grant access through role-based access control (RBAC).
An agent codebase to instrument in Node.js or Python (or a runtime that can call the Amplitude HTTP API).
The project's API key for the right data center. Agent Analytics runs in US and EU.

Install the SDK

bash

npm install @amplitude/ai @amplitude/analytics-node

To let an AI coding agent wire up the SDK, run this and paste the printed prompt into Cursor, Claude Code, Windsurf, GitHub Copilot, or Codex:

bash

npx amplitude-ai

The agent scans your codebase, identifies every LLM call site and the session lifecycle, then instruments them.

Initialize the SDK

Initialize once at your application entry point and reuse the instance. The recommended pattern is a bootstrap module that exports ai plus wrapped provider clients.

typescript

// src/lib/amplitude.ts
import { AmplitudeAI, AIConfig, OpenAI } from "@amplitude/ai";

export const ai = new AmplitudeAI({
  apiKey: process.env.AMPLITUDE_AI_API_KEY!,
  config: new AIConfig({ contentMode: "full", redactPii: true }),
});

export const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
  amplitude: ai,
});

Import openai from this module instead of directly from 'openai'. Add more wrapped providers as needed.

To validate events during development without sending them, set dryRun: true (Node) or dry_run=True (Python) on AIConfig.

Configure the SDK

Pass an AIConfig to the AmplitudeAI constructor. All options are optional; the defaults work for most apps.

Option	Description
`contentMode` / `content_mode`	`'full'` (default), `'metadata_only'`, or `'customer_enriched'`. Controls what message content reaches Amplitude. Refer to Choose a privacy mode.
`redactPii` / `redact_pii`	Scrub emails, phone numbers, SSNs, credit card numbers, and IP addresses from tracked content before events leave the process. Defaults to `true`. Set to `false` to opt out.
`customRedactionPatterns` / `custom_redaction_patterns`	Additional redaction patterns. Accepts regex strings (replaced with `[REDACTED]`) or `{ pattern, replacement }` objects for named labels.
`customRedactionFn` / `custom_redaction_fn`	A `(text) => string` callback for custom redaction logic (for example, an NER library). Runs after all regex-based redaction.
`debug`	Log every tracked event to stderr.
`dryRun` / `dry_run`	Build and log events without sending them to Amplitude. Use during development.
`validate`	Enforce strict validation of required fields.
`onEventCallback` / `on_event_callback`	A `(event, statusCode, message) => void` callback invoked exactly once per tracked event, from the delivery path.
`propagateContext` / `propagate_context`	Enable cross-service context propagation. Refer to Propagate context across services.

For redaction recipes (named replacements, custom scrubbers, international locales), refer to Choose a privacy mode.

Instrument an agent session

Wrap each agent invocation in a session. The session correlates every event (user message, model response, tool calls, spans) into a single record.

An agent session is one job the user hands the agent, from start to finish: the unit of work with a real outcome. Set the sessionId from an ID you already track, rather than inventing a new one:

Chatbot or copilot: the conversation thread ID.
Coding agent: the task or work-session ID.
Support agent: the ticket ID.
Voice agent: the call ID.
Background or autonomous agent: the run or job ID.

An agent session isn't Amplitude's standard-analytics session. The agent session, [Agent] Session ID, is one job the user hands the agent. Amplitude's standard-analytics session, $session_id, is the user's app or web visit that powers Session Replay and product reports. Set the agent session from your own ID, and forward the standard-analytics session ID across the network boundary if you want to link the two.

typescript

import { ai, openai } from "@/lib/amplitude";

const agent = ai.agent("chat-handler", {
  description: "Customer support chatbot",
});

export async function POST(req: Request) {
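  // threadId: your conversation thread ID, reused as the agent session ID.
  const { messages, userId, threadId } = await req.json();
  return agent.session({ userId, sessionId: threadId }).run(async (s) => {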
    s.trackUserMessage(messages[messages.length - 1].content);
    const response = await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages,
    });
    return Response.json(response);
  });
}

The Python SDK follows the same pattern with ai.agent(...).session(...). A session opened with run() closes when the callback returns. For sessions that span multiple requests, end them one of these ways:

Close it explicitly (recommended): Call trackSessionEnd() (Node) or track_session_end() (Python) when the job finishes, such as a closed ticket or a completed run. Server-side evaluation runs immediately.
Let the idle timeout close it: The timeout defaults to 30 minutes from the first user message, configurable per session with idleTimeoutMinutes (Node) or idle_timeout_minutes (Python). Raise it for jobs with long natural gaps, such as 240 for a support ticket worked over hours. Set it to -1 to disable the idle close, which keeps the session open until you end it explicitly, with a 90-day backstop.

When the same user returns with a new goal, start a new session with a new sessionId rather than continuing the old one.
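To keep sessionId tied to an ID you already track (a thread, task, ticket, call, or run), derive it from the job itself instead of generating a new one. This is a minimal sketch. The `AgentJob` union and the `agentSessionId` helper are hypothetical, and the field names stand in for whatever IDs your system uses.

```typescript
// Hypothetical job shapes; substitute the IDs your system already tracks.
type AgentJob =
  | { kind: "chat"; threadId: string }
  | { kind: "coding"; taskId: string }
  | { kind: "support"; ticketId: string }
  | { kind: "voice"; callId: string }
  | { kind: "background"; runId: string };

// Reuse the existing job ID as the agent session ID instead of minting a new one.
function agentSessionId(job: AgentJob): string {
  switch (job.kind) {
    case "chat":
      return job.threadId;
    case "coding":
      return job.taskId;
    case "support":
      return job.ticketId;
    case "voice":
      return job.callId;
    case "background":
      return job.runId;
  }
}

// Usage: agent.session({ userId, sessionId: agentSessionId(job) })
console.log(agentSessionId({ kind: "support", ticketId: "TICKET-4821" })); // TICKET-4821
```

When the user returns with a new goal, a new job (and therefore a new ID) produces a new session automatically.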

Minimum viable instrumentation

Agent Analytics needs four fields to correlate events: the API key, a user identifier (userId or deviceId), an agentId, and a sessionId. The recommended pattern also adds provider wrappers, so the SDK captures model, token, cost, and latency data automatically.

Two identity rules keep a single user from splitting into two:

Don't pass a placeholder userId such as "anonymous", "", or a temporary ID. Omit the userId instead. Amplitude can't change a userId after it's set, so a placeholder creates a separate user that won't merge later.
Reuse the same deviceId across a pre-account session. If your backend generates a new deviceId per request, the merge breaks. Read the deviceId from the Browser SDK and forward it.
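You can enforce both rules where you build session options. This is a minimal sketch. The `buildIdentity` helper is hypothetical, and the placeholder list is an assumption, not an exhaustive rule. The helper drops placeholder user IDs instead of sending them. It reuses the deviceId forwarded from the browser, for example in the `x-amplitude-device-id` header described in Link to the standard-analytics session. It can't detect temporary IDs, so keep those out at the source.

```typescript
// Hypothetical helper: builds identity fields for agent.session(...).
interface Identity {
  userId?: string;
  deviceId?: string;
}

// Values that must never be sent as a userId (an assumed, non-exhaustive list).
const PLACEHOLDER_USER_IDS = new Set(["", "anonymous", "guest", "unknown"]);

function buildIdentity(
  authUserId: string | null | undefined,
  forwardedDeviceId: string | null | undefined,
): Identity {
  const identity: Identity = {};
  const userId = authUserId?.trim();
  // Omit userId entirely unless it is a real, authenticated ID.
  if (userId && !PLACEHOLDER_USER_IDS.has(userId.toLowerCase())) {
    identity.userId = userId;
  }
  // Reuse the browser's deviceId; never generate a fresh one per request.
  if (forwardedDeviceId) {
    identity.deviceId = forwardedDeviceId;
  }
  return identity;
}

console.log(JSON.stringify(buildIdentity("anonymous", "dev-abc"))); // {"deviceId":"dev-abc"}
console.log(JSON.stringify(buildIdentity("user-123", "dev-abc"))); // {"userId":"user-123","deviceId":"dev-abc"}
```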

Auto-instrument provider calls

The SDK offers two paths for capturing provider activity without manual tracking calls.

Provider wrappers

Wrap the provider client at construction time. The wrapper forwards calls to the underlying client and records request, response, tokens, latency, and cost.

typescript

import { OpenAI } from "@amplitude/ai";
const openaiWrapped = new OpenAI({ apiKey: process.env.OPENAI_API_KEY!, amplitude: ai });

Provider	Wrapper
OpenAI (Chat Completions + Responses)	`new OpenAI({ apiKey, amplitude: ai })`
Anthropic	`new Anthropic({ apiKey, amplitude: ai })`
Azure OpenAI	`new AzureOpenAI({ apiKey, amplitude: ai })`
Gemini (`@google/generative-ai`)	`new Gemini({ apiKey, amplitude: ai })`
Google Gen AI (`@google/genai`)	`new GoogleGenAI({ apiKey, amplitude: ai })`
Bedrock (Converse APIs)	`new Bedrock({ amplitude: ai, client })`
Mistral	`new Mistral({ apiKey, amplitude: ai })`

If you can't change the construction site, use wrap(existingClient, ai) to instrument an existing client without modifying its creation.

Coverage varies by provider. All wrappers capture streaming, system prompts, and cost; the rest depends on what the provider's API exposes:

Feature	OpenAI	Anthropic	Gemini	Azure OpenAI	Bedrock	Mistral
Streaming	Yes	Yes	Yes	Yes	Yes	Yes
Tool-call tracking	Yes	Yes	No	Yes	Yes	No
TTFB measurement	Yes	Yes	No	Yes	No	No
Cache token stats	Yes	Yes	No	No	No	No
Responses API	Yes	—	—	—	—	—
Reasoning content	Yes	Yes	No	Yes	No	No
Cost estimation	Yes	Yes	Yes	Yes	Yes	Yes

Bedrock model IDs such as us.anthropic.claude-3-5-sonnet are normalized for price lookup automatically.

`patch()`

Call patch({ amplitudeAI: ai }) once at startup for zero-code instrumentation. The SDK monkey-patches supported clients and auto-extracts [Agent] Tool Call events from message arrays for OpenAI Chat Completions, OpenAI Responses, and Anthropic Messages. Extracted tool calls land with latencyMs: 0 because execution timing isn't available through message inspection. Use tool() or trackToolCall() when you need real tool latency.

Track tools

The tool() higher-order function wraps a tool function so the SDK records each call:

typescript

import { tool } from "@amplitude/ai";

const searchProducts = tool(searchDB, { name: "search_products" });

// Inside session.run, call as usual:
const result = await searchProducts(query);
// [Agent] Tool Call event emitted with duration, success, input/output

For inline tool calls or unsupported flows, use s.trackToolCall(name, latencyMs, success, { input, output }) directly.
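The tool() wrapper measures latency and success for you. When you call s.trackToolCall() yourself, you have to measure both. This is a minimal sketch that assumes only the trackToolCall(name, latencyMs, success, { input, output }) signature described above. The `ToolTracker` type and the `timedToolCall` helper are hypothetical and are not SDK exports. The helper records a failed call and then rethrows the error, so your agent loop still receives it.

```typescript
// Matches the documented shape: trackToolCall(name, latencyMs, success, { input, output }).
type ToolTracker = (
  name: string,
  latencyMs: number,
  success: boolean,
  opts: { input?: unknown; output?: unknown },
) => void;

// Hypothetical helper: times one tool invocation and reports it through `track`.
async function timedToolCall<I, O>(
  track: ToolTracker,
  name: string,
  input: I,
  fn: (input: I) => Promise<O>,
): Promise<O> {
  const start = performance.now();
  try {
    const output = await fn(input);
    track(name, performance.now() - start, true, { input, output });
    return output;
  } catch (err) {
    // Record the failed call, then let the caller handle the error.
    track(name, performance.now() - start, false, { input });
    throw err;
  }
}

// Usage inside session.run:
// const rows = await timedToolCall(s.trackToolCall.bind(s), "search_products", query, searchDB);
```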

Track spans

Spans wrap internal sub-operations such as vector lookups, reranks, guardrails, or any timed work that sits inside a turn. They emit [Agent] Span events and share the trace's identity.

OTEL-enabled behavior

When OTEL is enabled through enable_otel() / enableOtel(), observe() / @observe creates real OTEL spans instead of emitting events directly. The SpanEventMapper converts these spans into the appropriate [Agent] event type. Use the type parameter to control routing: @observe(type="tool") routes the span as [Agent] Tool Call rather than [Agent] Span.

typescript

import { observe } from "@amplitude/ai";

// As a higher-order function:
const runSubAgent = observe(
  async (prompt: string) => {
    return await subAgent.execute(prompt);
  },
  { name: "sub-agent-execution" },
);

// Or explicitly when you need error capture:
const start = Date.now();
try {
  const result = await subAgent.execute(prompt);
  s.trackSpan({
    name: "sub-agent-execution",
    latencyMs: Date.now() - start,
    inputState: { prompt: prompt.slice(0, 1000) },
    outputState: { response: result.slice(0, 1000) },
  });
} catch (e) {
  s.trackSpan({
    name: "sub-agent-execution",
    latencyMs: Date.now() - start,
    isError: true,
    errorType: (e as Error).name,
    errorMessage: (e as Error).message,
  });
  throw e;
}

Spans don't replace turn-level events

Agent Analytics turn counts and interaction views are driven by [Agent] User Message and [Agent] AI Response, not spans. If you only emit spans around internal steps, dashboards show traces with no turn-level analytics. Always emit the User Message / AI Response pair for each user-visible cycle, and use spans on top.

Manual instrumentation

For custom flows or unsupported providers, use the manual methods on the session object directly. Each maps to a single [Agent] event type.

Method	Event	Use when
`s.trackUserMessage(text)`	`[Agent] User Message`	User-authored input arrives
`s.trackAiMessage(text, model, provider, latencyMs, opts?)`	`[Agent] AI Response`	Provider wrapper can't auto-capture
`s.trackToolCall(name, latencyMs, success, opts?)`	`[Agent] Tool Call`	Calling a tool outside `tool()`
`s.trackSpan({ name, latencyMs, ... })`	`[Agent] Span`	Wrapping an internal sub-step
`s.runAs(childAgent, fn)`	(delegation)	Routing to a child agent

For AI responses that don't go through a wrapper (proxies, custom gateways), pass usage from the completion response:

typescript

s.trackAiMessage(completedMessage.content, "gpt-4o", "openai", latencyMs, {
  inputTokens: usage.prompt_tokens,
  outputTokens: usage.completion_tokens,
  totalTokens: usage.total_tokens,
});

Pass the canonical provider model ID (gpt-4o-mini, claude-sonnet-4-20250514), not an internal gateway label, so cost auto-calculates correctly.
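If your gateway uses its own labels, resolve them to canonical model IDs before you call trackAiMessage. This is a minimal sketch. The labels in `GATEWAY_MODEL_IDS` and the `canonicalModel` helper are hypothetical, as is the `anthropic` provider string. The helper throws on unmapped labels so that gaps surface early instead of producing events whose cost can't be calculated.

```typescript
// Hypothetical gateway labels mapped to canonical provider model IDs.
const GATEWAY_MODEL_IDS: Record<string, { model: string; provider: string }> = {
  "fast-chat": { model: "gpt-4o-mini", provider: "openai" },
  "smart-chat": { model: "gpt-4o", provider: "openai" },
  "claude-default": { model: "claude-sonnet-4-20250514", provider: "anthropic" },
};

// Returns the canonical model and provider, or throws for an unmapped label.
function canonicalModel(label: string): { model: string; provider: string } {
  const entry = GATEWAY_MODEL_IDS[label];
  if (!entry) {
    throw new Error(`No canonical model ID mapped for gateway label "${label}"`);
  }
  return entry;
}

// Usage:
// const { model, provider } = canonicalModel(gatewayResponse.modelLabel);
// s.trackAiMessage(text, model, provider, latencyMs, { inputTokens, outputTokens });
```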

Add segmentation with context

Pass a context dictionary to ai.agent(...) to attach arbitrary segmentation dimensions to every event. The SDK serializes it to [Agent] Context, so you can segment AI sessions without registering new global properties.

const agent = ai.agent("support-bot", {
  context: {
    agent_type: "executor",
    experiment_variant: "reasoning-enabled",
    surface: "chat",
  },
});

These keys cover the most common segmentation needs:

Key	Example values	Use case
`agent_type`	`"planner"`, `"executor"`, `"retriever"`, `"router"`	Group analytics by agent role in multi-agent systems.
`experiment_variant`	`"control"`, `"treatment-v2"`	Compare quality, abandonment, or cost across A/B test arms.
`feature_flag`	`"new-rag-pipeline"`	Track which flags were active during the session.
`surface`	`"chat"`, `"search"`, `"copilot"`	Identify the UI surface that triggered the interaction.
`prompt_revision`	`"v7"`, `"2026-02-15"`	Track the prompt version; detect regressions alongside `agentVersion`.
`deployment_region`	`"us-east-1"`, `"eu-west-1"`	Segment by region for latency or compliance analysis.
`canary_group`	`"canary"`, `"stable"`	Separate canary from stable deployments during a rollout.

Merge context across child agents

Child agents inherit the parent's context. Keys on the child override matching parent keys; parent keys the child doesn't set are preserved.

const parent = ai.agent("orchestrator", {
  context: { experiment_variant: "treatment", surface: "chat" },
});
const child = parent.child("researcher", {
  context: { agent_type: "retriever" },
});
// child context = { experiment_variant: "treatment", surface: "chat", agent_type: "retriever" }
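The merge behaves like an object spread in which the child's keys are applied last. The sketch below shows the semantics only; `mergeContext` is a hypothetical name, and the SDK performs the merge for you. Here the child overrides `surface` and adds `agent_type`.

```typescript
type AgentContext = Record<string, string>;

// Child keys override matching parent keys; parent-only keys are preserved.
function mergeContext(parent: AgentContext, child: AgentContext): AgentContext {
  return { ...parent, ...child };
}

const merged = mergeContext(
  { experiment_variant: "treatment", surface: "chat" },
  { agent_type: "retriever", surface: "sidebar" },
);
console.log(JSON.stringify(merged));
// {"experiment_variant":"treatment","surface":"sidebar","agent_type":"retriever"}
```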

Query context in Amplitude

[Agent] Context is a JSON string. To query individual keys:

Derived properties: For frequently-used keys, create a derived event property (Data > Properties > Derived > New) that extracts the value permanently.
Filters: Use [Agent] Context contains "key":"value" for string matching in chart filters.
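The contains filter matches against the serialized string, so the exact pattern depends on how the JSON is written. This sketch assumes compact serialization like JSON.stringify, with no spaces after colons. Under that assumption, a key/value pair appears as `"key":"value"`. The hypothetical `contextFilterPattern` helper builds that substring and checks it against a serialized context. Because the pattern includes the closing quote, it doesn't match a longer value that starts with the same text.

```typescript
// Builds the substring a `[Agent] Context contains ...` filter would match,
// assuming the context is serialized compactly, as JSON.stringify does.
function contextFilterPattern(key: string, value: string): string {
  // JSON.stringify adds the surrounding quotes and escapes special characters.
  return `${JSON.stringify(key)}:${JSON.stringify(value)}`;
}

const serialized = JSON.stringify({ agent_type: "executor", surface: "chat" });
const pattern = contextFilterPattern("surface", "chat");

console.log(pattern); // "surface":"chat"
console.log(serialized.includes(pattern)); // true
// The closing quote keeps the pattern from matching "surface":"chatbot".
console.log(JSON.stringify({ surface: "chatbot" }).includes(pattern)); // false
```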

Use multiple tenants

On a multi-tenant platform, create a tenant-scoped handle with ai.tenant(orgId, opts?) (Node) or ai.tenant(org_id, ...) (Python). Every agent created from the handle pre-binds customerOrgId, which lands as [Agent] Customer Org ID on each event, so you can segment usage by end customer without threading the org ID through every call.

const tenant = ai.tenant("org-456", { env: "production" });
const agent = tenant.agent("support-bot", { userId: "user-123" });
// agent.track* calls carry [Agent] Customer Org ID = "org-456"

Classify model tiers

The SDK infers a model tier from the model name and attaches it as [Agent] Model Tier on every [Agent] AI Response. Tiers let you compare cost and performance across model classes without listing every model.

Tier	Examples	When to use
`fast`	`gpt-4o-mini`, `claude-3-haiku`, `gemini-flash`, `gpt-3.5-turbo`	High-volume, latency-sensitive work.
`standard`	`gpt-4o`, `claude-3.5-sonnet`, `gemini-pro`, `llama`, `command`	General purpose.
`reasoning`	`o1`, `o3-mini`, `deepseek-r1`, Claude with extended thinking	Complex reasoning tasks.

Call inferModelTier() / infer_model_tier() to resolve a tier directly:

import { inferModelTier } from "@amplitude/ai";

inferModelTier("gpt-4o-mini"); // 'fast'
inferModelTier("claude-3.5-sonnet"); // 'standard'
inferModelTier("o1-preview"); // 'reasoning'

For custom or fine-tuned models the name can't classify, pass modelTier / model_tier on the AI-message call to override the inferred value:

s.trackAiMessage(response.content, "ft:gpt-4o:my-org:custom", "openai", latencyMs, {
  modelTier: "standard",
});

Track attachments

Pass an attachments array to the user-message call to record files sent with a message (images, PDFs, URLs). Each entry carries type, name, and size_bytes.

s.trackUserMessage("Analyze this document", {
  attachments: [
    { type: "image", name: "chart.png", size_bytes: 102400 },
    { type: "pdf", name: "report.pdf", size_bytes: 2048576 },
  ],
});

The SDK derives these properties from the array, recording attachment metadata only, never file content: [Agent] Has Attachments, [Agent] Attachment Types, [Agent] Attachment Count, [Agent] Total Attachment Size Bytes, and [Agent] Attachments. Attachments also apply to AI responses, such as model-generated images. Pass the same attachments option to the AI-message call.
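To preview the derived values before you send anything, you can compute them from the same array. The summary shape below is an assumption. The SDK's exact encoding of each property may differ, for example whether Attachment Types is a list or a joined string. The `summarizeAttachments` helper is hypothetical. Like the SDK, it reads only metadata.

```typescript
interface Attachment {
  type: string;
  name: string;
  size_bytes: number;
}

// Hypothetical summary mirroring the derived [Agent] attachment properties.
// Only metadata is read; file content is never touched.
function summarizeAttachments(attachments: Attachment[]) {
  return {
    hasAttachments: attachments.length > 0,
    attachmentTypes: Array.from(new Set(attachments.map((a) => a.type))),
    attachmentCount: attachments.length,
    totalAttachmentSizeBytes: attachments.reduce((sum, a) => sum + a.size_bytes, 0),
  };
}

const summary = summarizeAttachments([
  { type: "image", name: "chart.png", size_bytes: 102400 },
  { type: "pdf", name: "report.pdf", size_bytes: 2048576 },
]);
console.log(summary.attachmentCount, summary.totalAttachmentSizeBytes); // 2 2150976
```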

Capture implicit feedback

Behavioral signals indicate whether a response met the user's need without requiring an explicit rating. Set these options on the relevant track calls; the SDK maps them to queryable quality properties.

Signal	Property	Interpretation
Copy	`[Agent] Was Copied`	User copied the output, a positive signal. Set on the AI-message call.
Regeneration	`[Agent] Is Regeneration`	User asked for a redo, a negative signal. Set on the user-message call.
Edit	`[Agent] Is Edit` + `[Agent] Edited Message ID`	User refined a previous prompt, a friction signal. Set on the user-message call.
Abandonment	`[Agent] Abandonment Turn`	User left after N turns; a low value (such as `1`) signals first-response dissatisfaction. Set on session end.

// AI response the user copied (positive)
s.trackAiMessage("To create a funnel, go to...", "gpt-4o", "openai", latencyMs, { wasCopied: true });

// User regenerates (negative — first response fell short)
s.trackUserMessage("How do I create a funnel?", { isRegeneration: true });

// User edits and resubmits their prompt
s.trackUserMessage("How do I create a conversion funnel for signups?", {
  isEdit: true,
  editedMessageId: originalMsgId,
});

// User left after the first AI response
agent.trackSessionEnd({ sessionId: "sess-1", abandonmentTurn: 1 });
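If your UI reports feedback as discrete actions, a small mapping keeps the option names in one place. This is a sketch. The `UiFeedback` union and the `feedbackOptions` function are hypothetical, while wasCopied, isRegeneration, isEdit, and editedMessageId are the options shown above.

```typescript
// Hypothetical UI actions reported by your front end.
type UiFeedback =
  | { action: "copy" }
  | { action: "regenerate" }
  | { action: "edit"; originalMessageId: string };

// Returns the tracking options for the matching call.
function feedbackOptions(fb: UiFeedback): Record<string, unknown> {
  switch (fb.action) {
    case "copy":
      return { wasCopied: true }; // positive; pass to trackAiMessage
    case "regenerate":
      return { isRegeneration: true }; // negative; pass to trackUserMessage
    case "edit":
      return { isEdit: true, editedMessageId: fb.originalMessageId }; // friction; pass to trackUserMessage
  }
}

// Usage:
// s.trackUserMessage(text, feedbackOptions({ action: "edit", originalMessageId: originalMsgId }));
```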

Import existing conversations

To backfill a full message history in one call, use trackConversation() (Node) or track_conversation() (Python). Pass an array of { role, content } messages. Each one becomes an [Agent] User Message or [Agent] AI Response, with turn IDs auto-incremented in order. Messages with the system role are skipped.

import { trackConversation } from "@amplitude/ai";
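import * as amplitude from "@amplitude/analytics-node";

// Initialize the Node analytics client before backfilling.
amplitude.init(process.env.AMPLITUDE_AI_API_KEY!);
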
trackConversation({
  amplitude,
  userId: "user-123",
  sessionId: "sess-abc",
  agentId: "support-bot",
  messages: [
    { role: "user", content: "How do I reset my password?" },
    {
      role: "assistant",
      content: "Go to Settings > Security > Reset Password.",
      model: "gpt-4o",
      provider: "openai",
      latency_ms: 1200,
      input_tokens: 15,
      output_tokens: 42,
    },
    { role: "user", content: "Thanks, that worked!" },
  ],
});

Use this to import historical conversations or migrate data from external systems. The function accepts the same context fields as the individual tracking methods.
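Before a large backfill, it can help to check what each message turns into. The sketch below is a hypothetical pre-flight check, not part of the SDK, and it sends nothing. It skips system messages, maps the remaining roles to event types, and flags assistant messages that lack a model or provider. Cost calculation relies on a canonical model ID.

```typescript
interface ConversationMessage {
  role: "user" | "assistant" | "system";
  content: string;
  model?: string;
  provider?: string;
}

// Hypothetical pre-flight check for a backfill; it sends nothing to Amplitude.
function previewConversation(messages: ConversationMessage[]): { events: string[]; warnings: string[] } {
  const events: string[] = [];
  const warnings: string[] = [];
  messages.forEach((m, index) => {
    if (m.role === "system") return; // system messages produce no event
    if (m.role === "user") {
      events.push("[Agent] User Message");
      return;
    }
    events.push("[Agent] AI Response");
    // Cost calculation relies on a canonical model ID, so flag replies without one.
    if (!m.model || !m.provider) {
      warnings.push(`message ${index}: assistant reply without model/provider`);
    }
  });
  return { events, warnings };
}

const preview = previewConversation([
  { role: "system", content: "You are a support bot." },
  { role: "user", content: "How do I reset my password?" },
  { role: "assistant", content: "Go to Settings > Security > Reset Password." },
]);
console.log(preview.events.join(" -> ")); // [Agent] User Message -> [Agent] AI Response
console.log(preview.warnings.length); // 1
```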

Send user feedback (scores)

Capture explicit user feedback, such as a thumbs up or down on a response or an optional rating, as a [Agent] Score event. Scores come only from your application; Amplitude's enrichment pipeline never generates them.

// Thumbs up/down on a specific AI response
ai.score({
  userId: "user-123",
  name: "user-feedback",
  value: 1.0,
  targetId: aiMessageId,
  targetType: "message",
  source: "user",
});

If you ingest events directly instead of using the SDK, send an [Agent] Score event with [Agent] Score Name set to your score name (for example, user-feedback).

Multi-agent architectures

Parent agents can delegate to child agents. Child agents inherit the parent's session, so all events stay correlated under one Session ID.

typescript

const orchestrator = ai.agent('shopping-agent', { description: 'Orchestrates shopping requests' });
const recipeAgent = orchestrator.child('recipe-agent', { description: 'Finds recipes' });

await orchestrator.session({ userId }).run(async (s) => {
  s.trackUserMessage(userInput);

  const result = await s.runAs(recipeAgent, async (cs) => {
    cs.trackUserMessage(delegatedQuery);
    return openai.chat.completions.create({ model: 'gpt-4o', messages: [{ role: 'user', content: delegatedQuery }] });
  });
});

Wrap delegation calls with observe() or s.trackSpan() if you want latency and error metrics on the dispatch itself, not only on the child's LLM call.

Integration patterns

Single-request API endpoint

For a serverless function or one-shot endpoint, create the session inside the handler and flush before returning so the runtime doesn't freeze before events ship.

app.post("/chat", async (req, res) => {
  const agent = ai.agent("api-handler", { userId: req.userId });
  const result = await agent.session({ sessionId: req.sessionId }).run(async (s) => {
    s.trackUserMessage(req.body.message);
    const start = performance.now();
    const response = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: req.body.messages,
    });
    s.trackAiMessage(
      response.choices[0].message.content ?? "",
      "gpt-4o",
      "openai",
      performance.now() - start,
      {
        inputTokens: response.usage?.prompt_tokens,
        outputTokens: response.usage?.completion_tokens,
      },
    );
    return response.choices[0].message.content;
  });
  await ai.flush();
  res.json({ response: result });
});

Long-lived session (chatbot)

For a multi-turn conversation, create the session once and reuse it across turns. Track a user/AI pair per turn; the session ends when run() returns or the idle timeout fires.

const agent = ai.agent("chatbot", { userId: "user-123", env: "production" });

await agent.session({ sessionId: conversationId }).run(async (s) => {
  s.trackUserMessage("What is Amplitude?");
  const r1 = await llm.chat("What is Amplitude?");
  s.trackAiMessage(r1.content, "gpt-4o", "openai", r1.latencyMs, {
    inputTokens: r1.usage.input,
    outputTokens: r1.usage.output,
  });

  s.trackUserMessage("How does it track events?");
  const r2 = await llm.chat("How does it track events?");
  s.trackAiMessage(r2.content, "gpt-4o", "openai", r2.latencyMs, {
    inputTokens: r2.usage.input,
    outputTokens: r2.usage.output,
  });
});

Multi-agent orchestration

When a parent agent delegates to specialized children, wrap each delegation in runAs() / arun_as(). Both manual tracking calls and provider wrappers inside the callback pick up the child's identity automatically. For the basic delegation shape, refer to Multi-agent architectures.

How runAs / arun_as works:

Shares the parent session's sessionId, traceId, and turn counter.
Sets agentId to the child and parentAgentId to the parent for the callback's duration.
Suppresses auto user-message tracking, so internal role: "user" prompts in delegation calls don't create spurious user turns.
Doesn't emit [Agent] Session End; the child runs inside the parent session, which emits one session end.
Restores the parent context when the callback completes, even on error.
Supports nesting: a child can runAs a grandchild.

Fan-out (parallel child calls, single user turn)

When one user turn triggers several parallel LLM calls, open a fresh trace with newTrace() / new_trace(), dispatch the children with Promise.all (Node) or asyncio.gather (Python), and emit a single AI response after they join. This keeps one trace, one user turn, and one AI response regardless of how many internal calls run.

await orchestrator.session({ sessionId }).run(async (s) => {
  s.newTrace();
  s.trackUserMessage("Generate plan from quiz results", { context: structuredState });

  // Time the whole fan-out so the single AI response reports end-to-end latency.
  const start = performance.now();
  const [a, b] = await Promise.all([
    s.runAs(scorer, () =>
      openai.chat.completions.create({ model: "gpt-4o", messages: scorerMessages }),
    ),
    s.runAs(matcher, () =>
      openai.chat.completions.create({ model: "gpt-4o", messages: matcherMessages }),
    ),
  ]);
  const totalLatencyMs = performance.now() - start;

  // assemble() is your own function that combines the child outputs.
  s.trackAiMessage(assemble(a, b), "gpt-4o", "openai", totalLatencyMs);
});

Stream responses

Streaming sessions must stay open until the stream is fully consumed. Closing the session before the stream finishes drops the AI response event.

typescript

// WRONG: session ends before stream is consumed
return agent.session({ userId }).run(async (s) => {
  const stream = await openai.chat.completions.create({
    model: "gpt-4o",
    messages,
    stream: true,
  });
  return new Response(stream.toReadableStream());
});

// CORRECT: session stays open until stream completes
return agent.session({ userId }).run(async (s) => {
  const stream = await openai.chat.completions.create({
    model: "gpt-4o",
    messages,
    stream: true,
  });
  const readable = stream.toReadableStream();
  const [passthrough, forClient] = readable.tee();
  const reader = passthrough.getReader();
  (async () => {
    while (!(await reader.read()).done) {}
  })();
  return new Response(forClient);
});

With the Vercel AI SDK, flush in the onFinish callback:

typescript

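import { streamText } from "ai";
import { openai } from "@ai-sdk/openai";
import { ai } from "@/lib/amplitude";

const result = await streamText({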
  model: openai("gpt-4o"),
  messages,
  onFinish: async () => {
    await ai.flush();
  },
});

Link to the standard-analytics session

When a session crosses the network boundary, pass Amplitude IDs through request headers so server-side events join the user's standard-analytics session ($session_id). Pass the value as the session's browserSessionId field:

typescript

const browserSessionId = req.headers.get("x-amplitude-session-id");
const deviceId = req.headers.get("x-amplitude-device-id");
const session = agent.session({ userId, browserSessionId, deviceId });
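On the client, the request has to carry those headers. This is a minimal sketch that assumes the Amplitude Browser SDK (`@amplitude/analytics-browser`), whose `getSessionId()` and `getDeviceId()` return the current IDs. The `amplitudeHeaders` helper is hypothetical. It only builds the header object, so the sketch runs outside a browser.

```typescript
// Hypothetical helper: builds the headers the server handler above reads.
// In the browser, pass amplitude.getSessionId() and amplitude.getDeviceId()
// from @amplitude/analytics-browser.
function amplitudeHeaders(
  sessionId: number | undefined,
  deviceId: string | undefined,
): Record<string, string> {
  const headers: Record<string, string> = {};
  // Header values must be strings; the standard-analytics session ID is numeric.
  if (sessionId !== undefined) headers["x-amplitude-session-id"] = String(sessionId);
  if (deviceId) headers["x-amplitude-device-id"] = deviceId;
  return headers;
}

// Usage in the browser:
// await fetch("/api/chat", {
//   method: "POST",
//   headers: {
//     "content-type": "application/json",
//     ...amplitudeHeaders(amplitude.getSessionId(), amplitude.getDeviceId()),
//   },
//   body: JSON.stringify({ messages }),
// });
```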

For cross-service propagation between back-end services, use injectContext() on the outbound side and extractContext(headers) on the inbound side.

Propagate context across services

When one back-end service calls another, propagate the active identity and session so the downstream events join the same trace instead of starting a new one. On the outbound side, injectContext() serializes the active context (session ID, trace ID, user ID) into request headers. On the inbound side, extractContext(headers) reads them back.

// --- Service A (outbound) ---
import { injectContext } from "@amplitude/ai";

await agent.session({ userId, sessionId }).run(async (s) => {
  s.trackUserMessage(message);
  const headers = injectContext({ "content-type": "application/json" });
  await fetch("https://service-b/internal/enrich", {
    method: "POST",
    headers,
    body: JSON.stringify({ message }),
  });
});

// --- Service B (inbound) ---
import { randomUUID } from "node:crypto";
import { extractContext, runWithContextAsync, SessionContext } from "@amplitude/ai";

export async function POST(req: Request) {
  const extracted = extractContext(Object.fromEntries(req.headers));
  const ctx = new SessionContext({
    sessionId: extracted.sessionId ?? randomUUID(),
    traceId: extracted.traceId ?? null,
    userId: extracted.userId ?? null,
  });
  await runWithContextAsync(ctx, async () => {
    await handleEnrichment(req);
  });
  // Route handlers must return a Response.
  return new Response(null, { status: 204 });
}

injectContext() returns a new headers object and never mutates the original. If no session is active, it returns the headers unchanged, so it's safe to call unconditionally.

Supported providers and frameworks

Providers with native wrappers: OpenAI (Chat Completions + Responses), Anthropic, Azure OpenAI, Gemini (@google/generative-ai), Google Gen AI (@google/genai), Mistral, Bedrock (Converse APIs).

Agent frameworks with first-party integrations: LangChain, LlamaIndex, OpenAI Agents SDK, Anthropic Tool Use, Claude Agent SDK (ClaudeAgentSDKTracker), Anthropic Managed Agents, CrewAI (Python only).

Framework integrations

The integrations below bridge an agent framework's own callback or tracing system into Agent Analytics. Each takes the ai instance plus identity fields, then hooks into the framework. CrewAI is Python-only; in Node, AmplitudeCrewAIHooks throws by design. Use the LangChain or OpenTelemetry path instead.

LangChain

Pass an AmplitudeCallbackHandler to LangChain's callbacks.

import { AmplitudeCallbackHandler } from "@amplitude/ai";

const handler = new AmplitudeCallbackHandler({ amplitudeAI: ai, userId: "user-123", sessionId: "sess-1" });
// Pass handler to any LangChain runnable via { callbacks: [handler] }

LlamaIndex

import { AmplitudeLlamaIndexHandler } from "@amplitude/ai";

const handler = new AmplitudeLlamaIndexHandler({ amplitudeAI: ai, userId: "user-123", sessionId: "sess-1" });

OpenAI Agents SDK

import { AmplitudeTracingProcessor } from "@amplitude/ai";

const processor = new AmplitudeTracingProcessor({ amplitudeAI: ai, userId: "user-123", sessionId: "sess-1" });
// Register with the OpenAI Agents SDK trace provider.

Anthropic Tool Use

AmplitudeToolLoop runs Anthropic's multi-turn tool_use loop and tracks each AI response and tool call.

import { AmplitudeToolLoop } from "@amplitude/ai";

const loop = new AmplitudeToolLoop({ amplitudeAI: ai, userId: "user-123", sessionId: "sess-1" });
await loop.run({ client, model: "claude-sonnet-4-20250514", messages, tools, toolExecutor });

OpenTelemetry attribute mapping

If a framework already emits OpenTelemetry GenAI spans, the SDK maps them onto [Agent] properties. For how to enable this, including the span-first enableOtel() / enable_otel() path and the manual AmplitudeGenAIExporter / AmplitudeAgentExporter exporters, refer to Ingest OpenTelemetry spans. The mapping the exporter applies:

OTEL span attribute	`[Agent]` property	Notes
`gen_ai.response.model` / `gen_ai.request.model`	`[Agent] Model Name`	Response model preferred.
`gen_ai.system` / `gen_ai.provider.name`	`[Agent] Provider`	Required; spans without it are ignored.
`gen_ai.usage.input_tokens`	`[Agent] Input Tokens`
`gen_ai.usage.output_tokens`	`[Agent] Output Tokens`
`gen_ai.usage.total_tokens`	`[Agent] Total Tokens`	Derived from input + output if absent.
`gen_ai.request.temperature`	`[Agent] Temperature`
`gen_ai.request.top_p`	`[Agent] Top P`
`gen_ai.request.max_tokens`	`[Agent] Max Output Tokens`
`gen_ai.response.finish_reasons`	`[Agent] Finish Reason`	First reason if an array.
`gen_ai.tool.name`	`[Agent] Tool Name`	Routes the span as `[Agent] Tool Call`.
`gen_ai.input.messages`	`$llm_message`	User-role messages only, and only if the privacy mode allows.
Span duration	`[Agent] Latency Ms`
Span status `ERROR`	`[Agent] Is Error`, `[Agent] Error Message`
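The mapping table can be read as a pure function over a span's attributes. The sketch below illustrates those rules only; it isn't the SDK's exporter. The `SpanLike` shape and the `mapGenAiSpan` name are hypothetical. The sketch omits `$llm_message` handling because that mapping depends on the privacy mode.

```typescript
// Attribute values as OpenTelemetry allows them (simplified for this sketch).
type AttrValue = string | number | boolean | string[] | number[] | undefined;

// Hypothetical minimal span shape; real OTel SDK spans expose more fields.
interface SpanLike {
  attributes: Record<string, AttrValue>;
  durationMs: number;
  status: "OK" | "ERROR" | "UNSET";
  statusMessage?: string;
}

type AgentProps = Record<string, string | number | boolean>;

// Numeric attributes copied through unchanged.
const DIRECT_NUMERIC: Record<string, string> = {
  "gen_ai.usage.input_tokens": "[Agent] Input Tokens",
  "gen_ai.usage.output_tokens": "[Agent] Output Tokens",
  "gen_ai.request.temperature": "[Agent] Temperature",
  "gen_ai.request.top_p": "[Agent] Top P",
  "gen_ai.request.max_tokens": "[Agent] Max Output Tokens",
};

function mapGenAiSpan(span: SpanLike): { eventType: string; props: AgentProps } | null {
  const a = span.attributes;

  // Provider is required; spans without it are ignored.
  const provider = a["gen_ai.system"] ?? a["gen_ai.provider.name"];
  if (typeof provider !== "string") return null;

  const props: AgentProps = {
    "[Agent] Provider": provider,
    "[Agent] Latency Ms": span.durationMs,
  };

  // Prefer the response model over the request model.
  const model = a["gen_ai.response.model"] ?? a["gen_ai.request.model"];
  if (typeof model === "string") props["[Agent] Model Name"] = model;

  for (const [attr, prop] of Object.entries(DIRECT_NUMERIC)) {
    const value = a[attr];
    if (typeof value === "number") props[prop] = value;
  }

  // Derive total tokens from input + output when the span doesn't report it.
  const total = a["gen_ai.usage.total_tokens"];
  const input = props["[Agent] Input Tokens"];
  const output = props["[Agent] Output Tokens"];
  if (typeof total === "number") props["[Agent] Total Tokens"] = total;
  else if (typeof input === "number" && typeof output === "number") props["[Agent] Total Tokens"] = input + output;

  // Keep only the first finish reason when the attribute is an array.
  const finish = a["gen_ai.response.finish_reasons"];
  if (Array.isArray(finish) && finish.length > 0) props["[Agent] Finish Reason"] = String(finish[0]);
  else if (typeof finish === "string") props["[Agent] Finish Reason"] = finish;

  if (span.status === "ERROR") {
    props["[Agent] Is Error"] = true;
    if (span.statusMessage) props["[Agent] Error Message"] = span.statusMessage;
  }

  // A tool name routes the span as a tool call rather than an AI response.
  const toolName = a["gen_ai.tool.name"];
  if (typeof toolName === "string") {
    props["[Agent] Tool Name"] = toolName;
    return { eventType: "[Agent] Tool Call", props };
  }
  return { eventType: "[Agent] AI Response", props };
}
```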

Some signals have no OTEL equivalent and require the native provider wrappers: reasoning content and tokens, TTFB, streaming detection, implicit feedback, file attachments, and event-graph linking through [Agent] Parent Message ID. You can run OTEL and a native wrapper together for the same call; the SDK de-duplicates, so it doesn't emit double events.

Provider-specific notes

Vercel AI SDK

Provider wrappers instrument the underlying SDK (openai), not the Vercel abstraction. If only @ai-sdk/openai is present, either add openai as a direct dependency or fall back to patch(). For streaming responses, use onFinish to call await ai.flush() (refer to Stream responses).

Claude Agent SDK

Use ClaudeAgentSDKTracker from @amplitude/ai/integrations/claude-agent-sdk. Set these fields so the events are useful: agentId on ai.agent() (identifies the AI feature in the LLM Usage Application Registry), and userId plus sessionId on agent.session() (ties events into a single interaction).

typescript

import { AmplitudeAI } from "@amplitude/ai";
import { ClaudeAgentSDKTracker } from "@amplitude/ai/integrations/claude-agent-sdk";
import { query } from "@anthropic-ai/claude-agent-sdk";

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const agent = ai.agent("code-reviewer");
const tracker = new ClaudeAgentSDKTracker();

await agent.session({ userId: "u1", sessionId: "sess-abc" }).run(async (s) => {
  for await (const message of query({
    prompt: "Analyze this codebase",
    options: { hooks: tracker.hooks(s) },
  })) {
    tracker.process(s, message);
  }
});

tracker.hooks(session) returns PreToolUse / PostToolUse hooks that measure tool latency precisely. tracker.process(session, message) inspects each message in the stream and tracks AI responses and user messages.

Anthropic Managed Agents

Provider wrappers don't work, because LLM calls happen in Anthropic's cloud, not your code. Use manual tracking and poll client.beta.sessions.events.list(). Map event types to SDK methods:

Anthropic event	SDK call
`user.message`	`trackUserMessage(text)` (track when sending, not when polling)
`agent.message`	`trackAiMessage(text, model, 'anthropic', latencyMs)`
`agent.tool_use` / `agent.mcp_tool_use` / `agent.custom_tool_use`	`trackToolCall(name, latencyMs, success)`
`agent.tool_result` / `agent.mcp_tool_result`	skip (latency captured at `tool_use` time)
`session.error`	`trackAiMessage(errorMsg, model, 'anthropic', latencyMs, { isError: true })`
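The mapping table can be expressed as a small dispatcher. This is a sketch under several assumptions. `ManagedAgentEvent` is a hypothetical, simplified shape, not Anthropic's exact event schema. `SessionTrackerLike` mirrors only the tracking-method signatures listed on this page. The caller computes `latencyMs`.

```typescript
// Only the session tracking methods this sketch needs, using the signatures from the table.
interface SessionTrackerLike {
  trackAiMessage(
    content: string,
    model: string,
    provider: string,
    latencyMs: number,
    opts?: { isError?: boolean },
  ): void;
  trackToolCall(name: string, latencyMs: number, success: boolean): void;
}

// Hypothetical, simplified view of a polled event. Map Anthropic's real
// payload fields onto this shape before calling the dispatcher.
interface ManagedAgentEvent {
  id: string;
  type: string;
  text?: string; // message text, or the error message for session.error
  toolName?: string; // tool name for *_tool_use events
}

const TOOL_USE_TYPES = new Set(["agent.tool_use", "agent.mcp_tool_use", "agent.custom_tool_use"]);

function trackManagedAgentEvent(
  s: SessionTrackerLike,
  event: ManagedAgentEvent,
  model: string,
  latencyMs: number,
): void {
  if (event.type === "agent.message") {
    s.trackAiMessage(event.text ?? "", model, "anthropic", latencyMs);
  } else if (TOOL_USE_TYPES.has(event.type)) {
    // Assumption: tool success isn't known yet at tool_use time, so report true.
    s.trackToolCall(event.toolName ?? "unknown_tool", latencyMs, true);
  } else if (event.type === "session.error") {
    s.trackAiMessage(event.text ?? "session error", model, "anthropic", latencyMs, { isError: true });
  }
  // user.message: track when you send it, not when polling.
  // agent.tool_result / agent.mcp_tool_result: skip; latency was captured at tool_use time.
}
```

Keeping the dispatcher free of polling logic lets you run it after the de-duplication step and unit-test it against a recording stub.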

Deduplicate events across polls, because events.list() returns previously-seen events:

typescript

const seenIds = new Set<string>(savedState.seenIds);
for (const event of response.data) {
  if (seenIds.has(event.id)) continue;
  seenIds.add(event.id);
  // track event
}

Measure latency as wall-clock time between session.status_running and the event's processed_at, not poll round-trip. events.list() doesn't include usage or token counts, so cost tracking requires the Anthropic Admin API.
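Here is a minimal sketch of that latency calculation. It assumes both timestamps arrive as ISO 8601 strings: the time the session entered `session.status_running` and the event's `processed_at`. The function name, the clamp to zero, and throwing on unparseable input are choices made for this example.

```typescript
// Wall-clock latency between the session entering the running state and the
// moment the event was processed. Poll round-trip time plays no part in it.
function managedAgentLatencyMs(runningAt: string, processedAt: string): number {
  const start = Date.parse(runningAt);
  const end = Date.parse(processedAt);
  if (Number.isNaN(start) || Number.isNaN(end)) {
    throw new Error(`Unparseable timestamp: ${runningAt} / ${processedAt}`);
  }
  // Clamp out-of-order timestamps to 0 rather than reporting negative latency.
  return Math.max(0, end - start);
}
```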

OpenAI Assistants API

Provider wrappers don't auto-instrument the Assistants API (async / polling-based). Use manual tracking: trackUserMessage() when creating a message, trackAiMessage() when polling completion events.

MCP servers

The MCP protocol doesn't pass the originating user prompt to tools, so MCP servers can't capture it. Add an optional rationale parameter to each tool so the LLM can self-explain its intent and you keep usable session content.
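For example, a tool definition in the MCP tool-listing shape (`name`, `description`, and a JSON Schema `inputSchema`) can add an optional `rationale` string. The `search_orders` tool, its fields, and the `extractRationale` helper are hypothetical. How you forward the rationale into a tracking call is up to your server.

```typescript
// MCP tool definition with an optional, self-reported rationale argument.
const searchOrdersTool = {
  name: "search_orders",
  description: "Search a customer's orders by keyword.",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string", description: "Keywords to search for." },
      rationale: {
        type: "string",
        description: "Optional. One sentence on why you are calling this tool and what the user asked for.",
      },
    },
    // rationale stays optional so clients that omit it still work.
    required: ["query"],
  },
};

// Read the rationale from incoming tool arguments, tolerating its absence.
function extractRationale(args: Record<string, unknown>): string | undefined {
  const r = args["rationale"];
  return typeof r === "string" && r.trim() !== "" ? r.trim() : undefined;
}
```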

Framework notes

Next.js (App Router)

Initialize the SDK in a server-side module, never a client component. Add @amplitude/ai to serverExternalPackages in next.config.ts. Wrap session creation inside each route handler; in serverless deployments call await ai.flush() before the handler returns so the runtime doesn't freeze before events ship.
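The next.config.ts change can be as small as the following. This assumes Next.js 15 or later, where serverExternalPackages is a stable top-level option. Next.js 14 used experimental.serverComponentsExternalPackages instead.

```typescript
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  // Opt @amplitude/ai out of server bundling so Next.js loads it with Node's own module resolution.
  serverExternalPackages: ["@amplitude/ai"],
};

export default nextConfig;
```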

Express / Fastify / Hono

Use the bundled middleware to attach ai to every request:

typescript

import { createAmplitudeAIMiddleware } from "@amplitude/ai";

app.use(
  createAmplitudeAIMiddleware({
    amplitudeAI: ai,
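    // Shown for brevity; in production, resolve the user ID from your auth layer rather than a client-supplied header.
    userIdResolver: (req) => req.headers["x-user-id"] ?? null,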
  }),
);

Run in serverless environments

The SDK auto-detects serverless platforms (Vercel, AWS Lambda, Netlify, Google Cloud Functions, Azure Functions, Cloudflare Pages) from their environment variables. When it detects one, session.run() flushes pending events before the promise resolves, so you don't need an explicit ai.flush(). In a long-running server, it skips the per-session flush and lets the analytics client batch normally.

Control this per session with the autoFlush option (Node) or auto_flush (Python). Leave it unset to use auto-detection, set it to true to always flush on session exit, or false to never flush.

If you track events outside session.run(), flush before the handler returns or the runtime can freeze the process with events still buffered.

ai.flush() and ai.shutdown() serve different lifecycles:

ai.flush() sends buffered events now and keeps the SDK running. Use it in serverless handlers and API endpoints to guarantee delivery before responding.
ai.shutdown() flushes and then closes the underlying analytics client. Call it once on process exit, such as a SIGTERM handler. It only closes the client when you created it through apiKey; if you passed your own instance, you own its lifecycle.

typescript

process.on("SIGTERM", async () => {
  // Wait for buffered events to flush before the process exits.
  await ai.shutdown();
  process.exit(0);
});

Cloudflare Workers (edge isolates) aren't a supported serverless target; the full SDK can't bundle into a Worker. Refer to Edge runtimes and Cloudflare Workers for the fetch-based transport.

Edge runtimes and Cloudflare Workers

@amplitude/ai cannot bundle in Cloudflare Workers

The SDK depends on node:async_hooks, node:module, and node:crypto. Workers Builds rejects the upload even with nodejs_compat_v2 enabled. @amplitude/analytics-node is also incompatible (depends on Node's http).

The only safe import is import type { ... } from '@amplitude/ai/types', which is erased at compile time. For runtime tracking, use a fetch-based transport that constructs [Agent] events directly:

typescript

import type { AmplitudeClientLike, AmplitudeEvent } from "@amplitude/ai/types";

class FetchAmplitudeClient implements AmplitudeClientLike {
  private _apiKey: string;
  private _buffer: AmplitudeEvent[] = [];

  constructor(apiKey: string) {
    this._apiKey = apiKey;
  }

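  track(event: AmplitudeEvent): void {
    // Stamp a unique insert_id so Amplitude can drop duplicates if a flush is retried.
    this._buffer.push({ ...event, insert_id: event.insert_id ?? crypto.randomUUID() });
  }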

  async flush(): Promise<void> {
    if (!this._buffer.length) return;
    const events = this._buffer.splice(0);
    try {
      const resp = await fetch("https://api2.amplitude.com/2/httpapi", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ api_key: this._apiKey, events }),
      });
      if (!resp.ok) console.error(`[Amplitude] Flush failed: ${resp.status}`);
    } catch (err) {
      console.error(`[Amplitude] Flush error: ${(err as Error).message}`);
    }
  }
}

export default {
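  async fetch(request: Request, env: Env, ctx: ExecutionContext) {
    // Env vars arrive as strings, so compare explicitly instead of relying on truthiness.
    if (env.AMPLITUDE_TRACKING_DISABLED === "true") return handleRequest(request, env);

    const transport = new FetchAmplitudeClient(env.AMPLITUDE_API_KEY);
    // userId, sessionId, content, model, latencyMs, and responseText come from
    // your own request handling (not shown).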

    transport.track({
      event_type: "[Agent] User Message",
      user_id: userId,
      event_properties: {
        "[Agent] Session ID": sessionId,
        "[Agent] Agent ID": "my-agent",
        $llm_message: { text: content },
      },
    });

    // After the LLM call completes:
    transport.track({
      event_type: "[Agent] AI Response",
      user_id: userId,
      event_properties: {
        "[Agent] Session ID": sessionId,
        "[Agent] Agent ID": "my-agent",
        "[Agent] Model Name": model,
        "[Agent] Provider": "anthropic",
        "[Agent] Latency Ms": latencyMs,
        $llm_message: { text: responseText },
      },
    });

    // Non-blocking flush so events ship before the isolate terminates
    ctx.waitUntil(transport.flush());
    return new Response("ok");
  },
};

Construct FetchAmplitudeClient per request so buffered events don't leak between requests. Set a unique insert_id on each event with crypto.randomUUID() so Amplitude can de-duplicate retries, and check an AMPLITUDE_TRACKING_DISABLED env var so you can turn tracking off.

Ingest OpenTelemetry spans

The SDK supports two approaches for OpenTelemetry integration: the recommended span-first approach using enable_otel() / enableOtel(), and the manual exporter approach for existing OTel pipelines.

Install OTEL dependencies

bash

npm install @opentelemetry/api @opentelemetry/sdk-trace-base

Span-first approach (recommended)

Call enable_otel() / enableOtel() after initializing the SDK. This registers a SpanEventMapper that automatically converts OTEL spans into [Agent] events based on span attributes and semantic conventions.

typescript

import { AmplitudeAI, AIConfig } from "@amplitude/ai";

const ai = new AmplitudeAI({
  apiKey: process.env.AMPLITUDE_AI_API_KEY!,
  config: new AIConfig({ contentMode: "full" }),
});

ai.enableOtel();

How it works: When OTEL is enabled, the SDK registers as a span processor. Incoming OTEL spans pass through the SpanEventMapper, which routes them to the appropriate [Agent] event type based on span attributes:

gen_ai.* spans → [Agent] AI Response
Spans with tool.name → [Agent] Tool Call
All other spans → [Agent] Span
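The following sketch shows that routing order. It checks for a tool name before applying the generic `gen_ai.*` rule, because a tool span can also carry `gen_ai.*` attributes. Treating both `gen_ai.tool.name` (from the attribute-mapping table) and `tool.name` as tool markers is an assumption of this example, as is the `routeSpan` name.

```typescript
type AgentEventType = "[Agent] AI Response" | "[Agent] Tool Call" | "[Agent] Span";

function routeSpan(attributes: Record<string, unknown>): AgentEventType {
  // Check tool markers first: a tool span may also carry gen_ai.* attributes.
  if ("gen_ai.tool.name" in attributes || "tool.name" in attributes) return "[Agent] Tool Call";
  // Any other span with GenAI semantic-convention attributes is an AI response.
  if (Object.keys(attributes).some((key) => key.startsWith("gen_ai."))) return "[Agent] AI Response";
  // Everything else becomes a generic span.
  return "[Agent] Span";
}
```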

De-duplication: If you use both provider wrappers and OTEL instrumentation, the SDK marks wrapper-generated events with tracker_managed and skips incoming OTEL spans for the same operation. No duplicate events are emitted.

Custom function tracking with `@observe`

Use the @observe decorator (Python) or observe() higher-order function (Node) to create OTEL spans for your own functions. The type parameter controls which [Agent] event the span maps to:

typescript

import { observe } from "@amplitude/ai";

const runRetrieval = observe(
  async (query: string) => {
    const results = await vectorStore.search(query);
    return results;
  },
  { name: "vector-search", type: "tool" },
);

`type` value	Maps to
`"tool"`	`[Agent] Tool Call`
`"agent"`	`[Agent] AI Response`
`"llm"`	`[Agent] AI Response`
`"span"`	`[Agent] Span`

Context propagation

Use using_attributes() / usingAttributes() to attach identity and session context to OTEL spans created outside the SDK's session scope:

typescript

import { usingAttributes } from "@amplitude/ai";

await usingAttributes(
  { userId: "user-123", sessionId: "sess-abc", agentId: "my-agent" },
  async () => {
    await someOtelInstrumentedFunction();
  },
);

To update attributes on the current span directly, use update_current_span() / updateCurrentSpan():

typescript

import { updateCurrentSpan } from "@amplitude/ai";

updateCurrentSpan({ metadata: { environment: "production" } });

Manual exporter approach

If your stack already emits OpenTelemetry GenAI spans (OpenLIT, Traceloop, OpenAI's OTel instrumentation) and you want to keep your existing OTel pipeline, use the exporters directly:

AmplitudeGenAIExporter (inbound, production-ready): ingests GenAI semantic-convention spans and emits [Agent] events. Ignores non-GenAI spans, so it's safe in a mixed pipeline.
AmplitudeAgentExporter (outbound, experimental): converts Amplitude events into flat OTel spans for forwarding to other backends. Doesn't preserve trace hierarchy.

Prefer enable_otel() for new integrations

The span-first enable_otel() approach handles span routing, de-duplication, and context propagation automatically. Use the manual exporters only when you need to integrate into an existing OTel collector pipeline without changing your SDK initialization.

Choose a privacy mode

Set contentMode on AIConfig:

full (default): captures prompt and response text. redactPii: true is on by default and scrubs emails, phone numbers, SSNs, credit card numbers, IP addresses, and base64-encoded image data before events leave the process. The SDK tunes phone and SSN detection for US formats; add customRedactionPatterns or customRedactionFn for international locales.
metadata_only: token counts, latency, model, and cost only. No prompt or response text. Use for sensitive or regulated data.
customer_enriched: no text by default. Send pre-scored summaries through trackSessionEnrichment(). Designed for teams with existing evaluation stacks.

For managed-agent architectures, prefer full with redactPii: true. The managed API already stores message content server-side, so metadata_only adds no privacy benefit.

Customize redaction

In full mode, extend the default PII scrubbing with custom rules on AIConfig:

customRedactionPatterns / custom_redaction_patterns: regex strings (replaced with [REDACTED]) or { pattern, replacement } objects for named labels like [ticket_id]. Use named patterns for domain-specific identifiers and for international phone or ID formats the defaults don't cover.
customRedactionFn / custom_redaction_fn: a (text) => string callback that runs after all regex-based redaction. Plug in an NER library to scrub names and locations. If it throws, the SDK keeps the text from prior redaction tiers and logs a warning.
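To see how the two options combine, here is a self-contained sketch of the behavior described above. String patterns become `[REDACTED]`, while `{ pattern, replacement }` objects use their own label. The callback runs last, and if it throws, the sketch falls back to the regex-redacted text. It models the documented semantics only and isn't the SDK's implementation. The `applyCustomRedaction` name and the use of global regex matching are assumptions. The SDK's default PII scrubbing, which runs before these custom rules, isn't reproduced here.

```typescript
type RedactionPattern = string | { pattern: string; replacement: string };

function applyCustomRedaction(
  text: string,
  patterns: RedactionPattern[],
  fn?: (text: string) => string,
): string {
  let out = text;
  for (const p of patterns) {
    // Plain strings are regex sources replaced with [REDACTED]; objects carry a named label.
    const source = typeof p === "string" ? p : p.pattern;
    const replacement = typeof p === "string" ? "[REDACTED]" : p.replacement;
    out = out.replace(new RegExp(source, "g"), replacement);
  }
  if (fn) {
    try {
      out = fn(out);
    } catch (err) {
      // Keep the regex-redacted text and warn, mirroring the documented fallback.
      console.warn(`custom redaction failed: ${(err as Error).message}`);
    }
  }
  return out;
}
```

For example, `applyCustomRedaction("Ticket TCK-12345 filed from Berlin", [{ pattern: "TCK-\\d+", replacement: "[ticket_id]" }], (t) => t.replace("Berlin", "[location]"))` yields `"Ticket [ticket_id] filed from [location]"`.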

Keep custom patterns efficient: redaction runs in the tracking hot path, so avoid regexes prone to catastrophic backtracking.

Provide your own session enrichments

In customer_enriched mode, the SDK sends no message text. You run your own evaluation pipeline and ship the results back as structured session-level enrichments. Use this when compliance requires zero-content transmission, or when your eval logic goes beyond Amplitude's built-in server-side enrichment.

Build a SessionEnrichments object and send it with trackSessionEnrichment() (Node) or track_session_enrichment() (Python). The enrichment lands as an [Agent] Session Enrichment event, serialized into the [Agent] Enrichments property; the same fields also attach to [Agent] Session End when you set enrichments before the session closes.

The substantive fields on a SessionEnrichments object:

Field	Purpose
`qualityScore`, `sentimentScore`	Numeric quality and sentiment of the session.
`overallOutcome`	Terminal result, such as `resolved` or `escalated`.
`topicClassifications`	Map of taxonomy name to a `TopicClassification` (topic, confidence, subcategories).
`rubricScores`	Array of `RubricScore` (name, score, rationale, evidence).
`agentChain`, `rootAgentName`	Agent topology for multi-agent runs.
`requestComplexity`	Difficulty bucket, such as `low`, `medium`, or `high`.
`errorCategories`	Categorized failure signals from your pipeline.
`messageLabels`	Per-message labels keyed by the message ID returned from each tracking call.
`customMetadata`	Arbitrary key/value data for your own analytics.

typescript

import {
  AmplitudeAI,
  AIConfig,
  ContentMode,
  SessionEnrichments,
  RubricScore,
  TopicClassification,
} from "@amplitude/ai";

const ai = new AmplitudeAI({
  apiKey: process.env.AMPLITUDE_AI_API_KEY!,
  config: new AIConfig({ contentMode: ContentMode.CUSTOMER_ENRICHED }),
});
const agent = ai.agent("support-bot", { agentVersion: "2.1.0" });

// 1. Run the conversation — no content is sent, only metadata.
const { sessionId } = await agent.session({ userId: "user-42" }).run(async (s) => {
  s.trackUserMessage("Why was I charged twice?");
  s.trackAiMessage(aiResponse.content, "gpt-4o", "openai", latencyMs);
  return { sessionId: s.sessionId };
});

// 2. Score the raw messages with your own pipeline.
const evalResults = await myEvalPipeline(conversationHistory);

// 3. Ship the enrichments back to Amplitude.
const enrichments = new SessionEnrichments({
  qualityScore: evalResults.quality,
  sentimentScore: evalResults.sentiment,
  overallOutcome: evalResults.outcome,
  topicClassifications: {
    billing: new TopicClassification({ topic: "billing-dispute", confidence: 0.92 }),
  },
  rubricScores: [new RubricScore({ name: "accuracy", score: 4, maxScore: 5 })],
  customMetadata: { eval_model: "gpt-4o-judge-v2" },
});

agent.trackSessionEnrichment(enrichments, { sessionId });

This produces the same event properties as Amplitude's built-in enrichment (topics, rubrics, outcomes, message labels), sourced from your pipeline instead.

Message labels

Message labels are key-value pairs attached to individual messages for filtering and segmentation, such as routing tags (flow, surface), classifier output (intent, sentiment), or business context (tier, plan). They emit as [Agent] Message Labels on the message event. Attach them two ways:

Inline, at tracking time, by passing labels to trackUserMessage() / track_user_message().
Retrospectively, when classifier results arrive after the session, through SessionEnrichments.messageLabels keyed by the message ID returned from each tracking call.

Manage cost and tokens

s.trackAiMessage(...) auto-calculates [Agent] Cost USD from the model name and token counts through the bundled Pydantic genai-prices catalog. Two things commonly produce an [Agent] Cost USD of 0:

Unrecognized model name. Vertex AI aliases like claude-sonnet-4-6 won't match the canonical claude-sonnet-4-20250514. Internal gateway labels won't resolve. Brand-new models may not yet be in genai-prices. Pass the canonical provider id, or set totalCostUsd explicitly to override.

Incorrect inputTokens with prompt caching. The SDK expects inputTokens to be cache-inclusive (cached tokens are a subset, never additive). Provider conventions differ:

Provider	Raw API behavior	What to pass as `inputTokens`
OpenAI	`prompt_tokens` already includes `cached_tokens`	Use directly
Anthropic / Bedrock (Converse)	`input_tokens` excludes cache tokens	`input_tokens + cache_read_input_tokens + cache_creation_input_tokens`
Gemini	`promptTokenCount` includes cached; `cachedContentTokenCount` reports separately	Use `promptTokenCount` directly

The built-in Anthropic, Bedrock, and Gemini wrappers handle this normalization for you. Manual trackAiMessage callers need to handle it themselves. Pass cacheReadTokens / cacheCreationTokens separately so the SDK applies the differential pricing.
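If you call trackAiMessage manually, a small helper keeps the table's rules in one place. This is a sketch: the `RawUsage` field names follow each provider's raw usage fields, while `normalizeInputTokens` and its return shape are hypothetical. Bedrock Converse reports the same cache buckets under different field names; map them onto the `anthropic` variant first.

```typescript
type RawUsage =
  | { provider: "openai"; prompt_tokens: number; prompt_tokens_details?: { cached_tokens?: number } }
  | { provider: "anthropic"; input_tokens: number; cache_read_input_tokens?: number; cache_creation_input_tokens?: number }
  | { provider: "gemini"; promptTokenCount: number; cachedContentTokenCount?: number };

interface NormalizedInput {
  inputTokens: number; // cache-inclusive total, as the SDK expects
  cacheReadTokens: number; // subset of inputTokens
  cacheCreationTokens: number; // subset of inputTokens
}

function normalizeInputTokens(u: RawUsage): NormalizedInput {
  switch (u.provider) {
    case "openai":
      // prompt_tokens already includes cached tokens.
      return {
        inputTokens: u.prompt_tokens,
        cacheReadTokens: u.prompt_tokens_details?.cached_tokens ?? 0,
        cacheCreationTokens: 0,
      };
    case "anthropic": {
      // input_tokens excludes cache tokens, so add both cache buckets back in.
      const read = u.cache_read_input_tokens ?? 0;
      const created = u.cache_creation_input_tokens ?? 0;
      return { inputTokens: u.input_tokens + read + created, cacheReadTokens: read, cacheCreationTokens: created };
    }
    case "gemini":
      // promptTokenCount already includes cached content.
      return {
        inputTokens: u.promptTokenCount,
        cacheReadTokens: u.cachedContentTokenCount ?? 0,
        cacheCreationTokens: 0,
      };
  }
}
```

Pass the returned fields in trackAiMessage's options. inputTokens, cacheReadTokens, and cacheCreationTokens are the option names this page uses.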

When you need to compute cost yourself, call calculateCost({ modelName, inputTokens, outputTokens, cacheReadInputTokens, cacheCreationInputTokens }) and pass the result as totalCostUsd.

Keep pricing data current

Cost relies on the bundled genai-prices catalog, so a newly released model can report [Agent] Cost USD of 0 until the catalog updates. To fetch the latest prices at runtime, opt in at startup with enableLivePriceUpdates() / enable_live_price_updates(). It refreshes prices periodically over HTTPS, so enable it only where outbound network access is allowed.

Track semantic cache hits

When you serve a full response from your own semantic or response cache, pass wasCached: true (Node) or was_cached=True (Python) on the AI-message call. It maps to [Agent] Was Cached, distinct from token-level prompt caching, so you can chart cache-hit rate and the cost it saves.

Shape message content

The first argument to trackUserMessage becomes $llm_message.text on [Agent] User Message. This is what session lists, segmentation, and enrichment treat as "what the user said". Two practical rules:

Do pass a short natural-language line as the message body. For example, the real prompt, or a canonical summary for headless jobs:

typescript

s.trackUserMessage(
  "Summarize the attached design doc and list open questions",
  {
    context: { structuredPayload: payloadRecord },
  },
);

Don't pass large JSON blobs as the message body. The product uses the JSON as the session title and breaks down charts by raw JSON:

typescript

// Avoid: the session label becomes the raw JSON
s.trackUserMessage(JSON.stringify(payloadRecord));

Put structured segmentation dimensions in the context option (becomes [Agent] Context JSON, queryable in charts). For server-side enrichment to reason over structured data, also keep essential facts in content. Enrichments derive eval input primarily from turn text, not from [Agent] Context.

Instrument without the SDK

For unsupported runtimes (Java, Go, Ruby, edge environments), send events to the Amplitude HTTP API directly:

bash

curl -X POST https://api2.amplitude.com/2/httpapi \
  -H 'Content-Type: application/json' \
  -d '{
    "api_key": "YOUR_API_KEY",
    "events": [{
      "event_type": "[Agent] User Message",
      "insert_id": "UNIQUE_EVENT_ID",
      "user_id": "user-123",
      "event_properties": {
        "[Agent] Session ID": "sess-abc",
        "[Agent] Agent ID": "support-chatbot",
        "$llm_message": { "text": "How do I cancel my subscription?" }
      }
    }]
  }'

Use $llm_message.text for message content (the ingestion pipeline reads this property for interaction text). For the full property reference and event JSON examples, refer to the Agent Analytics taxonomy.

When you send events directly, you're responsible for what the SDK otherwise handles:

Concern	What you must do
Session ID	Generate one ID per conversation and set it as `[Agent] Session ID` on every event.
Deduplication	Set a unique `insert_id` per event so retries don't create duplicates.
Property prefixing	Prefix every property name with `[Agent]` (or `[Amplitude]` for the Session Replay ID).
Cost and tokens	Compute `[Agent] Cost USD` yourself; the SDK's automatic pricing isn't available.
Server enrichment	Still runs automatically once `[Agent] Session End` lands, when content is present.
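The bookkeeping in this table ports to any language. Here is a TypeScript sketch for consistency with the rest of this page. `buildAgentEvent`, `AgentEventInput`, and `buildHttpApiBody` are hypothetical. Leaving keys that already start with `[Agent]`, `[Amplitude]`, or `$` (such as `$llm_message`) unprefixed is an assumption based on the examples above. Each event gets a fresh `insert_id`, and one session ID is reused per conversation. POST the resulting body to `https://api2.amplitude.com/2/httpapi`.

```typescript
import { randomUUID } from "node:crypto";

interface AgentEventInput {
  eventType: string; // e.g. "[Agent] User Message"
  userId: string;
  sessionId: string; // one ID per conversation, reused on every event
  properties: Record<string, unknown>;
}

// Prefix bare property names with "[Agent] ", leaving already-prefixed keys
// and $-properties such as $llm_message as they are.
function prefixProps(props: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(props)) {
    const keep = key.startsWith("[Agent]") || key.startsWith("[Amplitude]") || key.startsWith("$");
    out[keep ? key : `[Agent] ${key}`] = value;
  }
  return out;
}

// Build each event once and resend the same object on retry, so the insert_id stays stable.
function buildAgentEvent(input: AgentEventInput) {
  return {
    event_type: input.eventType,
    user_id: input.userId,
    insert_id: randomUUID(), // unique per event, so a retried POST doesn't duplicate it
    event_properties: {
      ...prefixProps(input.properties),
      "[Agent] Session ID": input.sessionId, // set last so it always wins
    },
  };
}

function buildHttpApiBody(apiKey: string, events: ReturnType<typeof buildAgentEvent>[]): string {
  return JSON.stringify({ api_key: apiKey, events });
}
```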

Verify your data

Run the doctor to validate env vars, installed dependencies, and the event-pipeline connection:

bash

npx amplitude-ai doctor

Then confirm events land in Amplitude:

Open the project's Live Events stream.
Send a test session from the instrumented code.
Within seconds, an [Agent] AI Response event should appear with these properties populated:
- [Agent] Session ID, [Agent] Agent ID
- [Agent] Model Name, [Agent] Provider
- [Agent] Latency Ms
- [Agent] Input Tokens, [Agent] Output Tokens
- [Agent] Cost USD

Local verification with `summary()`

Before deploying, use MockAmplitudeAI.summary() to get a fill-rate report of all captured events. It checks eight verification gates and flags gaps before data reaches Amplitude.

typescript

import { AIConfig } from "@amplitude/ai";
import { MockAmplitudeAI } from "@amplitude/ai/testing";

const mock = new MockAmplitudeAI(new AIConfig({ contentMode: "full" }));
const agent = mock.agent("test-agent", { userId: "u1" });

await agent.session({ sessionId: "s1" }).run(async (s) => {
  s.trackUserMessage("hello");
  s.trackAiMessage("response", "gpt-4o-mini", "openai", 150, { inputTokens: 12, outputTokens: 8 });
});

console.log(mock.summary());

The summary output looks like:

text

Agent Analytics fill-rate report
================================
Events captured: 2
  [Agent] User Message:  1
  [Agent] AI Response:   1

Verification gates (8/8 passing):
  ✓ user_id or device_id present
  ✓ [Agent] Session ID present
  ✓ [Agent] Agent ID present
  ✓ [Agent] Model Name present
  ✓ [Agent] Provider present
  ✓ [Agent] Latency Ms > 0
  ✓ [Agent] Input Tokens > 0
  ✓ [Agent] Output Tokens > 0
  ✓ [Agent] Cost USD > 0

Fixing common issues:

Gate failing	Cause	Fix
`user_id` missing	No `userId` or `deviceId` passed to the session	Set `userId` on `agent.session()` or forward `deviceId` from the Browser SDK
`Session ID` missing	Session created without an ID	Pass `sessionId` to `agent.session()`
`Model` / `Provider`	Using `patch()` without a supported provider, or custom gateway	Pass model and provider explicitly to `trackAiMessage()`, or use a provider wrapper
`Input/Output Tokens = 0`	Provider doesn't return usage in streaming mode	Use `onFinish` / `stream_options: { include_usage: true }` to capture final token counts
`Cost USD = 0`	Unrecognized model name	Use the canonical provider model id, or set `totalCostUsd` explicitly

Test against a mock client

For CI, use MockAmplitudeAI from @amplitude/ai/testing to assert your events emit correctly:

typescript

import { AIConfig } from "@amplitude/ai";
import { MockAmplitudeAI } from "@amplitude/ai/testing";

const mock = new MockAmplitudeAI(new AIConfig({ contentMode: "full" }));
const agent = mock.agent("test-agent", { userId: "u1" });

await agent.session({ sessionId: "s1" }).run(async (s) => {
  s.trackUserMessage("hello");
  s.trackAiMessage("response", "gpt-4o-mini", "openai", 150, { inputTokens: 12, outputTokens: 8 });
});

mock.assertEventTracked("[Agent] User Message", { userId: "u1" });
mock.assertSessionClosed("s1");

// Data quality gate: every AI Response must carry the eight verification fields
for (const e of mock.eventsOfType("[Agent] AI Response")) {
  const p = e.event_properties ?? {};
  expect(e.user_id || e.device_id).toBeTruthy();
  expect(p["[Agent] Session ID"]).toBeTruthy();
  expect(p["[Agent] Model Name"]).toBeTruthy();
  expect(p["[Agent] Provider"]).toBeTruthy();
  expect(p["[Agent] Latency Ms"]).toBeGreaterThan(0);
  expect(p["[Agent] Input Tokens"]).toBeGreaterThan(0);
  expect(p["[Agent] Output Tokens"]).toBeGreaterThan(0);
  expect(p["[Agent] Cost USD"]).toBeGreaterThan(0);
}

Keep this test in CI to catch silent instrumentation regressions: bad model names or missing token counts produce broken dashboards without throwing at runtime.

Reliability and error handling

Instrumentation can't take your application down:

Tracking calls never throw. Every track* method catches and logs its own errors internally. A serialization bug or a bad field can't interrupt your agent's request path.
The SDK buffers and retries events. The underlying @amplitude/analytics-node client batches events and retries failed sends in its transport layer.
Failures degrade gracefully. If Amplitude is unreachable, the SDK drops events silently after exhausting retries. Your application keeps operating.

For development, set validate: true (Node) or validate=True (Python) on AIConfig to surface missing required fields, such as userId or sessionId, early. Validation errors throw ValidationError so you can catch them in tests before they reach production. Combine with dryRun / dry_run for the strictest CI checking.

Auto-instrument and CLI tools

To instrument without editing any call sites, auto-patch supported providers at process start. This is the fastest way to confirm the SDK is wired up; for the full event model (user messages, sessions, scores), use agents and sessions as shown in Initialize the SDK.

bash

# Wrapper command
AMPLITUDE_AI_API_KEY=xxx AMPLITUDE_AI_AUTO_PATCH=true amplitude-ai-instrument node app.js

# Or Node's ESM preload flag directly
AMPLITUDE_AI_API_KEY=xxx AMPLITUDE_AI_AUTO_PATCH=true node --import @amplitude/ai/register app.js

Both runtimes read the same environment variables:

Variable	Description
`AMPLITUDE_AI_API_KEY`	Required to enable auto-patch.
`AMPLITUDE_AI_AUTO_PATCH`	Must be `"true"` to turn auto-patching on.
`AMPLITUDE_AI_CONTENT_MODE`	`full` (default), `metadata_only`, or `customer_enriched`.
`AMPLITUDE_AI_DEBUG`	`"true"` to log each event to stderr.

To inspect the environment without running your app, use amplitude-ai status. It prints the installed SDK version, the provider packages it detects, and the current environment-variable configuration. To validate dependencies and the event-pipeline connection, refer to Verify your data for the doctor command.

Register the event schema in your data catalog

The SDK ships a CLI that registers all [Agent] event types and their properties in Amplitude's Data Catalog, so events arrive documented with descriptions, types, and required flags instead of being inferred from ingestion.

Prerequisites: a plan with Taxonomy API access, and the project's API key and secret key from Settings > Projects.

The bundled CLI reads the event catalog and prints executable curl commands. It makes no network requests itself, so you can review the commands before running them.

bash

# Print commands with your keys
npx amplitude-ai-register-catalog --api-key YOUR_KEY --secret-key YOUR_SECRET

# Execute immediately
npx amplitude-ai-register-catalog --api-key YOUR_KEY --secret-key YOUR_SECRET | bash

# EU data residency
npx amplitude-ai-register-catalog --api-key YOUR_KEY --secret-key YOUR_SECRET --eu | bash

The commands are idempotent: they create missing events and properties and update existing ones, so it's safe to re-run after an SDK upgrade adds new fields.

Debug and dry-run

Two AIConfig flags help you inspect events locally.

debug: true (Node) / debug=True (Python) logs a one-line summary of every event to stderr and still sends events to Amplitude. With auto-instrumentation, set AMPLITUDE_AI_DEBUG=true on the command instead:

text

[amplitude-ai] [Agent] AI Response | user=user-123 session=sess-abc agent=my-agent model=gpt-4o latency=1203ms tokens=150→847 cost=$0.0042

dryRun: true (Node) / dry_run=True (Python) logs the full event JSON to stderr and never transmits anything. Use it to validate event shape in local development and CI without a live API key.

Troubleshooting

Issue	Solution
`[Agent] Cost USD` is `$0`	Model name not in `genai-prices`. Use the canonical provider id, or set `totalCostUsd` explicitly.
Anthropic cache token mismatch	Add `cache_read_input_tokens` and `cache_creation_input_tokens` to `inputTokens`. Refer to Manage cost and tokens.
Empty session records	Update to the latest SDK; sessions now materialize only on real activity.
Events don't appear in Live Events	Confirm the API key matches the Agent Analytics project.
`node:async_hooks` error in Cloudflare Workers	Use the FetchAmplitudeClient pattern.
Tool calls have `latencyMs: 0`	They were extracted by `patch()` from message arrays. Use `tool()` or `trackToolCall()` for real latency.
Session ends before stream finishes	Refer to Stream responses, and keep the session open until the stream is consumed.

API reference

Core classes

API	Purpose
`new AmplitudeAI({ apiKey, config? })`	Initialize the SDK
`new AIConfig({ contentMode?, redactPii?, customRedactionPatterns?, customRedactionFn?, dryRun?, debug? })`	Privacy and debug config
`ai.agent(agentId, opts?)`	Create a bound agent
`agent.child(agentId, opts?)`	Create a child agent for delegation
`agent.session(opts?)`	Create a session (auto-flushes in serverless)
`session.run(fn)`	Execute work with session context
`s.runAs(childAgent, fn)`	Delegate to a child agent
`ai.enableOtel()` / `ai.enable_otel()`	Enable OTEL span-first instrumentation
`ai.otelEnabled` / `ai.otel_enabled`	Whether OTEL mode is active (read-only)
`ai.flush()`	Flush buffered events (serverless / streaming)
`ai.shutdown()`	Flush, then close the analytics client (process exit)
`ai.tenant(orgId, opts?)`	Tenant-scoped handle that pre-binds `customerOrgId`
`ai.score({ userId, name, value, targetId?, targetType?, source? })`	Record explicit user feedback as `[Agent] Score`

Session tracking methods

Method	Event
`s.trackUserMessage(content, opts?)`	`[Agent] User Message`
`s.trackAiMessage(content, model, provider, latencyMs, opts?)`	`[Agent] AI Response`
`s.trackToolCall(name, latencyMs, success, opts?)`	`[Agent] Tool Call`
`s.trackSpan({ name, latencyMs, ... })`	`[Agent] Span`
`s.trackSessionEnrichment({...})`	Session-level enrichment (customer_enriched mode)

Higher-order functions

HOF	Event	Use
`tool(fn, { name })`	`[Agent] Tool Call`	Wrap tool functions
`observe(fn, { name, type? })`	`[Agent] Span`	Wrap any function for observability. With OTEL enabled, it creates real spans; `type` controls event routing.
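The `latencyMs: 0` issue in Troubleshooting comes from inferring a tool call after the fact instead of wrapping the function itself. The sketch below illustrates the wrapping technique with a hypothetical `timedTool` helper: it times each call and records the tool name, latency, and success flag whether the function resolves or throws. It is not the SDK's `tool()` implementation, and the `record` callback is an assumption standing in for event emission.

```typescript
// Hypothetical record shape, loosely modeled on trackToolCall(name, latencyMs, success).
type ToolRecord = { name: string; latencyMs: number; success: boolean };

// Hypothetical higher-order function: wraps an async tool and records timing.
function timedTool<A extends unknown[], R>(
  fn: (...args: A) => Promise<R>,
  opts: { name: string },
  record: (r: ToolRecord) => void,
): (...args: A) => Promise<R> {
  return async (...args: A) => {
    const start = Date.now();
    let success = false;
    try {
      const result = await fn(...args);
      success = true;
      return result;
    } finally {
      // Runs on both success and failure, so every call is recorded.
      record({ name: opts.name, latencyMs: Date.now() - start, success });
    }
  };
}
```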

Other APIs

API	Use
`patch({ amplitudeAI: ai })` / `unpatch()`	Zero-code instrumentation; also extracts `[Agent] Tool Call` events from message arrays
`wrap(client, ai)`	Wrap an existing provider client without modifying its construction
`injectContext()` / `extractContext(headers)`	Cross-service propagation
`usingAttributes(attrs, fn)` / `using_attributes(**attrs)`	Attach identity and session context to OTEL spans
`updateCurrentSpan(attrs)` / `update_current_span(**attrs)`	Update attributes on the active OTEL span
`createAmplitudeAIMiddleware(opts)`	Express / Fastify / Hono middleware
`calculateCost({ modelName, ... })`	Compute cost directly when you need to override `totalCostUsd`
`trackConversation({ ... })`	Backfill a full message history as events
`inferModelTier(model)`	Resolve a model's tier (`fast` / `standard` / `reasoning`)
`enableLivePriceUpdates()`	Refresh `genai-prices` cost data at runtime
`MockAmplitudeAI` (`@amplitude/ai/testing`)	Deterministic test double; call `.summary()` for a fill-rate report
`ClaudeAgentSDKTracker` (`@amplitude/ai/integrations/claude-agent-sdk`)	Claude Agent SDK integration
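When you override `totalCostUsd`, the core arithmetic is token count times per-token price, summed over input and output. A minimal sketch of that arithmetic with hypothetical per-million-token prices and a hypothetical `estimateCostUsd` helper; it ignores cache pricing and does not reflect how `calculateCost` or `genai-prices` compute costs internally.

```typescript
// Hypothetical prices in USD per million tokens; placeholders, not real model prices.
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Basic cost estimate: (input tokens x input price + output tokens x output price) / 1M.
function estimateCostUsd(inputTokens: number, outputTokens: number, price: ModelPrice): number {
  return (inputTokens * price.inputPerMTok + outputTokens * price.outputPerMTok) / 1_000_000;
}

const examplePrice: ModelPrice = { inputPerMTok: 2, outputPerMTok: 8 };
const exampleCost = estimateCostUsd(1_000_000, 500_000, examplePrice); // 6
```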
