This post is part of our Amplitude AI Builder series. Each one will feature an Amplitude engineer discussing an AI product that they are building.
Inside your company’s data, there are answers to important business questions. That’s no secret. Organizations around the world have data teams built around that premise. Every line of business uses data to define goals and measure performance.
But analyzing data to answer questions isn’t the end of the process; it’s the beginning. There’s a second step that requires teams to turn that information into actual change—new product features, website updates, marketing rollouts, and so on.
In this post, I talked to Eric Carlson, senior principal AI engineer at Amplitude, about how his team is building AI Agents, a new way for companies to turn data into signal and use those insights to take action.
What do AI Agents do?
That’s a big question. It’s something that the whole industry is trying to figure out and define at once. To me, the job of an AI agent is to reason, collaborate, and take actions on key workflows. Very soon, Agents will do that work autonomously, but for now, they work with the user to produce insights and take approved actions. At the tactical level, an AI Agent works like a data scientist and a product analyst in your back pocket.
When we started building AI Agents, we wanted to make something that starts with context around the customer’s product, and then uses Amplitude's core data platform primitives—analytics, session replay, guides, all of our blades, etc.—to start drawing lines through common workflows with AI. The new generation of AI tools is great at combining diverse data with complex tool usage, which fits really well into the broader Amplitude platform vision. Agents are a way to unlock the cross-platform functionality in a way that’s relatively automatable and drives toward real value.
There are different types of Agents, but each one is essentially an LLM that can use tools. It may have memory, deep instruction sets, and evaluations, but the core is simple. A central challenge for Amplitude is finding effective patterns to efficiently represent the massive volumes of Amplitude data—events, replays, experiments, product data—for those LLMs. For example, our homepage alone has about a million raw characters, which saturates the context window if fed in raw. We put a lot of energy into understanding how to filter, aggregate, query, and represent that amount of data for the model without letting it get confused.
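To make that concrete, here is a minimal sketch of that core pattern: an LLM in a loop that can call tools, with oversized context filtered down before it ever reaches the model. Everything in it (`call_llm`, `TOOLS`, the character budget) is a hypothetical stand-in, not Amplitude’s implementation.

```python
from dataclasses import dataclass
from typing import Callable

CONTEXT_BUDGET = 50_000  # rough character budget; a raw page can run ~1M characters

def fit_to_budget(text: str, budget: int = CONTEXT_BUDGET) -> str:
    """Filter oversized context instead of feeding it to the model raw."""
    # A real system would aggregate and query; head+tail truncation keeps this runnable.
    if len(text) <= budget:
        return text
    half = budget // 2
    return text[:half] + "\n...[truncated]...\n" + text[-half:]

@dataclass
class ToolCall:
    name: str
    args: dict

@dataclass
class Reply:
    content: str
    tool_call: ToolCall | None = None

def call_llm(messages: list[dict], tools: dict) -> Reply:
    # Stand-in for a real model call; this stub always answers directly.
    return Reply(content=f"(answer grounded in {len(messages)} messages)")

TOOLS: dict[str, Callable[..., object]] = {}  # e.g. chart queries, replay lookups

def run_agent(question: str, context: str, max_steps: int = 10) -> str:
    messages = [
        {"role": "system", "content": "You are a product analytics agent."},
        {"role": "user", "content": question + "\n\nContext:\n" + fit_to_budget(context)},
    ]
    for _ in range(max_steps):
        reply = call_llm(messages, tools=TOOLS)  # model answers or requests a tool
        if reply.tool_call is None:
            return reply.content                 # final answer for the user
        result = TOOLS[reply.tool_call.name](**reply.tool_call.args)
        messages.append({"role": "tool", "content": fit_to_budget(str(result))})
    return "Step limit reached; returning partial analysis."

print(run_agent("Why did signups dip last week?", "<a million characters of page>"))
```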
How do you imagine an Amplitude customer using AI Agents?
Ultimately, we’re trying to use Agents to simplify the Sense > Decide > Act loop. To do that, they need to both think and act, which requires functionality across all of our product blades. We want Agents to be the integration point that connects all of those products together, in a way that is transparent to the normal user, and a Swiss Army knife to the power user.
Our hope is that the typical customer will use Agents to extract more value than they could before, to take advantage of all the batteries the Amplitude platform includes, to get all of this delivered to their inbox, and to automate product development flows through MCP connections. We want to be proactive, not reactive.
Which of the Agents did you build first?
The very first agent that we built was the website optimization agent. Amplitude has a ton of product metrics data. If we give the Agent a specific target that customers would want to improve (e.g., clicking through on a blog conversion, optimizing an ecommerce funnel, etc.), the Agent can reverse-engineer the activities that might drive that metric by examining the existing product experience—friction points on the page that block that target, design issues, behavioral cohorts.
The end goal of that Agent is to take in that information—product data, Amplitude best practices, and context about that data—and produce a concrete strategy for optimizing a given metric. The Agent can then take its own actions to execute that strategy: it generates variants, runs experiments, modifies web pages, suggests guides, and so on. This is work that product teams already do today, but you have to be a power Amplitude user to manage all those steps at once. The Agent simplifies that process and makes it available to a lot more people.
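As a rough illustration of that flow, here is a sketch of what such a pipeline could look like. Every function and all of the stub data here are hypothetical, not the Agent’s actual interface.

```python
# Illustrative Sense > Decide > Act pipeline for a website optimization agent.

def query_funnel(metric: str) -> dict:
    return {"steps": ["land", "scroll", "click"], "drop_off": "scroll"}  # stub data

def find_friction_points(metric: str) -> list[str]:
    return ["slow hero image", "CTA below the fold"]  # stub data

def propose_strategy(metric: str, context: dict) -> list[str]:
    # Stand-in for an LLM call that reasons over the gathered context.
    return [f"move the CTA above the fold to lift {metric}"]

def optimize_metric(target_metric: str) -> dict:
    # 1. Sense: gather product data, friction points, and context about the data.
    context = {
        "funnel": query_funnel(target_metric),
        "friction": find_friction_points(target_metric),
    }
    # 2. Decide: produce a concrete strategy for the target metric.
    changes = propose_strategy(target_metric, context)
    # 3. Act: turn each proposed change into a variant for an experiment.
    variants = [{"change": c, "patch": "<!-- generated markup -->"} for c in changes]
    return {"strategy": changes, "variants": variants}

print(optimize_metric("blog_cta_click_through"))
```

In practice, the final step would hand those variants to a real experimentation system rather than returning them.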
Once this is running, we have a powerful feedback loop that provides a lot of value for customers and gets us closer to our North Star vision: the self-improving product.
What are the biggest challenges in building Agents?
We spend a lot of time on context engineering. We have to make sure we’re using the latest and greatest models to interpret an input question, execute the workflow steps, and integrate Amplitude tools in the best way possible. Each step in this workflow has to go through the same process.
Take web variant generation as an example. We've done really deep experimentation across a lot of dimensions to try to understand what the best possible generator versions are, how to compress the context, and how to pull out all of these different elements. Do you have an error correction loop in there to inspect the page after you've applied your changes? Each of these little subproblems comes up.
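An error correction loop of the kind Eric describes might look roughly like this: apply the variant, inspect the rendered result, and revise until the page checks out. The helpers here (`render_page`, `inspect_render`, `revise_variant`) are hypothetical stand-ins.

```python
# Sketch of an apply-inspect-retry loop for generated web variants.

def render_page(html: str) -> str:
    return html  # stand-in: a real loop would render in a headless browser

def inspect_render(rendered: str) -> list[str]:
    # Stand-in for a model or vision check of the page after changes are applied.
    return [] if rendered.endswith("</html>") else ["page failed to render completely"]

def revise_variant(html: str, issues: list[str]) -> str:
    return html + "</html>"  # stand-in for an LLM revision pass

def generate_with_correction(variant_html: str, max_attempts: int = 3) -> str:
    for _ in range(max_attempts):
        issues = inspect_render(render_page(variant_html))
        if not issues:
            return variant_html  # the changes applied cleanly
        variant_html = revise_variant(variant_html, issues)
    raise RuntimeError("variant still broken after correction attempts")

print(generate_with_correction("<html><body>Variant A</body>"))
```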
You also need to evaluate the work, which is a whole other challenge. You have to set up good examples of what you want the agent to be doing, and then assert that the behavior holds. In the case of something like web variant generation, we look at whether the design elements look correct, whether the color styles apply correctly, or whether there are occluding elements. You can set up a specific set of evals for each of these little applications.
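A sketch of what one of those variant-level evals could look like, assuming a variant object that records whether styles applied and which elements occlude others (both fields and both checks are made up for illustration):

```python
# Sketch of variant-level evals: set up examples of desired behavior, then assert it.

def styles_applied(variant: dict) -> bool:
    return variant.get("css_applied", False)  # did the color styles take effect?

def occluding_elements(variant: dict) -> list[str]:
    return variant.get("occlusions", [])  # elements covering the CTA, nav, etc.

def eval_variant(variant: dict) -> None:
    assert styles_applied(variant), "expected styles were not applied"
    assert not occluding_elements(variant), "variant introduced occluding elements"

eval_variant({"css_applied": True, "occlusions": []})  # a passing example
```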
Is it difficult to get people to trust AI Agents? Does that impact the way they use them?
People's trust levels in AI are changing quickly. For something like a dashboard agent, people already trust AI to look at dashboards and give an executive summary. They understand that the process can be automated. They trust AI to read data and find insights.
Coding is a little different. With tools like Cursor or Copilot, the AI just goes off and generates its own thing. You still come back and review it. But the early experiments with this have been successful. In a few months or a year, we might not even feel like human review is necessary.
In terms of usage, one of the things that’s really interesting is that people have found AI super useful as a thought partner, but the bar is really high for taking the next step and letting the AI take action on a product. Teams aren’t just letting Agents deploy live web experiments on a high-traffic customer site. Working through that reaction is interesting; it’s actually shifted our strategy a little more toward the insights layer.
What are some of the most promising new context sources you are working on?
Session replays are hugely untapped. We tend to think of them as videos, but they’re actually structured data that captures all user interactions and the underlying web page state. You have mouse velocity, scroll velocity, friction points, etc. There’s such a rich baseline of signal there that almost anything is possible. But it’s fairly complicated to process that data because it’s very high-dimensional.
One of the things that LLMs are really good at is generating heuristic code. So, instead of processing raw videos and images, which would be really expensive and slow, you can build a bunch of parsers for that data and process a couple of sessions at once. The parsers might be domain-specific or specific to particular types of signals. You can run those on a frequent basis and let the LLM choose which analysis line to follow, how to define a taxonomy for a customer within that particular problem, and how to compare segments. All this can be done in concert with Amplitude’s more typical precision tracking.
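For a flavor of what such heuristic parsers might look like, here are two toy examples over a hypothetical replay event schema (each event carrying a millisecond timestamp `t` plus coordinates or a click target); the real replay format and Amplitude’s parsers will differ.

```python
# Two toy heuristic parsers over session-replay events (hypothetical schema).

def mouse_velocity(events: list[dict]) -> list[float]:
    """Pixels per second between consecutive mouse-move events."""
    moves = [e for e in events if e["type"] == "mouse_move"]
    velocities = []
    for prev, cur in zip(moves, moves[1:]):
        dt = (cur["t"] - prev["t"]) / 1000 or 1e-3  # seconds; guard against dt == 0
        dist = ((cur["x"] - prev["x"]) ** 2 + (cur["y"] - prev["y"]) ** 2) ** 0.5
        velocities.append(dist / dt)
    return velocities

def looks_like_rage_click(events: list[dict], window_ms: int = 700) -> bool:
    """Flag three or more clicks on one target in a short window: a friction signal."""
    clicks = [e for e in events if e["type"] == "click"]
    for i in range(len(clicks) - 2):
        window = clicks[i:i + 3]
        if (len({c["target"] for c in window}) == 1
                and window[-1]["t"] - window[0]["t"] <= window_ms):
            return True
    return False
```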
An interesting thing happened when we paired that with another agent pretending to be a product manager asking questions. We ran simulations where these two talked to each other as if they were humans. We could watch them go deeper and deeper into the analysis and product, iterate through different product questions, and arrive at a very comprehensive understanding of the underlying product shortcomings.
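A stripped-down version of that simulation loop might look like this, with both “agents” reduced to stand-in functions in place of real LLM calls:

```python
# Sketch of the two-agent simulation: a "PM" asks, an "analyst" answers, repeat.

def pm_turn(history: list[str]) -> str:
    # Stand-in for an LLM prompted to play a product manager digging into findings.
    depth = len(history) // 2 + 1
    return f"PM question {depth}: what's behind the drop-off we just found?"

def analyst_turn(question: str) -> str:
    # Stand-in for the analysis agent querying replays, charts, and cohorts.
    return f"Analyst: here's what the data shows for: {question}"

def simulate(turns: int = 5) -> list[str]:
    history: list[str] = []
    for _ in range(turns):
        question = pm_turn(history)
        history += [question, analyst_turn(question)]
    return history  # each round digs one level deeper into the product

for line in simulate(3):
    print(line)
```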
What’s the future of Amplitude AI Agents?
We’re starting AI Agents with a ChatGPT-style interaction: you’re working with an Agent by having a conversation with it. I think there’s a future where Amplitude AI is a platform.
At Amplitude, we’re working on a bunch of different efforts that incorporate AI: Chart Chat, Automated Insights, the semantic taxonomy, all the different Agents, Session Replay, etc. We’re also adding MCP and A2A protocol support so customers can connect directly to these agents and tools. Internally, we use a tool called Moda to accelerate our projects. How can we make that useful both to accelerate the rate at which we build internally and to allow teams to scale independently? How do we make it much simpler to stay on the rails, stay compliant, all of those things?
Agents can be the central connection point for all those AI efforts. They can be the thing that gets you past bottlenecks. They can be the tool that takes you from idea to implementation as fast as possible.