Amplitude Dashboard Outage: Post Mortem

What happened, why the incident happened, and what we are doing to improve our processes for the future.
Jan 22, 2016

Spenser Skates

CEO and Co-founder, Amplitude

On Monday, January 4, 2016, from 8:22 PM PST to 11:37 PM PST, we experienced an outage that prevented our customers from accessing their data on Amplitude. Following the outage, data on Amplitude remained stale until 3:23 PM PST on Monday, January 11, and several important features on Amplitude were inaccessible. We know many of our customers rely on Amplitude being available and up-to-date for their businesses, and we let you down. We’d like to take this opportunity to explain what happened, how we responded, and steps we are taking to prevent future outages like this from happening again.

What happened?

On Monday, January 4, at 8:22 PM PST, an engineer erroneously ran a script in the production environment that was meant to run in a development environment. The script deleted four DynamoDB tables that contained metadata used for processing events and querying data. Specifically, these tables contained the following information:

  • Internal configuration of services
  • File metadata used by the query engine
  • Metadata pertaining to all device IDs we have seen
  • Metadata pertaining to all Amplitude IDs we have assigned

When these tables were deleted, the web reporting dashboards on Amplitude became inaccessible. In addition, our processing pipeline halted, as it could not proceed without the ID information. Event data from clients was still being collected and stored in a queue that could be processed later.

Our immediate priority was to make the dashboards accessible again. Our query engine uses the internal configuration to determine which partitions to query and the file metadata to determine where the data physically lives. We were able to recover the internal configuration and file metadata from backups within a few hours.

At 11:37 PM PST, customers were able to access most of the dashboards. Since processing was still paused, the dashboards reflected data collected prior to 8:22 PM PST. Real-time activity, user timelines, Microscope, cohort recomputation and downloads all relied on information in the tables we hadn’t recovered yet, so these features remained unavailable.

The next step was to recover the two ID tables. Unfortunately, we did not have backups for these tables. We did, however, have all the historical events, which we could use to recreate the data in those tables. At 1 AM PST on Tuesday, January 5, we began developing and testing a sequence of MapReduce jobs to reconstruct and then repopulate the data. At 1 PM PST, we started the job to reconstruct the data; it took about 14 hours to complete.
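As a rough illustration of the reconstruction idea, the sketch below rebuilds a device-ID table from historical events with a map step and a reduce step in plain Python. The event shape, the field names, and the rule of keeping the Amplitude ID from the earliest event are assumptions made for this example; the actual jobs and table schemas are not described in this post.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event shape: (device_id, amplitude_id, timestamp_ms).
Event = Tuple[str, int, int]


def map_event(event: Event) -> Iterator[Tuple[str, Tuple[int, int]]]:
    """Map step: emit one (device_id, (amplitude_id, timestamp)) pair per event."""
    device_id, amplitude_id, timestamp_ms = event
    yield device_id, (amplitude_id, timestamp_ms)


def reduce_device(values: List[Tuple[int, int]]) -> Dict[str, int]:
    """Reduce step: collapse every sighting of one device into a metadata row."""
    earliest = min(values, key=lambda v: v[1])
    latest = max(values, key=lambda v: v[1])
    return {
        # Assumption: the ID assigned at the earliest event is authoritative.
        "amplitude_id": earliest[0],
        "first_seen": earliest[1],
        "last_seen": latest[1],
    }


def rebuild_device_table(events: Iterable[Event]) -> Dict[str, Dict[str, int]]:
    """Group mapped pairs by device ID, then reduce each group to one row."""
    grouped: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for event in events:
        for key, value in map_event(event):
            grouped[key].append(value)
    return {device_id: reduce_device(vals) for device_id, vals in grouped.items()}


history = [("dev-a", 1, 100), ("dev-a", 1, 50), ("dev-b", 2, 70)]
table = rebuild_device_table(history)
```

A real job would distribute the grouping across many machines and write the rows back to the database in batches, which is where most of the elapsed time in a rebuild like this is likely to go.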

On Wednesday, January 6 at 4:30 AM PST, we began repopulating the ID tables. We kicked off the final MapReduce job at 3:30 PM PST and began validating the repopulated dataset in parallel. The jobs and validation completed on Thursday, January 7 at 1:30 PM PST. At this point, the dashboards were fully functional for data prior to January 4 8:22 PM PST. We then resumed data processing on the event backlog.

We originally anticipated that processing the backlog would take 1-2 days, but we had to push the estimate back by several days. In typical operation, our collection servers throttle devices that send data volumes many orders of magnitude beyond what is realistic, using information supplied by our processing pipeline. Because the pipeline was halted during the outage, this throttling was inactive, and we collected significantly more data than usual. As a result, the backlog took longer than expected to clear.
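The dependency between throttling and the processing pipeline can be sketched as follows. This is a minimal illustration, not our collection server code: the class name, the per-window limit, and the fixed-window counting scheme are all assumptions made for the example.

```python
from collections import Counter


class DeviceThrottle:
    """Hypothetical per-device throttle fed by counts from a processing pipeline."""

    def __init__(self, max_events_per_window: int) -> None:
        self.max_events_per_window = max_events_per_window
        self._counts: Counter = Counter()

    def record_processed(self, device_id: str, n: int = 1) -> None:
        """Called by the pipeline for events it has processed in this window."""
        self._counts[device_id] += n

    def should_accept(self, device_id: str) -> bool:
        """Called by a collection server before accepting another event."""
        return self._counts[device_id] < self.max_events_per_window

    def reset_window(self) -> None:
        """Start a new counting window."""
        self._counts.clear()


throttle = DeviceThrottle(max_events_per_window=3)
for _ in range(3):
    throttle.record_processed("noisy-device")
```

In this sketch, if the pipeline stops calling `record_processed`, every count stays below the limit and `should_accept` never returns `False`, which mirrors how a halted pipeline left throttling inactive.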

On Monday, January 11 at 9:30 AM PST, we completed processing the backlog and began validating the data on the dashboards. After extensive testing, at 3:23 PM PST, we confirmed that all data had been correctly processed and resumed normal operation. Throughout the incident, data collection was fully operational.

Why did it happen?

The incident and the length of the recovery resulted from a combination of factors.

We unfortunately did not have sufficient protection against a script running on the production environment that could delete operationally critical tables. The recovery was made difficult because we did not have usable backups for some of our tables in DynamoDB, which forced us to reconstruct a large amount of state from historical data. Even for tables with backups, their recovery was delayed because we did not have procedures in place to recover data from those backups in an efficient manner.
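One simple form of protection is a guard that destructive scripts must pass before they touch any table. The sketch below is illustrative only: the `APP_ENV` variable, the set of protected environment names, and the `delete_fn` callback are hypothetical and do not describe our actual tooling.

```python
import os
from typing import Callable, Iterable, Optional

# Hypothetical names for environments where destructive scripts must not run.
PROTECTED_ENVIRONMENTS = {"production", "prod"}


class ProductionGuardError(RuntimeError):
    """Raised when a destructive script is started in a protected environment."""


def require_non_production(env: Optional[str] = None) -> str:
    """Return the environment name, or raise if it is protected.

    An unset APP_ENV is treated as production so the guard fails closed.
    """
    if env is None:
        env = os.environ.get("APP_ENV", "production")
    if env.strip().lower() in PROTECTED_ENVIRONMENTS:
        raise ProductionGuardError(f"refusing to run destructive script in {env!r}")
    return env


def drop_tables(
    table_names: Iterable[str],
    delete_fn: Callable[[str], None],
    env: Optional[str] = None,
) -> None:
    """Delete tables through delete_fn, but only after the guard passes."""
    require_non_production(env)
    for name in table_names:
        delete_fn(name)


deleted = []
drop_tables(["scratch_table"], deleted.append, env="development")
```

A guard like this lives inside the script, so it only helps when the script cooperates. Permissions enforced by the platform are a stronger control.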

The engineering team worked to resolve the problem throughout Monday night, but did not notify the rest of the organization until the following day. We did not have a clear process for escalation in place, which caused our initial response and communication to customers to be significantly delayed.

Once the incident was escalated properly, we notified all customers via email with an explanation of the situation and our best estimate of when we would be fully recovered. However, we underestimated how long it would take to get back to a fully recovered state and thus presented estimates that were incorrect and had to be pushed back.

What are we doing to prevent it from happening again?

There’s a lot to learn and improve on from this incident.

We’ve already taken steps to restrict AWS accounts from having delete access to critical data, and we will be using finer-grained permissions on our AWS accounts. We will reevaluate the permissions we grant to each account and role and make sure those permissions are the minimum necessary.
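As an example of what a finer-grained permission could look like, the sketch below builds an IAM policy document that explicitly denies the `dynamodb:DeleteTable` action on a list of tables. The table names, region, and account ID are placeholders, and this is not the policy we actually deployed.

```python
import json
from typing import Dict, Iterable


def deny_table_deletion_policy(table_arns: Iterable[str]) -> Dict:
    """Build an IAM policy document that denies deleting the given tables."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyCriticalTableDeletion",
                "Effect": "Deny",
                "Action": ["dynamodb:DeleteTable"],
                "Resource": sorted(table_arns),
            }
        ],
    }


# Placeholder ARNs: the account ID, region, and table names are illustrative.
critical_tables = [
    "arn:aws:dynamodb:us-west-2:123456789012:table/device_ids",
    "arn:aws:dynamodb:us-west-2:123456789012:table/amplitude_ids",
]
policy = deny_table_deletion_policy(critical_tables)
policy_json = json.dumps(policy, indent=2)
```

In IAM, an explicit deny takes precedence over any allow, so attaching a policy like this to day-to-day roles blocks table deletion even if another policy grants broad DynamoDB access.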

We are setting up automated backups for the few remaining databases that currently do not have backups. In addition, we plan to develop and rehearse methods of quickly recovering from the backups.

Additionally, we’ll be performing a comprehensive review of our system over the next few months to identify weak points and ensure that we’re not vulnerable to an incident like this in the future. We plan to share the results from the review on this blog a few months from now.

Lastly, we are also putting in place policies and procedures for incident response, to reduce the time it takes for customers to be notified about outages and the time it takes for services to come back online.

Thank you for being patient with us throughout the outage. We sincerely apologize for the downtime and understand that our customers rely on our service being available for their businesses. We will do everything we can to improve our processes to ensure that you can rely on Amplitude in the future.

Thank you for your support.

About the author
Spenser Skates

CEO and Co-founder, Amplitude

Spenser is the CEO and Co-founder of Amplitude. He experienced the need for a better product analytics solution firsthand while developing Sonalight, a text-to-voice app. Out of that need, Spenser created Amplitude so that everyone can learn from user behavior to build better products.
