Event Mutability: Sync Data Warehouse Changes to Amplitude

Learn to sync data warehouse changes to Amplitude.

Get started

Amplitude's Data Mutability features keep data consistent between your warehouse and Amplitude by supporting INSERT, UPDATE, and DELETE operations on your event data. This capability is available through the Mirror Sync strategy across multiple warehouse integrations, so your Amplitude data stays synchronized with your source of truth.

Data Mutability lets you:

  • Insert new events into Amplitude.
  • Update existing events with new information.
  • Delete events that should no longer exist in your analytics.

This functionality is especially valuable for organizations that need to:

  • Correct historical data errors.
  • Adhere to data privacy regulations (GDPR, CCPA).
  • Maintain data consistency across systems.
  • Handle late-arriving or corrected data.

Supported data sources

Data Mutability is available through the following warehouse integrations:

  • Snowflake
  • Databricks
  • Amazon S3

How Mirror Sync works

When you enable Mirror Sync with data mutability:

  1. Change detection: The integration monitors your warehouse for data changes using native change tracking features (CDC for Snowflake, CDF for Databricks, or file metadata for S3).

  2. Operation processing: Amplitude processes three types of operations:

    • INSERT: Adds new events to Amplitude.
    • UPDATE: Modifies existing events in Amplitude.
    • DELETE: Removes events from Amplitude.

    Amplitude finds matching events based on the combination of user_id, insert_id, and event_time. All three fields must match before Amplitude can identify and modify the correct event.

  3. Data synchronization: Amplitude applies the changes to keep your warehouse and Amplitude consistent.
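The matching rule above can be sketched in Python. This is illustrative only: the in-memory store and dict-shaped events are assumptions for the sketch, not Amplitude internals.

```python
from typing import NamedTuple

class EventKey(NamedTuple):
    """Composite key used to locate an event for UPDATE/DELETE.
    All three fields must match, per the matching rule."""
    user_id: str
    insert_id: str
    event_time: int  # milliseconds since epoch

def key_of(event: dict) -> EventKey:
    """Build the match key; all three fields must be present and stable."""
    return EventKey(event["user_id"], event["insert_id"], event["event_time"])

def apply_changes(store: dict, changes: list) -> dict:
    """Apply (op, event) pairs to an in-memory mirror of the event store."""
    for op, event in changes:
        k = key_of(event)
        if op == "INSERT":
            store[k] = event
        elif op == "UPDATE" and k in store:
            store[k] = {**store[k], **event}  # merge new fields over old
        elif op == "DELETE":
            store.pop(k, None)
    return store
```

Note that an UPDATE or DELETE with a different `insert_id` or `event_time` silently matches nothing, which is why these fields must stay immutable in your warehouse.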

Enrichment services

Enrichment Services Disabled

When using Mirror Sync with data mutability, Amplitude disables enrichment services, including:

  • ID resolution and user merging.
  • Property and attribution syncing.
  • Location resolution.
  • Taxonomy validation.

Disabling enrichment ensures your data remains exactly as it exists in your source of truth.

General requirements

  • User ID required: All events must contain a user ID. Mirror Sync doesn't support anonymous events.
  • Unique Insert ID: Each event should have a unique and immutable insert_id to prevent duplication.
  • Chronological order: Process events in chronological order when possible.
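One way to satisfy the unique, immutable insert_id requirement is to derive the ID deterministically from fields that never change, so retries produce the same ID instead of duplicate events. A minimal sketch; the namespace UUID and the choice of key fields are illustrative assumptions:

```python
import uuid

# Any fixed UUID works as a namespace; this one is illustrative.
NAMESPACE = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")

def make_insert_id(user_id: str, event_type: str, event_time_ms: int) -> str:
    """Derive a deterministic insert_id from fields that never change.
    Re-running the pipeline yields the same ID, so retries don't duplicate events."""
    return str(uuid.uuid5(NAMESPACE, f"{user_id}|{event_type}|{event_time_ms}"))
```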

Event volume considerations

Event Volume Impact

Data mutations count toward your event volume:

  • Warehouse sources (Snowflake, Databricks): Multiple operations on the same event within a sync window count as one event.
  • File sources (S3): Each operation counts separately toward your event volume.

Monitor your usage and contact sales if you need additional event volume.

Data retention

  • Snowflake: DATA_RETENTION_TIME_IN_DAYS must be ≥ 1 (recommended: ≥ 7 days).
  • Databricks: Change Data Feed retention must cover your sync frequency.
  • S3: Files must remain accessible throughout processing.
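A simple coverage calculation can sanity-check the retention requirements above. This is a minimal sketch, assuming you know your retention period and sync interval; the 2x safety factor is an arbitrary illustrative choice, not an Amplitude requirement:

```python
def retention_covers_sync(retention_days: int,
                          sync_interval_hours: int,
                          safety_factor: float = 2.0) -> bool:
    """Return True if change-history retention comfortably exceeds the sync
    interval, leaving headroom for delayed or retried syncs."""
    return retention_days * 24 >= sync_interval_hours * safety_factor
```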

Best practices

Keep the following best practices in mind as you enable data mutability.

Plan your implementation

  1. Start with a test project: Create a dedicated test environment to validate your mutation logic before implementing in production.

  2. Design for idempotency: Ensure your mutation operations can be safely retried without causing data inconsistencies.

  3. Monitor data quality: Implement validation checks to ensure mutations apply correctly.
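Idempotency (point 2 above) is often implemented by tracking which mutations have already been applied and skipping repeats. A minimal Python sketch, assuming each mutation carries a stable identifier:

```python
def apply_once(seen: set, mutation_id: str, apply) -> bool:
    """Apply a mutation only if it hasn't been seen before.
    Returns True if applied, False if skipped, so retries are safe."""
    if mutation_id in seen:
        return False
    apply()
    seen.add(mutation_id)
    return True
```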

Data privacy compliance

When using data mutability for privacy compliance:

  1. Stop data flow first: Before you delete a user's data, stop sending new data for that user to Amplitude.

  2. Use the User Privacy API: For complete user deletion, use the User Privacy API in addition to warehouse deletions.

  3. Verify deletion: Confirm that deleted data no longer appears in your analytics.
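For step 2, a deletion request to the User Privacy API might be assembled as below. The endpoint URL and payload field names shown are assumptions based on common usage; confirm them against the current User Privacy API reference before relying on this sketch:

```python
import json

# Assumed endpoint; verify against the User Privacy API documentation.
DELETIONS_ENDPOINT = "https://amplitude.com/api/2/deletions/users"

def build_deletion_request(user_ids: list, requester: str) -> str:
    """Serialize a deletion request body for the listed user IDs.
    POST this body to DELETIONS_ENDPOINT with your API credentials."""
    return json.dumps({"user_ids": user_ids, "requester": requester})
```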

Performance optimization

  • Batch operations: Group related mutations together when possible.
  • Optimize sync frequency: Balance data freshness needs with processing overhead.
  • Monitor resource usage: Track warehouse compute costs associated with change tracking.

Migrate to data mutability

If you're migrating from a standard ingestion strategy to Mirror Sync, follow these steps.

  1. Create a cutoff strategy:

    • Modify the existing connection with a time filter (for example, WHERE time < {cutOffDate}).
    • Set the cutoff date to tomorrow, expressed in milliseconds since the Unix epoch.
  2. Wait for cutoff: Allow the cutoff date to pass and verify no new data flows through the old connection.

  3. Create new Mirror Sync source:

    • Configure the new source with a complementary filter (for example, WHERE time >= {cutOffDate}).
    • Enable Mirror Sync with the mutation settings you want.
  4. Clean up: Remove the old source connection after verifying the new one works correctly.
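The {cutOffDate} value used in steps 1 and 3 can be computed as midnight UTC tomorrow, in milliseconds since the Unix epoch. A minimal sketch:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def tomorrow_cutoff_ms(now: Optional[datetime] = None) -> int:
    """Midnight UTC tomorrow, in milliseconds since the Unix epoch,
    for use as the cutoff in both connection filters."""
    now = now or datetime.now(timezone.utc)
    tomorrow = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return int(tomorrow.timestamp() * 1000)
```

Using the same computed value in both filters guarantees that every event lands in exactly one connection, with no gap or overlap.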

Common issues

Events don't update

  • Verify that change tracking is enabled on source tables.
  • Check that events contain required user IDs.
  • Confirm sync frequency settings.

Missing deletions

  • Ensure DELETE operations are properly configured in your source.
  • Verify that deleted events had valid user IDs.
  • Check that change retention periods haven't expired.

Data inconsistencies

  • Review mutation operation ordering.
  • Verify that Amplitude disabled enrichment services as expected.
  • Check for timing issues between warehouse changes and sync execution.
