On this page

Google BigQuery

Set up recurring syncs of your Amplitude event data to BigQuery through the Amplitude UI, or manually start a sync of your historical data. This article describes connecting your Amplitude and BigQuery accounts, then syncing your data.

Considerations

  • To help save query costs, Amplitude partitions the events table by the date of event_time. Refer to the BigQuery docs for more information about table partitioning.
  • Amplitude runs a daily integrity check for your exports. Each day, Amplitude checks the number of events exported over the last few days. If the count doesn't match the expected count, Amplitude automatically runs a backfill for that date.
  • Duplicate event exports can occur, especially during a backfill of already-exported data. To help with this, Amplitude creates a table function in the provided dataset named deduplicated_<appid>. This table function provides a deduplicated view of the data.

Deduplicate table function example

sql
SELECT * FROM `testProject.testDataset.deduplicated_EVENTS_12345`('2022-01-01', '2022-02-10');
  • Depending on your company's network policy, you may need to add the following IP addresses to your allowlist to allow Amplitude's servers to access your BigQuery instance:

    • Amplitude US IP addresses:
      • 52.33.3.219
      • 35.162.216.242
      • 52.27.10.221
    • Amplitude EU IP addresses:
      • 3.124.22.25
      • 18.157.59.125
      • 18.192.47.195

Prerequisites

To get started with exporting to BigQuery, you need the following:

After you create a service account, generate and download the service account key file and upload it to Amplitude. Export Amplitude's account key in JSON format.

Set up the integration

Creating a recurring data export is a three-step process. Each sync completes within ten minutes, and all jobs appear in Amplitude.

To set up a recurring export of your Amplitude data to BigQuery, follow these steps:

  1. In Amplitude Data, click Catalog and select the Destinations tab.
  2. In the Warehouse Destination section, click Big Query.
  3. On the Getting Started tab, select the data you want to export and the export's cadence. You can Export events ingested today and moving forward, Export all merged Amplitude IDs, or both. For events, you can also specify filtering conditions to export only events that meet certain criteria.
  4. Review the schemas for the Event table and the Merge IDs table and click Next.
  5. Specify the BigQuery dataset to receive your Amplitude data, and upload the service account key file. This file must be in JSON format.
  6. Click Next. Amplitude attempts a test upload to test the credentials. If the upload succeeds, click Finish to complete the BigQuery destination configuration and activation.

Amplitude sends future events and merged users to BigQuery automatically. Amplitude exports files to your BigQuery account on a best-effort basis. Exports typically run hourly and contain one hour of data, but may run less frequently and contain multiple hours of data.

Event table schema

The Event table schema includes the following columns:

Merged User table schema

The Merged User table schema contains the following:

BigQuery deduplication table

To create a deduplication table in BigQuery:

sql
CREATE OR REPLACE TABLE FUNCTION `amplitude_bq_ingestion`.deduplicated_EVENTS(start_date DATE, end_date DATE)
AS
SELECT *
FROM (SELECT *,
             ROW_NUMBER() OVER (
                 PARTITION BY uuid
                 ) rn
      FROM `amplitude_bq_ingestion`.`EVENTS`
      WHERE DATE(event_time) >= start_date
        and DATE(event_time) <= end_date) t
WHERE rn = 1;

Was this helpful?