On this page

BigQuery Data Import

With Amplitude's BigQuery integration, you can ingest BigQuery data directly into your Amplitude project.

BigQuery Import for Google Analytics 4

Amplitude supports a version of BigQuery Import specifically for GA4. For more information, refer to Google Analytics 4 Import.

Prerequisites

To start importing from BigQuery, complete the following prerequisites.

  • You need a table (or tables) in BigQuery to import data from.

  • Create a GCS bucket. Amplitude recommends one dedicated to this purpose. The ingestion process must offload data to a GCS bucket before ingesting it into Amplitude, because of BigQuery's limited export options.

  • Create a Service Account with permissions granted for the bucket and tables you want to ingest, then get the service account key. Grant the Service Account the following roles:

    • BigQuery:
      • BigQuery Job User at the project level.
      • BigQuery Data Viewer at the resource level necessary to access your data. If your data is in a single table, grant the service account BigQuery Data Viewer for that table resource. If the query requires multiple tables or datasets, grant BigQuery Data Viewer on all tables or datasets in the query.
    • Cloud Storage:
      • Storage Admin on the GCS bucket you're using for ingestion.
  • Depending on your company's network policy, you may need to add the following IP addresses to your allowlist to let Amplitude's servers access your BigQuery instance:

    • Amplitude US IP addresses:
      • 52.33.3.219
      • 35.162.216.242
      • 52.27.10.221
    • Amplitude EU IP addresses:
      • 3.124.22.25
      • 18.157.59.125
      • 18.192.47.195

User and Group properties sync

Amplitude's Data Warehouse Import sometimes processes events in parallel, so time-ordered syncing of user and group properties on events isn't guaranteed in the same way as submitting events directly to the Identify and Group Identify APIs.

Add BigQuery as a source

To add BigQuery as a data source in your Amplitude project, follow these steps.

  1. In Amplitude Data, click Catalog and select the Sources tab.
  2. In the Warehouse Sources section, click BigQuery.
  3. Add the service account key and specify a GCS bucket name.
  4. Click Next to test the connection.
  5. After you confirm your credentials, click Next to select data. Choose from these configuration options:
    • Type of data: tells Amplitude whether you're ingesting event data, user property, or group property data.
    • Type of import:
      • Full Sync: Amplitude periodically imports the entire dataset, regardless of whether the data is already imported. This works well for data sets where the row data changes over time but there's no easy way to tell which rows have changed. Otherwise, the more efficient option is a time-based import. This option doesn't support ingesting event data.
      • Time-based: Amplitude periodically ingests the most recent rows in the data, as determined by the provided Timestamp column. The first import ingests all available data, and later imports ingest any data with timestamps after the most recent import. To use this option, include the timestamp column in the output of your SQL statement.
    • Frequency: choose from several scheduling options ranging from five minutes to one month. Daily syncs can run at a specific hour in the day. Weekly and Monthly syncs can run at a specific day and hour.
    • SQL query: the code for the query Amplitude uses to ingest the right data.
  6. After you set your configuration options, click Test SQL to see how the data comes through from your BigQuery instance. Any errors appear under the Test SQL button.
  7. If there are no errors, click Finish. You get a notification confirming the new BigQuery source. Amplitude then redirects you to the Sources listing page, where you can see the new BigQuery source.

If you have issues or questions while following this flow, contact the Amplitude team.

Time-based import

For Amplitude's time-based import option, use a monotonically increasing timestamp value as a best practice. This value should indicate when the record loaded into the source table the SQL configuration is querying from (often referred to as a "server upload time"). The warehouse import tool brings data into Amplitude by continually updating the maximum value of the column referenced in the Timestamp Column Name input within the Import Config UI with each subsequent import.

On first import, Amplitude imports all the data returned from the query configured in the Import Config. Amplitude saves a reference of the maximum timestamp referenced in the Timestamp Column Name: timestamp_1. On subsequent import, Amplitude imports all data from the previously saved timestamp (timestamp_1) to the new maximum timestamp (timestamp_2). After that import, Amplitude saves timestamp_2 as the new maximum timestamp.

BigQuery export statement limitations

BigQuery export statements can't reference meta tables in queries. This includes INFORMATION_SCHEMA views, system tables, or wildcard tables. If your query references any of these meta tables, Amplitude reports an error like: EXPORT DATA statement cannot reference meta tables in the queries.

To avoid this limitation:

  • Ensure your SQL query only references standard tables and views.
  • Don't include references to INFORMATION_SCHEMA views.
  • Avoid using system tables in your query.
  • Don't use wildcard tables in your import queries.

For more information about BigQuery export statement limitations, refer to Google's article Export statements in GoogleSQ.

Mandatory data fields

Include the mandatory fields for the data type when you create the SQL query. These tables outline the mandatory and optional fields for each data type. Find a list of other supported fields for events in the HTTP V2 API documentation and for user properties in the Identify API documentation. Add any columns not in those lists to either event_properties or user_properties, otherwise Amplitude ignores them.

Events

Find other supported fields in the HTTP V2 API documentation.

User properties

Find other supported fields in the Identify API documentation.

Group properties

Each group property in group_properties applies to every group in groups

View job details

The job details drawer is only available for BigQuery Import.

Job states

Each job appears in one of four states:

Pipeline stages

The drawer visualizes the job's progress through three stages:

  1. Upload: Amplitude extracts data from BigQuery and offloads it to Cloud Storage.
  2. Processing: Amplitude prepares and batches the data for ingestion.
  3. Completion: Amplitude ingests the batches into your project.

Each stage shows one of four statuses: Pending, Running, Completed, or Failed. Amplitude highlights the active stage. All three stages are always visible, regardless of job state.

Ingestion statistics

For completed or partially completed jobs, the drawer shows:

  • Uploaded: Total records extracted from BigQuery.
  • Ingested: Records Amplitude added to your project.
  • Not ingested: Records Amplitude didn't ingest.
  • Duplicates: Records Amplitude deduplicated.
  • Non-compliant: Records that failed validation.
  • % ingested: The ratio of ingested records to uploaded records.

Use the arrow controls in the drawer to move between jobs, or enter a page number directly to jump to a specific page. The drawer is available for Temporal jobs on any page of the jobs list. Pagination resets when you change a filter or toggle the Show empty jobs option.

Update your BigQuery service account key

To update the Service Account used for your BigQuery Source, select the existing BigQuery Source within the Sources section of Data and click the gear icon. In the modal, upload the new Service Account Key you want Amplitude to use going forward.

Service account data access

Before you update your Service Account Key, ensure the new Service Account Key has the proper data access so Amplitude can successfully import any relevant data.

BigQuery SQL helper

Properties fields

Many Amplitude features rely on "properties" fields, which consist of property keys and property values. The most common of these properties fields are event_properties and user_properties.

For Amplitude to correctly ingest these sets of keys and values, BigQuery must export them as raw JSON, not as JSON strings. BigQuery doesn't have great support for JSON, but the following describes how to make sure your data is exported from BigQuery and imported to Amplitude without errors.

The properties fields come from columns with a STRUCT type. The struct type represents a key-value structure and is exported from BigQuery in raw JSON format.

If your source table doesn't have the event or user properties organized in a struct type column, you can create it in your select SQL. For example, if your event properties are flattened into their own columns, you can compose your event_properties into a struct like this:

sql
SELECT STRUCT(
    event_property_column_1 AS event_property_name_1,
    event_property_column_2 AS event_property_name_2
) as event_properties
FROM your_table;

You can't have spaces in struct field names even if they're enclosed in back ticks or single quotes.

Reconstruct event or user properties from RECORD type and REPEATED mode fields

If you have event_properties or user_properties fields with RECORD type and REPEATED mode, you may need to transform them into a valid format before ingesting them into Amplitude.

Here are two approaches to achieve this transformation:

  • Reconstruct the properties into a JSON object using PARSE_JSON, ensuring all values are properly formatted:
plaintext
PARSE_JSON(CONCAT('{',
    (
      SELECT STRING_AGG(
          CONCAT('"', key, '":"',
            COALESCE(
              NULLIF(CODE_POINTS_TO_STRING(
                ARRAY((
                  SELECT * FROM UNNEST((
                    SELECT TO_CODE_POINTS(CAST(value.string_value AS STRING))
                  )) AS code_points
                  WHERE code_points > 31 AND code_points != 34 AND code_points != 92
                ))
              ), ''),
              NULLIF(CAST(value.int_value AS STRING), ''),
              NULLIF(CAST(value.float_value AS STRING), ''), 
              NULLIF(CAST(value.double_value AS STRING), '')
            ),
            '"'
          )
      ) FROM UNNEST(event_properties)
    ),
    '}'
  )) AS event_properties
  • Extract individual key-value pairs directly:
sql
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'key1') AS key1,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'key2') AS key2

After you transform the event properties, format them as a STRUCT for Amplitude to ingest:

sql
STRUCT
(
  action AS action,
  field AS field
) AS event_properties

Properties from a JSON string field

If you have your event or user properties formatted as JSON in a string field, you still must reconstruct the properties field in the select SQL as a STRUCT. BigQuery exports String fields as String even if the contents are JSON. Amplitude's event validation rejects these.

You can extract values from your JSON String field to use in your properties STRUCT. Use the JSON_EXTRACT_SCALAR function to access the values in your string as follows. If your EVENT_PROPERTIES column in the table contains a JSON String like:

"{\"record count\":\"50\",\"region\":\"eu-central-1\"}" which is shown in the BigQuery UI like {"record count":"50","region":"eu-central-1"}), then you can extract the values from the JSON String like this:

sql
SELECT STRUCT(
    JSON_EXTRACT_SCALAR(EVENT_PROPERTIES, "$.record count") AS record_count,
   JSON_EXTRACT_SCALAR(EVENT_PROPERTIES, "$.region") AS region
) as event_properties
FROM your_table;

String literals

Unlike other data warehouse products, BigQuery treats "double-quoted strings" as string literals. You can't use them to quote identifiers like column names or table names, or the SQL fails to execute in BigQuery.

Was this helpful?