Google Cloud Storage
Export Amplitude event data and merged user data to a bucket in your Google Cloud Storage (GCS) account. Once the data is in the bucket, Google Cloud's bucket policies let you manage it and access it programmatically. From the Amplitude UI, you can set up recurring syncs as often as once per hour.
Prerequisites
Create a GCS service account and set permissions before you configure the integration.
If you haven't already, create a service account for Amplitude within the Google Cloud console. This service account lets Amplitude export your data to your Google Cloud project.
After you create the service account, generate and download its key file in JSON format. You upload this key file to Amplitude when you set up the integration.
Add this service account as a member to the bucket you want to export data to. Give this member the storage admin role so Amplitude has the permissions to export data to your bucket.
You can also create your own role. The export process requires these permissions at minimum:
- storage.buckets.get
- storage.objects.get
- storage.objects.create
- storage.objects.delete
- storage.objects.list
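If you build a custom role, it can help to check it against this minimum list before you upload the key. The following is a minimal sketch, not part of any Amplitude or Google API: `missing_permissions` and `REQUIRED_PERMISSIONS` are hypothetical names, and the granted permissions are assumed to be available as a plain list of strings.

```python
# Minimum permissions the export process requires, as listed above.
REQUIRED_PERMISSIONS = frozenset({
    "storage.buckets.get",
    "storage.objects.get",
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.list",
})


def missing_permissions(granted):
    """Return the required permissions that are absent from `granted`, sorted."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))


if __name__ == "__main__":
    # Hypothetical custom role that omits the delete permission.
    custom_role = [
        "storage.buckets.get",
        "storage.objects.get",
        "storage.objects.create",
        "storage.objects.list",
    ]
    print(missing_permissions(custom_role))
```

An empty result means the role covers the minimum set. If you use the google-cloud-storage client library, the granted list could come from `Bucket.test_iam_permissions`, which returns the subset of the requested permissions that the caller holds.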
Set up the integration
You need admin privileges in Amplitude, and a role that lets you enable resources in GCS.
- In Amplitude Data, click Catalog and select the Destinations tab.
- In the Warehouse Destination section, click Google Cloud Storage.
- On the Getting Started tab, select the data you want to export. You can Export events ingested today and moving forward, Export all merged Amplitude IDs, or both. For events, you can also specify filtering conditions to export only events that meet certain criteria.
You can export these two data types to separate buckets. Complete the setup flow twice: once for each data type.
- Review the Event table and Merge IDs table schemas and click Next.
- In the Google Cloud Credentials For Amplitude section, upload the Service Account Key file. This file must be in JSON format.
- After you upload the service account key, fill out the Google Cloud bucket details in the Google Cloud Bucket Details section.
- Click Next. Amplitude attempts a test upload to check that the credentials work. If the upload succeeds, click Finish to complete the GCS destination configuration and activation.
Amplitude automatically sends all future events and merged users to GCS. Amplitude exports files to your GCS account on a best-effort basis. Exports typically run hourly and contain one hour of data, but may run less frequently and contain multiple hours of data.
Run a manual export
Backfill historical data to GCS by manually exporting data.
- Go to the Google Cloud Storage export connection page.
- Go to the Backfills tab.
- Select the date range you want.
- Click Start Backfill.
If the backfill range overlaps with previously exported data, Amplitude de-duplicates overlapping data.
Exported data format
When you configure a GCS export, you specify a bucket name and an optional folder/prefix. This prefix determines where your exported data appears inside the bucket. If you leave the prefix empty, Amplitude writes objects directly at the bucket root.
Raw event file and data format
Amplitude exports data as gzip-compressed JSON files (.json.gz), partitioned by the hour with one or more files per hour. Each file contains one event JSON object per line.
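Because each file holds one JSON object per line, you can read an export file as a stream of events. The following is a minimal sketch under a few assumptions: the file has already been downloaded locally, it's gzip-compressed (as the .json.gz extension suggests), and it's UTF-8 encoded. The name `read_events` is hypothetical.

```python
import gzip
import json


def read_events(path):
    """Yield one event dict per non-empty line of a gzip-compressed export file."""
    # Assumption: the export file is UTF-8 encoded text.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines defensively
                yield json.loads(line)
```

Reading lazily with a generator keeps memory use flat even when an hourly file is large.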
Object key structure
Amplitude organizes exported event files under an {appId} directory that matches your project ID. The full object key structure is:
{gcsPrefix}/{appId}/{filename}
If you don't configure a prefix, the path simplifies to:
{appId}/{filename}
Where:
- {gcsPrefix} is the optional folder/prefix you configure in the Amplitude UI.
- {appId} is your Amplitude project ID (the same ID that appears in the filename).
- {filename} follows the format below.
Filename format
File names have the following syntax, where the time represents when Amplitude servers received the data in UTC (the server_upload_time):
projectID_yyyy-MM-dd_H#partitionInteger.json.gz
For example, the first partition of data uploaded to project 187520 on Jan 25, 2020, between 5 PM and 6 PM UTC, is in the file:
187520_2020-01-25_17#1.json.gz
Example
If your bucket is amplitude-data, your prefix is events, and your project ID is 187520, the full GCS path for this file is:
gs://amplitude-data/events/187520/187520_2020-01-25_17#1.json.gz
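The key structure and filename format above can be expressed as a pair of helpers, for example to locate a specific hour or to sort downloaded files. This is a sketch only: the function names are hypothetical, and it assumes that `H` in the filename pattern is the hour of day (0-23) without zero padding.

```python
import re
from datetime import date

# Pattern for projectID_yyyy-MM-dd_H#partitionInteger.json.gz.
# Assumption: "H" is the hour of day (0-23), not zero-padded.
EVENT_FILENAME = re.compile(
    r"^(?P<project_id>\d+)_(?P<day>\d{4}-\d{2}-\d{2})_(?P<hour>\d{1,2})"
    r"#(?P<partition>\d+)\.json\.gz$"
)


def event_object_key(project_id, day, hour, partition, prefix=""):
    """Build the object key {gcsPrefix}/{appId}/{filename} for one event file."""
    filename = f"{project_id}_{day.isoformat()}_{hour}#{partition}.json.gz"
    parts = [prefix.strip("/"), str(project_id), filename]
    # Drop the prefix segment when no prefix is configured.
    return "/".join(part for part in parts if part)


def parse_event_filename(filename):
    """Split an event filename into its parts, or return None if it doesn't match."""
    match = EVENT_FILENAME.match(filename)
    if match is None:
        return None
    return {
        "project_id": int(match["project_id"]),
        "day": date.fromisoformat(match["day"]),
        "hour": int(match["hour"]),
        "partition": int(match["partition"]),
    }


if __name__ == "__main__":
    key = event_object_key(187520, date(2020, 1, 25), 17, 1, prefix="events")
    print(f"gs://amplitude-data/{key}")
```

Run as a script, this prints the same path as the example above.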
Here is the exported data JSON object schema:
{
"server_received_time": UTC ISO-8601 timestamp,
"app": int,
"device_carrier": string,
"$schema": int,
"city": string,
"user_id": string,
"uuid": UUID,
"event_time": UTC ISO-8601 timestamp,
"platform": string,
"os_version": string,
"amplitude_id": long,
"processed_time": UTC ISO-8601 timestamp,
"version_name": string,
"ip_address": string,
"paying": boolean,
"dma": string,
"group_properties": dict,
"user_properties": dict,
"client_upload_time": UTC ISO-8601 timestamp,
"$insert_id": string,
"event_type": string,
"library": string,
"amplitude_attribution_ids": string,
"device_type": string,
"device_manufacturer": string,
"start_version": string,
"location_lng": float,
"server_upload_time": UTC ISO-8601 timestamp,
"event_id": int,
"location_lat": float,
"os_name": string,
"amplitude_event_type": string,
"device_brand": string,
"groups": dict,
"event_properties": dict,
"data": dict,
"device_id": string,
"language": string,
"device_model": string,
"country": string,
"region": string,
"is_attribution_event": boolean,
"adid": string,
"session_id": long,
"device_family": string,
"sample_rate": null,
"idfa": string,
"client_event_time": UTC ISO-8601 timestamp
}
Merged Amplitude IDs file and data format
Amplitude exports data as gzip-compressed JSON files (.json.gz). Each file contains one merged Amplitude ID JSON object per line.
Object key structure
Amplitude organizes merged ID files under a -{orgId} directory that matches your organization ID. The full object key structure is:
{gcsPrefix}/-{orgId}/{filename}
If you don't configure a prefix, the path simplifies to:
-{orgId}/{filename}
Where:
- {gcsPrefix} is the optional folder/prefix you configure in the Amplitude UI.
- -{orgId} is your Amplitude organization ID with a leading hyphen.
- {filename} follows the format below.
Filename format
File names have the following syntax, where the time represents when Amplitude servers received the data in UTC (the server_upload_time):
-{orgId}_yyyy-MM-dd_H.json.gz
For example, for org ID 189524, Amplitude exports data received on Jan 25, 2020, between 5 PM and 6 PM UTC to the file:
-189524_2020-01-25_17.json.gz
Example
If your bucket is amplitude-data, your prefix is merged, and your org ID is 189524, the full GCS path for this file is:
gs://amplitude-data/merged/-189524/-189524_2020-01-25_17.json.gz
Legacy organizations
Some legacy organizations may see merged ID exports that use the app ID instead of the org ID. In this case, the directory and filename use {appId} without the leading -. Contact Amplitude Support to confirm which format applies to your organization.
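The standard and legacy key layouts differ only in which ID is used and whether it carries a leading hyphen. The following sketch shows both. The function name and the `legacy` flag are hypothetical, and it assumes that `H` in the filename pattern is the hour of day (0-23) without zero padding.

```python
from datetime import date


def merged_ids_object_key(account_id, day, hour, prefix="", legacy=False):
    """Build the object key for one merged Amplitude IDs file.

    account_id is the org ID, or the app ID for legacy organizations.
    """
    # Standard exports prefix the ID with a hyphen; legacy exports don't.
    name = str(account_id) if legacy else f"-{account_id}"
    # Assumption: "H" is the hour of day (0-23), not zero-padded.
    filename = f"{name}_{day.isoformat()}_{hour}.json.gz"
    parts = [prefix.strip("/"), name, filename]
    # Drop the prefix segment when no prefix is configured.
    return "/".join(part for part in parts if part)


if __name__ == "__main__":
    key = merged_ids_object_key(189524, date(2020, 1, 25), 17, prefix="merged")
    print(f"gs://amplitude-data/{key}")
```

Run as a script, this prints the standard-format path for org ID 189524 with the prefix merged.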
Merged ID JSON objects have the following schema:
{
"scope": int,
"merge_time": long,
"merge_server_time": long,
"amplitude_id": long,
"merged_amplitude_id": long
}