Amplitude users can now export Amplitude event data and merged user data to their Google Cloud Storage (GCS) account. Google Cloud's bucket policies allow you to manage and programmatically export this data into a Google Cloud bucket. Using the Amplitude UI, you can set up recurring syncs as often as once per hour.
If you haven't already, create a service account for Amplitude within the Google Cloud console. This allows Amplitude to export your data to your Google Cloud project.
After you create the service account, generate and download its key file, then upload the file to Amplitude. Make sure you export the service account key in JSON format.
Add this service account as a member of the bucket you'd like to export data to. Give this member the Storage Admin role so that Amplitude has the permissions it needs to export data to your bucket.
You can also create your own role, if you prefer.
Keep in mind that the export process requires, at a minimum, the following permissions:
storage.buckets.get
storage.objects.get
storage.objects.create
storage.objects.delete
storage.objects.list
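If you create a custom role, it can help to confirm that it covers every permission listed above before you start the export. The following is a minimal sketch, not an Amplitude tool: the helper name `missing_permissions` is hypothetical, and it assumes you already have the list of permissions granted to the service account (for example, from an IAM permission test against the bucket).

```python
# Minimum permissions the export process requires, as listed above.
REQUIRED_PERMISSIONS = frozenset({
    "storage.buckets.get",
    "storage.objects.get",
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.list",
})


def missing_permissions(granted):
    """Return the required permissions that are absent from `granted`, sorted.

    `granted` is any iterable of permission strings. How you obtain it is
    up to you; this helper (a hypothetical name) only compares the sets.
    """
    return sorted(REQUIRED_PERMISSIONS - set(granted))


if __name__ == "__main__":
    # Example: a role that only allows reading and listing objects.
    print(missing_permissions(["storage.objects.get", "storage.objects.list"]))
```

An empty result means the role covers the minimum set; anything else is the list of permissions still to add.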
To set up a recurring export of your Amplitude data to GCS, follow these steps:
You need admin privileges in Amplitude, as well as a role that allows you to enable resources in GCS.
You can export the two data types, event data and merged user data, to separate buckets. To do so, complete the setup flow twice, once for each data type.
All future events/merged users are automatically sent to GCS. Amplitude exports files to your GCS account every hour.
You can backfill historical data to GCS by manually exporting data.
If the backfill range overlaps with the range of previously exported data, Amplitude will de-duplicate overlapping data.
Data is exported hourly as gzip-compressed JSON files (.json.gz), partitioned by hour, with one or more files per hour. Each file contains one event JSON object per line.
File names have the following syntax, where the time represents when the data was uploaded to Amplitude servers in UTC (for example, server_upload_time):
projectID_yyyy-MM-dd_H#partitionInteger.json.gz
For example, the first partition of data uploaded to this project, on Jan 25, 2020, between 5 PM and 6 PM UTC, is in the file:
187520_2020-01-25_17#1.json.gz
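If you process exported files downstream, you may want to recover the project ID, upload hour, and partition from each file name. The sketch below follows the syntax described above. The names `EventExportFile` and `parse_event_file_name` are illustrative, and it assumes you pass the bare file name without any bucket path or prefix.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple


class EventExportFile(NamedTuple):
    project_id: int
    hour_start: datetime  # start of the upload hour, in UTC
    partition: int


# projectID_yyyy-MM-dd_H#partitionInteger.json.gz
_EVENT_FILE_PATTERN = re.compile(
    r"^(\d+)_(\d{4}-\d{2}-\d{2})_(\d{1,2})#(\d+)\.json\.gz$"
)


def parse_event_file_name(name: str) -> EventExportFile:
    """Split an event export file name into its components."""
    match = _EVENT_FILE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not an event export file name: {name!r}")
    project_id, day, hour, partition = match.groups()
    # strptime also rejects impossible dates and hours outside 0-23.
    hour_start = datetime.strptime(f"{day} {hour}", "%Y-%m-%d %H").replace(
        tzinfo=timezone.utc
    )
    return EventExportFile(int(project_id), hour_start, int(partition))


if __name__ == "__main__":
    print(parse_event_file_name("187520_2020-01-25_17#1.json.gz"))
```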
Here is the exported data JSON object schema:
{
  "server_received_time": UTC ISO-8601 timestamp,
  "app": int,
  "device_carrier": string,
  "$schema": int,
  "city": string,
  "user_id": string,
  "uuid": UUID,
  "event_time": UTC ISO-8601 timestamp,
  "platform": string,
  "os_version": string,
  "amplitude_id": long,
  "processed_time": UTC ISO-8601 timestamp,
  "user_creation_time": UTC ISO-8601 timestamp,
  "version_name": string,
  "ip_address": string,
  "paying": boolean,
  "dma": string,
  "group_properties": dict,
  "user_properties": dict,
  "client_upload_time": UTC ISO-8601 timestamp,
  "$insert_id": string,
  "event_type": string,
  "library": string,
  "amplitude_attribution_ids": string,
  "device_type": string,
  "device_manufacturer": string,
  "start_version": string,
  "location_lng": float,
  "server_upload_time": UTC ISO-8601 timestamp,
  "event_id": int,
  "location_lat": float,
  "os_name": string,
  "amplitude_event_type": string,
  "device_brand": string,
  "groups": dict,
  "event_properties": dict,
  "data": dict,
  "device_id": string,
  "language": string,
  "device_model": string,
  "country": string,
  "region": string,
  "is_attribution_event": bool,
  "adid": string,
  "session_id": long,
  "device_family": string,
  "sample_rate": null,
  "idfa": string,
  "client_event_time": UTC ISO-8601 timestamp
}
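Because each exported file is a compressed file with one JSON object per line, you can read it as a stream instead of loading it whole. Here is a minimal sketch using only the Python standard library. The function name `iter_export_records` is hypothetical, and it assumes the file has already been downloaded from your bucket to a local path and is UTF-8 encoded.

```python
import gzip
import json
from typing import Any, Dict, Iterator


def iter_export_records(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one decoded JSON object per non-empty line of a .json.gz file."""
    # "rt" opens the gzip stream in text mode, so lines arrive as str.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, such as a trailing newline
                yield json.loads(line)
```

For example, `for event in iter_export_records(local_path): print(event["event_type"])` prints the event type of each exported event, where `local_path` is wherever you saved the file.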
Data is exported hourly as gzip-compressed JSON files (.json.gz). Each file contains one merged Amplitude ID JSON object per line.
File names have the following syntax, where the time represents when the data was uploaded to Amplitude servers in UTC (for example, server_upload_time):
-OrgID_yyyy-MM-dd_H.json.gz
For example, data uploaded to this project, on Jan 25, 2020, between 5 PM and 6 PM UTC, is in the file:
-189524_2020-01-25_17.json.gz
Merged ID JSON objects have the following schema:
{
  "scope": int,
  "merge_time": long,
  "merge_server_time": long,
  "amplitude_id": long,
  "merged_amplitude_id": long
}
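When loading merged ID files, it can be useful to check that each line carries all five fields in the schema above. The following sketch shows one way to do that in Python. `MergedIdRecord` and `parse_merged_id_line` are illustrative names, the sample values are made up, and the sketch makes no assumption about the unit of the two time fields, which this article doesn't state.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MergedIdRecord:
    scope: int
    merge_time: int
    merge_server_time: int
    amplitude_id: int
    merged_amplitude_id: int


def parse_merged_id_line(line: str) -> MergedIdRecord:
    """Decode one line of a merged ID file into a typed record.

    Raises KeyError if a schema field is missing from the line.
    """
    obj = json.loads(line)
    return MergedIdRecord(**{f.name: int(obj[f.name]) for f in fields(MergedIdRecord)})


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = (
        '{"scope": 1, "merge_time": 100, "merge_server_time": 101, '
        '"amplitude_id": 42, "merged_amplitude_id": 43}'
    )
    print(parse_merged_id_line(sample))
```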
April 22nd, 2024