Amplitude's GCS Import feature lets you import event or user properties into your Amplitude projects from a GCS bucket. This article helps you configure this data source within Amplitude.
Before you start, make sure you’ve taken care of some prerequisites.
If you haven't already, create a service account for Amplitude within the Google Cloud console. This allows Amplitude to access the data in your Google Cloud project.
After you create the service account, generate and download the service account key file and upload it to Amplitude. Make sure you export the service account key in JSON format.
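The downloaded key is a standard Google Cloud service account key file. In JSON format it has the following shape (values redacted; the account name `amplitude-import` is a hypothetical example):

```json
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "<redacted>",
  "private_key": "-----BEGIN PRIVATE KEY-----\n<redacted>\n-----END PRIVATE KEY-----\n",
  "client_email": "amplitude-import@your-project-id.iam.gserviceaccount.com",
  "client_id": "<redacted>",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/amplitude-import%40your-project-id.iam.gserviceaccount.com"
}
```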
Add this service account as a member on the bucket you'd like to import data from. Give this member the Storage Admin role so that Amplitude has the permissions it needs to access the data in your bucket.
You can also create your own role, if you prefer; see the sketch after the permission list below.
Keep in mind that the import process requires, at a minimum, the following permissions:

- `storage.buckets.get`
- `storage.objects.get`
- `storage.objects.create`
- `storage.objects.delete`
- `storage.objects.list`
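If you go the custom-role route, a minimal role definition covering those permissions might look like the sketch below. The role title and description are hypothetical; you'd create the role with `gcloud iam roles create`, which takes a role definition file (commonly written in YAML; the JSON shown here maps to the same fields), or through the IAM API.

```json
{
  "title": "Amplitude GCS Import",
  "description": "Minimum permissions for Amplitude's GCS Import source",
  "stage": "GA",
  "includedPermissions": [
    "storage.buckets.get",
    "storage.objects.get",
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.list"
  ]
}
```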
To add a new GCS data source for Amplitude to draw data from, follow these steps:
The final step in setting up Amplitude's GCS ingestion source is creating the converter file, which tells Amplitude how to process the ingested files. Create it in two steps: first, configure the compression type, file name, and escape characters for your files. Then use JSON to describe the rules your converter follows.
You can create converters via Amplitude's new guided converter creation interface. This lets you map and transform fields visually, removing the need to manually write a JSON configuration file. Behind the scenes, the UI compiles down to the existing JSON configuration language used at Amplitude.
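If you'd rather write the configuration by hand, it pairs file-handling settings with field-mapping rules. The sketch below is illustrative only: treat the key names (`keyValidatorConfig`, `filterPattern`, `converterConfig`, `fileType`, `compressionType`, `convertToAmplitudeFunc`) as assumptions to confirm against the Converter Configuration reference, not an exact schema.

```json
{
  "keyValidatorConfig": {
    "filterPattern": ".*\\.json\\.gz$"
  },
  "converterConfig": {
    "fileType": "json",
    "compressionType": "gzip",
    "convertToAmplitudeFunc": {
      "user_id": "userId",
      "event_type": "eventName"
    }
  }
}
```

In this sketch, the filter pattern restricts ingestion to gzipped JSON files, and the mapping pulls Amplitude's `user_id` and `event_type` from the hypothetical source fields `userId` and `eventName`.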
First, take a look at the different data types you can import: Event, User Property, and Group Property data.
Amplitude recommends selecting Preview in step 1 of the Data Converter so you can see a sample source record before moving to the next step.
After you select a field, you can transform it: click Transform to open the transformation modal and choose the transformation you'd like to apply. A short description accompanies each transformation.
Depending on the transformation you select, you may be prompted to include more fields.
After you have all the fields needed for the transformation, you can save it. You can update the transformation later if your requirements change.
Although Amplitude needs certain fields to bring data in, it also supports additional fields, which you can include by clicking the "Add Mapping" button. Here, Amplitude supports four kinds of mappings: Event Properties, User Properties, Group Properties, and Additional Properties.
Find a list of supported fields for events in the HTTP V2 API documentation and for user properties in the Identify API documentation. Add any columns not in those lists to either `event_properties` or `user_properties`; otherwise, Amplitude ignores them.
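For illustration, here's what a converted event might look like in the HTTP V2 event schema, assuming hypothetical source columns `genre`, `duration_seconds`, and `plan` that aren't standard fields and so are nested under `event_properties` and `user_properties` (`time` is milliseconds since epoch):

```json
{
  "user_id": "user-123",
  "event_type": "Song Played",
  "time": 1713744000000,
  "event_properties": {
    "genre": "pop",
    "duration_seconds": 215
  },
  "user_properties": {
    "plan": "premium"
  }
}
```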
After you have added all the fields you wish to bring into Amplitude, you can view samples of this configuration in the Data Preview section. Data Preview updates automatically as you add or remove fields and properties, and shows up to 10 sample source records alongside the Amplitude events they convert to, so you can confirm you're bringing all the data points you need into Amplitude.
The converter language describes how to extract a value from a given JSON element. You specify this using a SOURCE_DESCRIPTION. See the Converter Configuration reference for more help.
If you add new fields or change the source data format, you need to update your converter configuration. Note that the updated converter applies only to files discovered after the converter updates are saved.
After the initial ingestion, your data organization must conform to this standard for subsequent imports:
`{bucket name}/{GCSPrefix}/{YYYY}/{MM}/{DD}/{HH}/{optional}/{additional}/{folder}/{structure}/{file name}`

where:

- `{bucket name}` is the name of your GCS bucket.
- `{GCSPrefix}` is the source prefix folder specified in your source setup configuration.
- `{YYYY}/{MM}/{DD}/{HH}` is the required date prefix format for uploading new files. Organize files according to the time they're uploaded to the bucket, not when they're generated in your system. Always use two digits (as opposed to one) to represent the month, day, and hour.
- `{optional}/{additional}/{folder}/{structure}` is where you can add further folder structure details. These details are strictly optional. If you do include them, an example file path might look like `{bucket name}/{GCSPrefix}/{YYYY}/{MM}/{DD}/{HH}/cluster-01/node-25/{file name}`.
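As a concrete example, with a hypothetical bucket named `my-bucket`, a source prefix of `amplitude-import`, and a file uploaded at 09:00 on April 22, 2024, the object would land at a path like `my-bucket/amplitude-import/2024/04/22/09/events-0001.json.gz`.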
These organizational requirements apply only to new data you want to import after the source is enabled. You don't have to reorganize any pre-existing files, because Amplitude's GCS Import captures the data they contain on the first ingestion scan. After the initial scan, new data uploaded to the bucket must conform to the requirements outlined here.