With Amplitude’s Amazon S3 Import, you can import events, group properties, or user properties into your Amplitude projects from an AWS S3 bucket. Use Amazon S3 Import to backfill large amounts of existing data, connect existing data pipelines to Amplitude, and ingest large volumes of data where you need high throughput and latency is less sensitive.
During setup, you configure conversion rules to control how events are instrumented. After Amazon S3 Import is set up and enabled, Amplitude's ingestion service continuously discovers data files in S3 buckets and then converts and ingests events.
Amazon S3 Import setup has four main phases:
Before you start, make sure you’ve taken care of some prerequisites.
Before you can ingest data, review your dataset and consider best practices. Make sure your dataset contains the data you want to ingest, and any required fields.
The files you want to send to Amplitude must follow some basic requirements:
For each Amplitude project, AWS S3 import can ingest:
If your network policy requires, add the following IP addresses to your allowlist to enable Amplitude to access your buckets:
insert_id
Amplitude uses a unique identifier, insert_id, to match against incoming events and prevent duplicates. If, within the same project, Amplitude receives an event with insert_id and device_id values that match the insert_id and device_id of a different event received within the last 7 days, Amplitude drops the most recent event.
Amplitude highly recommends that you set a custom insert_id for each event to prevent duplication. To set a custom insert_id, create a field that holds unique values, like random alphanumeric strings, in your dataset. Map the field as an extra property named insert_id in the guided converter configuration.
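As a rough illustration of preparing such a field before upload, the sketch below adds an insert_id to each record. It swaps the random strings mentioned above for a deterministic hash, on the assumption that re-exporting the same row should produce the same insert_id. The helper names and the choice of identifying fields are hypothetical, not part of Amplitude's tooling.

```python
import hashlib
import json


def make_insert_id(event: dict) -> str:
    """Derive a stable insert_id from the fields that identify an event."""
    # Hypothetical choice of identifying fields; adjust to your own dataset.
    key_fields = {
        name: event.get(name)
        for name in ("device_id", "user_id", "event_type", "time")
    }
    # Canonical JSON keeps the hash independent of key order.
    canonical = json.dumps(key_fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:32]


def add_insert_ids(events: list) -> list:
    """Return copies of the events with an insert_id field added."""
    return [{**event, "insert_id": make_insert_id(event)} for event in events]
```

One trade-off of this approach: two genuinely distinct events that share all four fields would receive the same insert_id, so include enough fields to tell real events apart.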
When your dataset is ready for ingestion, you can set up Amazon S3 Import in Amplitude.
Follow these steps to give Amplitude read access to your AWS S3 bucket.
Create a new IAM role, for example: AmplitudeReadRole.
Go to Trust Relationships for the role and add Amplitude’s account to the trust relationship policy to allow Amplitude to assume the role using the following example.
amplitude_account: 358203115967 for the Amplitude US data center, 202493300829 for the Amplitude EU data center.
external_id: a unique identifier used when Amplitude assumes the role. You can generate it with a third-party tool. An example external ID is vzup2dfp-5gj9-8gxh-5294-sd9wsncks7dc.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<amplitude_account>:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<external_id>"
        }
      }
    }
  ]
}
Create a new IAM policy, for example, AmplitudeS3ReadOnlyAccess. Use the entire example code that follows, but be sure to replace the placeholders in angle brackets (<>).
filePrefix. For folders, make sure the prefix ends with /, for example folder/. For the root folder, leave the prefix empty.
Example 1: IAM policy without prefix:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowListingOfDataFolder",
      "Action": [
        "s3:ListBucket"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::<bucket_name>"
      ],
      "Condition": {
        "StringLike": {
          "s3:prefix": [
            "*"
          ]
        }
      }
    },
    {
      "Sid": "AllowAllS3ReadActionsInDataFolder",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket_name>/*"
      ]
    },
    {
      "Sid": "AllowUpdateS3EventNotification",
      "Effect": "Allow",
      "Action": [
        "s3:PutBucketNotification",
        "s3:GetBucketNotification"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket_name>"
      ]
    }
  ]
}
Example 2: IAM policy with a prefix. For a folder, make sure the prefix ends with /, for example folder/:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowListingOfDataFolder",
      "Action": [
        "s3:ListBucket"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::<bucket_name>"
      ],
      "Condition": {
        "StringLike": {
          "s3:prefix": [
            "<prefix>*"
          ]
        }
      }
    },
    {
      "Sid": "AllowAllS3ReadActionsInDataFolder",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket_name>/<prefix>*"
      ]
    },
    {
      "Sid": "AllowUpdateS3EventNotification",
      "Effect": "Allow",
      "Action": [
        "s3:PutBucketNotification",
        "s3:GetBucketNotification"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket_name>"
      ]
    }
  ]
}
Go to Permissions for the role. Attach the policy created in step 3 to the role.
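If you prefer to generate the two policy documents rather than edit the placeholders by hand, a small script can fill them in. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and using a UUID as the external ID is an assumption (any unique string you generate should serve). The account IDs and policy structure are copied from the examples above.

```python
import json
import uuid

# Amplitude account IDs per data center, as listed in the steps above.
AMPLITUDE_ACCOUNTS = {"US": "358203115967", "EU": "202493300829"}


def build_trust_policy(region: str, external_id: str) -> dict:
    """Trust relationship policy that lets Amplitude assume the role."""
    account = AMPLITUDE_ACCOUNTS[region]
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{account}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


def build_read_policy(bucket: str, prefix: str = "") -> dict:
    """Read policy for the bucket; an empty prefix means the root folder."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowListingOfDataFolder",
                "Action": ["s3:ListBucket"],
                "Effect": "Allow",
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                "Sid": "AllowAllS3ReadActionsInDataFolder",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
            {
                "Sid": "AllowUpdateS3EventNotification",
                "Effect": "Allow",
                "Action": ["s3:PutBucketNotification", "s3:GetBucketNotification"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
        ],
    }


if __name__ == "__main__":
    # Assumption: a random UUID is an acceptable external ID.
    external_id = str(uuid.uuid4())
    print(json.dumps(build_trust_policy("US", external_id), indent=2))
    # "my-bucket" and "folder/" are placeholder values.
    print(json.dumps(build_read_policy("my-bucket", "folder/"), indent=2))
```

Paste the printed JSON into the IAM console, and keep the generated external ID at hand.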
In Amplitude, create the S3 Import source.
Amplitude recommends that you create a test project or development environment for each production project to test your instrumentation.
To create the data source in Amplitude, gather information about your S3 bucket:
When you have your bucket details, create the Amazon S3 Import source.
In Amplitude Data, click Catalog and select the Sources tab.
In the Warehouse Sources section, click Amazon S3.
Select Amazon S3, then click Next. If this source doesn’t appear in the list, contact your Amplitude Solutions Architect.
Complete the Configure S3 location section on the Set up S3 Bucket page:
com-amplitude-vacuum-<customername>.
This tells Amplitude where to look for your files.
Optional: enable S3 Event Notification. See Manage Event Notifications for more information.
Click Test Credentials after you’ve filled out all the values. You can’t edit these values from the UI after you create the source, so make sure that all the info is correct before clicking Next.
From the Enable Data Source page, enter a Data Source Name and a Description (optional) and save your source. You can edit these details from Settings.
A banner confirms you’ve created and enabled your source. Click Finish to go back to the list of data sources. Next, you must create your converter configuration.
Amplitude continuously scans buckets to discover new files as they're added. Data is available in charts within 30 seconds of ingestion.
Event Notification lets the Amplitude ingestion service discover data in your S3 bucket faster. Instead of scanning buckets, which is the default approach, the service discovers new data from notifications published by S3. This reduces the time it takes to find new data.
Use this feature if you want to achieve near real-time import with Amplitude Amazon S3 import. Usually, Amplitude discovers new data files within 30 seconds.
To enable the feature, you can either enable it when you create the source, or manage the data source and toggle S3 Event Notification.
Your converter configuration gives the S3 vacuum this information:
You can create converters via Amplitude's new guided converter creation interface. This lets you map and transform fields visually, removing the need to manually write a JSON configuration file. Behind the scenes, the UI compiles down to the existing JSON configuration language used at Amplitude.
First, note the different data types you can import: Event, User Property and Group Property data.
Amplitude recommends selecting preview in step 1 of the Data Converter, where you see a sample source record before moving to the next step.
After you have selected a particular field, you can choose to transform the field in your database. You can do this by clicking Transform and choosing the kind of transformation you would like to apply. You can find a short description for each transformation.
After you select a field, you can open the transformation modal and choose from a variety of Transformations.
Depending on the transformation you select, you may need to include more fields.
After you have all the fields needed for the transformation, you can save it. You can update these fields as needed when your requirements change.
You can include more fields by clicking the Add Mapping button. Here Amplitude supports 4 kinds of mappings: Event properties, User Properties, Group Properties and Additional Properties.
Find a list of supported fields for events in the HTTP V2 API documentation and for user properties in the Identify API documentation. Add any columns not in those lists to either event_properties or user_properties; otherwise, Amplitude ignores them.
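The rule above can be pictured with a short sketch. This is not the converter language itself, only an illustration of the mapping it performs: columns that match a supported field stay at the top level, and everything else moves under event_properties. The field set below is a small illustrative subset, and the function name is hypothetical. Consult the API documentation for the full lists.

```python
# Illustrative subset only; the full lists live in the HTTP V2 API and
# Identify API documentation.
TOP_LEVEL_FIELDS = {"user_id", "device_id", "event_type", "time", "insert_id"}


def to_amplitude_event(row: dict) -> dict:
    """Keep supported fields at the top level; nest the rest as event properties."""
    event = {key: value for key, value in row.items() if key in TOP_LEVEL_FIELDS}
    extras = {key: value for key, value in row.items() if key not in TOP_LEVEL_FIELDS}
    if extras:
        # Unlisted columns would be ignored unless mapped as properties.
        event["event_properties"] = extras
    return event
```

For example, a row with the columns user_id, event_type, time, and plan would keep the first three at the top level and carry plan inside event_properties.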
After you have added all the fields you wish to bring into Amplitude, you can view samples of this configuration in the Data Preview section. Data Preview automatically updates as you include or remove fields and properties. In Data Preview, you can look at a few sample records based on the source records along with how that data is imported into Amplitude. This ensures that you are bringing in all the data points you need into Amplitude. You can look at 10 different sample source records and their corresponding Amplitude events.
The group properties import feature requires that groups are set in the HTTP API event format. The converter expects a groups object and a group_properties object.
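Before uploading group property files, it can help to confirm that each record carries both objects. The check below is a minimal sketch. The function name and the sample group and property names are hypothetical, and the check covers only the presence of the two objects, not the rest of the event format.

```python
def missing_group_objects(record: dict) -> list:
    """Return the names of required objects that are absent or not JSON objects."""
    required = ("groups", "group_properties")
    return [name for name in required if not isinstance(record.get(name), dict)]


# Hypothetical record: "company" and "plan" are made-up names.
sample = {
    "groups": {"company": "Acme"},
    "group_properties": {"plan": "enterprise"},
}
problems = missing_group_objects(sample)  # empty list when both objects exist
```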
The converter file tells Amplitude how to process the ingested files. Create it in two steps: first, configure the compression type, file name, and escape characters for your files.
Then use JSON to describe the rules your converter follows.
The converter language describes extraction of a value given a JSON element. You specify this with a SOURCE_DESCRIPTION, which includes:
See the Converter Configuration reference for more help.
If you add new fields or change the source data format, you need to update your converter configuration. Note that the updated converter only applies to files discovered after the converter updates are saved.
After you’ve created the S3 Import source and the converter configuration, you must enable the source to begin importing data.
To enable the source:
April 22nd, 2024