Often, business needs dictate that you analyze behavioral data alongside other organizational sources of data that aren't captured within Amplitude. By integrating Amplitude with Amazon S3, you can export your Amplitude data to an Amazon S3 bucket. This enables you to analyze your Amplitude data sets side-by-side with the rest of your data.
The export works on a per-project basis, so you have the flexibility to set up data from one project for delivery to multiple buckets. Or, you can use multiple projects in the same organization to export event data into a single Amazon S3 bucket. Amplitude limits bucket access to a single organization.
To set up the Amazon S3 integration, follow these steps:
Amplitude verifies your bucket access. After access is verified, Amplitude immediately starts hourly exports.
After setup is complete, check the status of your exports from the integration.
You can backfill historical data to S3 by manually exporting data.
If the backfill range overlaps with the range of already exported data, Amplitude de-duplicates overlapping data.
To disable automatic exports, open the integration and click Manage. You can toggle exports from the Manage Export Settings modal.
Column | Type | Description |
---|---|---|
$insert_id | string | A unique identifier for the event. Amplitude deduplicates subsequent events sent with the same device_id and insert_id within the past 7 days. |
amplitude_attribution_ids | array | The hashed attribution IDs on the event. |
amplitude_id | long | The original Amplitude ID for the user. Use this field to automatically handle merged users. Example: 2234540891 |
app | int | The Project ID found in your project's Settings page. Example: 123456 |
city | string | City. Example: "San Francisco" |
client_event_time | timestamp | Local timestamp (UTC) of when the device logged the event. Example: 2015-08-10T12:00:00.000000 |
client_upload_time | timestamp | The local timestamp (UTC) of when the device uploaded the event. Example: 2015-08-10T12:00:00.000000 |
country | string | Country. Example: "United States" |
data | dict | Dictionary where certain fields such as first_event and merged_amplitude_id are stored. |
device_carrier | string | Device carrier. Example: Verizon |
device_family | string | Device family. Example: Apple iPhone |
device_id | string | The device-specific identifier. Example: C8F9E604-F01A-4BD9-95C6-8E5357DF265D |
device_type | string | Device type. Example: Apple iPhone 5s |
dma | string | Designated marketing area (DMA). Example: San Francisco-Oakland-San Jose, CA |
event_id | int | A counter that distinguishes events. Example: 1 |
event_properties | dict | A dictionary of key-value pairs that represent data to send along with the event. You can store property values in an array. |
event_time | timestamp | Amplitude timestamp (UTC), which is the client_event_time adjusted by the difference between server_received_time and client_upload_time, specifically: event_time = client_event_time + (server_received_time - client_upload_time). Amplitude uses this timestamp to organize events on Amplitude charts. NOTE: If the difference between server_received_time and client_upload_time is less than 60 seconds, the event_time isn't adjusted and equals the client_event_time. Example: 2015-08-10T12:00:00.000000 |
event_type | string | Event type. |
group_properties | dict | A dictionary of key-value pairs that represent properties tied to the groups listed in the "groups" field. This feature is available to customers with the Accounts add-on. |
groups | dict | Group types. See the Accounts documentation for more information. |
idfa | string | (iOS) Identifier for Advertiser. |
ip_address | string | IP address. Example: "123.11.111.11" |
language | string | The language set by the user. |
library | string | Library used to send the event. Example: amplitude-ios/3.2.1, http/1.0 |
location_lat | float | Latitude. Example: 12.3456789 |
location_lng | float | Longitude. Example: -123.4567890 |
os_name | string | OS name. Example: ios |
os_version | string | OS version. |
paying | boolean | True if the user has ever logged any revenue, otherwise (none). Modify this property with the Identify API. |
platform | string | Platform of the device. |
processed_time | timestamp | Amplitude timestamp when Amplitude's processing systems processed the event. |
region | string | Region. Example: California |
sample_rate | null | The number of samples taken. This feature is available to customers with the Scale add-on. |
server_received_time | timestamp | Amplitude timestamp (UTC) when Amplitude's servers receive the event. |
server_upload_time | timestamp | Amplitude timestamp (UTC) when Amplitude's ingestion system ingests the event. Example: 2015-08-10T12:00:00.000000 |
session_id | long | The session start time in milliseconds since epoch. Example: 1396381378123 |
start_version | string | App version the user was first tracked on. Example: 1.0.0 |
user_id | string | A readable ID specified by you. Should be something that doesn't change; for that reason, using the user's email address isn't recommended. |
user_properties | dict | A dictionary of key-value pairs that represent data tied to the user. You can store property values in an array. |
uuid | UUID | A unique identifier per row (event sent). Example: bf0b9b2a-304d-11e6-934f-22000b56058f |
version_name | string | The app version. Example: 1.0.0 |
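The event_time rule described in the table can be sketched in a few lines of Python. This is only an illustration of the documented formula, not Amplitude's implementation: the function names are hypothetical, and treating "less than 60 seconds" as an absolute difference is an assumption, since the table doesn't say how negative differences are handled.

```python
from datetime import datetime, timedelta

# Threshold from the event_time description: smaller differences are ignored.
ADJUSTMENT_THRESHOLD = timedelta(seconds=60)


def parse_ts(value: str) -> datetime:
    """Parse a timestamp in the exported format, e.g. 2015-08-10T12:00:00.000000."""
    return datetime.fromisoformat(value)


def compute_event_time(
    client_event_time: datetime,
    client_upload_time: datetime,
    server_received_time: datetime,
) -> datetime:
    """Apply event_time = client_event_time + (server_received_time - client_upload_time)."""
    offset = server_received_time - client_upload_time
    # Assumption: the 60-second rule compares the absolute difference.
    if abs(offset) < ADJUSTMENT_THRESHOLD:
        return client_event_time
    return client_event_time + offset
```

A device clock that runs five minutes behind the server would therefore have its events shifted forward by five minutes, while a 30-second difference leaves client_event_time unchanged.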
Data is exported hourly as zipped archive JSON files, and partitioned by the hour with one or multiple files per hour. Each file contains one event JSON object per line.
File names have the following syntax, where the time represents when the data was uploaded to Amplitude servers in UTC (for example, server_upload_time):
projectID_yyyy-MM-dd_H#partitionInteger.json.gz
For example, the first partition of data uploaded to this project, on Jan 25, 2020, between 5 PM and 6 PM UTC, is in the file:
187520_2020-01-25_17#1.json.gz
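If you process exported files programmatically, it can help to split a file name back into its parts. The following is a minimal sketch based only on the syntax shown above; the `ExportFile` type and `parse_export_filename` function are hypothetical names, and the regular expression assumes the hour is written without zero padding (as the `H` in the pattern suggests) while also accepting two digits.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple


class ExportFile(NamedTuple):
    project_id: int
    hour_start: datetime  # start of the exported hour, in UTC
    partition: int


# Mirrors projectID_yyyy-MM-dd_H#partitionInteger.json.gz
_EXPORT_NAME = re.compile(
    r"^(?P<project>\d+)_(?P<date>\d{4}-\d{2}-\d{2})_(?P<hour>\d{1,2})"
    r"#(?P<part>\d+)\.json\.gz$"
)


def parse_export_filename(name: str) -> ExportFile:
    """Split an export file name into project ID, UTC hour, and partition."""
    match = _EXPORT_NAME.match(name)
    if match is None:
        raise ValueError(f"not an event export file name: {name!r}")
    hour = int(match["hour"])
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range in {name!r}")
    day = datetime.strptime(match["date"], "%Y-%m-%d")
    hour_start = day.replace(hour=hour, tzinfo=timezone.utc)
    return ExportFile(int(match["project"]), hour_start, int(match["part"]))
```

Calling `parse_export_filename("187520_2020-01-25_17#1.json.gz")` yields project 187520, the hour starting at 17:00 UTC on Jan 25, 2020, and partition 1.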
Here is the exported data JSON object schema:
{
  "server_received_time": UTC ISO-8601 timestamp,
  "app": int,
  "device_carrier": string,
  "$schema": int,
  "city": string,
  "user_id": string,
  "uuid": UUID,
  "event_time": UTC ISO-8601 timestamp,
  "platform": string,
  "os_version": string,
  "amplitude_id": long,
  "processed_time": UTC ISO-8601 timestamp,
  "version_name": string,
  "ip_address": string,
  "paying": boolean,
  "dma": string,
  "group_properties": dict,
  "user_properties": dict,
  "client_upload_time": UTC ISO-8601 timestamp,
  "$insert_id": string,
  "event_type": string,
  "library": string,
  "amplitude_attribution_ids": string,
  "device_type": string,
  "device_manufacturer": string,
  "start_version": string,
  "location_lng": float,
  "server_upload_time": UTC ISO-8601 timestamp,
  "event_id": int,
  "location_lat": float,
  "os_name": string,
  "amplitude_event_type": string,
  "device_brand": string,
  "groups": dict,
  "event_properties": dict,
  "data": dict,
  "device_id": string,
  "language": string,
  "device_model": string,
  "country": string,
  "region": string,
  "is_attribution_event": bool,
  "adid": string,
  "session_id": long,
  "device_family": string,
  "sample_rate": null,
  "idfa": string,
  "client_event_time": UTC ISO-8601 timestamp
}
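Because each exported file is gzip-compressed and holds one event JSON object per line, it can be read with the Python standard library alone. The sketch below assumes the file has already been downloaded from your bucket to a local path; `iter_events` and `count_event_types` are hypothetical helper names, not part of any Amplitude tooling.

```python
import gzip
import json
from collections import Counter
from typing import Iterator


def iter_events(path: str) -> Iterator[dict]:
    """Yield one event dict per non-empty line of a .json.gz export file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines defensively
                yield json.loads(line)


def count_event_types(path: str) -> Counter:
    """Count events in one export file by their event_type field."""
    return Counter(event.get("event_type") for event in iter_events(path))
```

Streaming line by line keeps memory use flat even for large hourly files, since no more than one event is decoded at a time.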
The size and volume of exported data depends on how you instrument data, and the number of events you send to Amplitude. Amplitude can't provide exact estimates, but you can use your average event size to provide a rough estimate:
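As a back-of-the-envelope illustration, a rough estimate multiplies event volume by average event size. The function name and all numbers below are hypothetical, and the result is the uncompressed size: the gzip-compressed files in your bucket are typically smaller, by a ratio that depends on your data.

```python
def estimate_export_bytes(events_per_day: int, avg_event_bytes: int, days: int = 30) -> int:
    """Rough uncompressed export size: events x average event size x days."""
    return events_per_day * avg_event_bytes * days


# Hypothetical inputs: 1 million events per day at about 1 KB each, over 30 days.
monthly_bytes = estimate_export_bytes(1_000_000, 1_000, days=30)
monthly_gb = monthly_bytes / 1e9  # decimal gigabytes
```

To find a realistic average event size, measure a sample of your own exported events rather than relying on the placeholder value used here.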
Amplitude may label some files in your export as complete. These labels help you decide if there is no data in the time frame or if the data in your time frame didn't export.
If you see a complete file for a time frame with no data, there is no data to export for the selected time frame.
To disable complete files, contact Amplitude Support.
Data is exported hourly as zipped archive JSON files. Each file contains one merged Amplitude ID JSON object per line.
File names have the following syntax, where the time represents when the data was uploaded to Amplitude servers in UTC (for example, server_upload_time):
-OrgID_yyyy-MM-dd_H.json.gz
For example, find data uploaded to this project, on Jan 25, 2020, between 5 PM and 6 PM UTC, in the file:
-189524_2020-01-25_17.json.gz
Merged ID JSON objects have the following schema:
{
  "scope": int,
  "merge_time": long,
  "merge_server_time": long,
  "amplitude_id": long,
  "merged_amplitude_id": long
}
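One way to work with merged ID files is to load each line into a small typed record. The sketch below covers only the five fields listed in the schema above; the `MergedId` class and `parse_merged_id` function are hypothetical names, and the code deliberately doesn't interpret the units of the time fields or the direction of the merge, because this section doesn't specify them.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MergedId:
    scope: int
    merge_time: int
    merge_server_time: int
    amplitude_id: int
    merged_amplitude_id: int


def parse_merged_id(line: str) -> MergedId:
    """Parse one line of a merged ID export file into a MergedId record."""
    obj = json.loads(line)
    # A missing field raises KeyError, so malformed lines fail loudly.
    return MergedId(**{f.name: int(obj[f.name]) for f in fields(MergedId)})
```

Each line of a decompressed merged ID file can be passed to `parse_merged_id` in turn, and the resulting records can then be joined against the amplitude_id column of your event data.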
The following outlines the procedure to enable KMS encryption in AWS S3 buckets for existing export connections. This encryption improves security posture.
Before starting the migration, users must have access to the following:
How to update existing export to use KMS encryption:
Click Next to verify Bucket Access and create a new connection in the export setup flow.
After you delete the old S3 export connection, the rollback process described below no longer applies.
If something goes wrong during the migration procedure above, do the following:
After disabling and removing the S3 export connection, the data already exported to the S3 destination bucket won’t be removed.
However, the metadata, like export jobs history, will be lost.
As long as the new S3 export connection is created within 24 hours after the old S3 export connection is disabled, there won’t be any gaps in data export.
If the new export connection is created more than 24 hours after the old connection is disabled and there is a gap in the exported data, use the manual backfill to fill in the missing date range. Backfill exports don't duplicate exported data, even if the backfill date range overlaps a previously exported date range.
A less restrictive access scope for your destination S3 bucket: the current state trusts an AWS account root principal, whereas the future state trusts an AWS IAM role principal. That IAM role is specific to the Amplitude project ID in which the new S3 export connection is created.
No. Once the old S3 export connection is removed, it will no longer be possible to set up S3 export with the AWS account root principal as a trustee in the bucket policy.
April 18th, 2024