Amplitude's Azure Blob Storage destination enables you to export Amplitude event data and merged user data to an Azure Blob Storage container. Once exported, you can manage the data programmatically with Azure's own container policies and tooling. Amplitude supports recurring syncs as often as once per hour.
Before you can export data from Amplitude to Azure Blob Storage, ensure your Azure environment meets the following prerequisites:
Set Up Azure Storage Account and Container:
If you haven't already set up an Azure Storage Account and a Blob Storage container, follow the Azure documentation for creating each.
Create an Azure Service Principal:
Create a service principal in Azure for Amplitude to use to access your Blob Storage. Copy the tenantId, clientId, and clientSecret; Amplitude requires these values to set up the connection with Azure. For example, if you create the service principal with the Azure CLI command az ad sp create-for-rbac, the appId, password, and tenant fields in its output correspond to the clientId, clientSecret, and tenantId. For instructions, refer to the Azure documentation on creating a service principal.
Grant Required Permissions to Service Principal:
Assign the necessary permissions for Azure Blob Storage to your service principal:
Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write
Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read, which ensures data isn't exported more than once for recurring exports.
Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete, which enables deduplication during a manual export, for example when you export backfill data.
For instructions on assigning these permissions for Azure Blob Storage, refer to the Azure role-based access control (RBAC) documentation. The sketch below shows one way to confirm the assignment took effect.
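This minimal sketch, using the azure-identity and azure-storage-blob Python packages, writes, reads, and then deletes a small test blob with the service principal's credentials; the storage account, container, and credential values are placeholders for your own.

# Verify the service principal's blob permissions by exercising each one.
# Placeholders: <tenantId>, <clientId>, <clientSecret>, <storage-account>, <container>.
from azure.identity import ClientSecretCredential
from azure.storage.blob import BlobServiceClient

credential = ClientSecretCredential(
    tenant_id="<tenantId>",
    client_id="<clientId>",
    client_secret="<clientSecret>",
)
service = BlobServiceClient(
    account_url="https://<storage-account>.blob.core.windows.net",
    credential=credential,
)
blob = service.get_blob_client(container="<container>", blob="amplitude-permission-check")

blob.upload_blob(b"ok", overwrite=True)          # needs .../blobs/write
assert blob.download_blob().readall() == b"ok"   # needs .../blobs/read
blob.delete_blob()                               # needs .../blobs/delete
print("write, read, and delete all succeeded")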
Configure Network Access:
Ensure the Firewalls and Virtual Networks settings of your Azure Storage account allow access from the following Amplitude IP addresses:
52.33.3.219
35.162.216.242
52.27.10.221
3.124.22.25
18.157.59.125
18.192.47.195
For guidance on modifying these settings, see the Azure documentation on configuring storage firewalls and virtual networks.
Once these steps are complete, your Azure environment is ready to receive secure data exports from Amplitude.
To export your Amplitude data to Azure Blob Storage:
You can export event data and merged user data to separate containers, if you prefer. You'll just need to complete the setup flow twice: once for each data type.
Enter the tenantId, clientId, and clientSecret from the service principal you created. When complete, Amplitude sends all future events and merged users to your Azure Blob Storage once per hour.
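To confirm that the hourly syncs arrive, a sketch like the following, again assuming the azure-identity and azure-storage-blob packages with placeholder credential and container values, lists the export files delivered so far. File naming is described below.

# List exported files for one project and day using a name prefix.
from azure.identity import ClientSecretCredential
from azure.storage.blob import ContainerClient

credential = ClientSecretCredential(
    tenant_id="<tenantId>",
    client_id="<clientId>",
    client_secret="<clientSecret>",
)
container = ContainerClient(
    account_url="https://<storage-account>.blob.core.windows.net",
    container_name="<container>",
    credential=credential,
)

# Event files are named projectID_yyyy-MM-dd_H#partition.json.gz, so a
# prefix filter narrows the listing to one project and one day.
for blob in container.list_blobs(name_starts_with="187520_2020-01-25"):
    print(blob.name, blob.size)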
Complete a manual export to backfill and send historical data to Azure Blob Storage.
If the backfill date range overlaps with the date range of already exported data, Amplitude deduplicates any overlapping data.
Amplitude exports data hourly as a zipped archive of JSON files, partitioned by the hour with one or more files per hour. Each file contains one event JSON object per line.
File names have the following syntax, where the time represents when the data was uploaded to Amplitude servers in UTC (for example, server_upload_time):
projectID_yyyy-MM-dd_H#partitionInteger.json.gz
For example, the first partition of data uploaded to this project, on Jan 25, 2020, between 5 PM and 6 PM UTC, is in the file:
187520_2020-01-25_17#1.json.gz
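If you process these files programmatically, a small sketch like this one, with a regular expression written only from the pattern documented above, splits a file name into its parts.

# Parse projectID_yyyy-MM-dd_H#partitionInteger.json.gz into components.
import re

NAME_RE = re.compile(
    r"^(?P<project_id>\d+)_(?P<date>\d{4}-\d{2}-\d{2})_(?P<hour>\d{1,2})"
    r"#(?P<partition>\d+)\.json\.gz$"
)

match = NAME_RE.match("187520_2020-01-25_17#1.json.gz")
assert match is not None
print(match["project_id"], match["date"], match["hour"], match["partition"])
# -> 187520 2020-01-25 17 1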
The exported data JSON object schema is:
1{ 2 "server_received_time": UTC ISO-8601 timestamp, 3 "app": int, 4 "device_carrier": string, 5 "$schema":int, 6 "city": string, 7 "user_id": string, 8 "uuid": UUID, 9 "event_time": UTC ISO-8601 timestamp,10 "platform": string,11 "os_version": string,12 "amplitude_id": long,13 "processed_time": UTC ISO-8601 timestamp,14 "version_name": string,15 "ip_address": string,16 "paying": boolean,17 "dma": string,18 "group_properties": dict,19 "user_properties": dict,20 "client_upload_time": UTC ISO-8601 timestamp,21 "$insert_id": string,22 "event_type": string,23 "library":string,24 "amplitude_attribution_ids": string,25 "device_type": string,26 "device_manufacturer": string,27 "start_version": string,28 "location_lng": float,29 "server_upload_time": UTC ISO-8601 timestamp,30 "event_id": int,31 "location_lat": float,32 "os_name": string,33 "amplitude_event_type": string,34 "device_brand": string,35 "groups": dict,36 "event_properties": dict,37 "data": dict,38 "device_id": string,39 "language": string,40 "device_model": string,41 "country": string,42 "region": string,43 "is_attribution_event": bool,44 "adid": string,45 "session_id": long,46 "device_family": string,47 "sample_rate": null,48 "idfa": string,49 "client_event_time": UTC ISO-8601 timestamp,50}
Amplitude exports data as a zipped archive of JSON files. Each file contains one merged Amplitude ID JSON object per line.
File names have the following syntax, where the time represents when the data was uploaded to Amplitude servers in UTC (for example, server_upload_time):
-OrgID_yyyy-MM-dd_H.json.gz
For example, data uploaded to this project, on Jan 25, 2020, between 5 PM and 6 PM UTC, is in the file:
-189524_2020-01-25_17.json.gz
The exported data JSON object schema is:
1{2 "scope": int,3 "merge_time": long,4 "merge_server_time": long,5 "amplitude_id": long,6 "merged_amplitude_id": long7}