The California Consumer Privacy Act (CCPA) requires businesses to provide all data about an end user upon request. This Data Subject Access Request (DSAR) API makes it easy to retrieve all data about a user.
This API uses basic authentication, using the API key and secret key for your project. Pass base64-encoded credentials in the request header like `{api-key}:{secret-key}`. `api-key` replaces the username, and `secret-key` replaces the password.

Your authorization header should look something like this:

`--header 'Authorization: Basic YWhhbWwsdG9uQGFwaWdlZS5jb206bClwYXNzdzByZAo'`
For more information, see Find your API Credentials.
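As a minimal sketch of the encoding described above, the following Python snippet joins a key pair with a colon and base64-encodes it into a Basic authorization header. The function name `basic_auth_header` and the credential strings are placeholders invented for illustration; they are not part of the API.

```python
import base64


def basic_auth_header(api_key: str, secret_key: str) -> dict:
    """Build an HTTP Basic Authorization header from a key pair (hypothetical helper)."""
    # Join the pair as "api-key:secret-key", then base64-encode the bytes.
    raw = f"{api_key}:{secret_key}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials, for illustration only.
print(basic_auth_header("my-api-key", "my-secret-key"))
```

Many HTTP clients can do this encoding for you when given a username and password, so building the header by hand is mainly useful for debugging.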
Region | Endpoint |
---|---|
Standard server | https://amplitude.com/api/2/dsar/requests |
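To support data volume, this API works asynchronously. Getting user data happens in three steps:

1. Make a POST request to create the data request. The response contains a `requestId`.
2. Make a GET request with the `requestId` to check the status of the job.
3. When the job finishes, download each output file from the URLs listed in the status response.

Each file is gzipped, and each line of a file is a JSON object, as the example output shows.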
Example Output

```json
{"amplitude_id":123456789,"app":12345,"event_time":"2020-02-15 01:00:00.123456","event_type":"first_event","server_upload_time":"2020-02-18 01:00:00.234567"}
{"amplitude_id":123456789,"app":12345,"event_time":"2020-02-15 01:00:11.345678","event_type":"second_event","server_upload_time":"2020-02-18 01:00:11.456789"}
{"amplitude_id":123456789,"app":12345,"event_time":"2020-02-15 01:02:00.123456","event_type":"third_event","server_upload_time":"2020-02-18 01:02:00.234567"}
```
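To illustrate how a downloaded file might be read, here is a short Python sketch that opens a gzipped file and parses it line by line. It assumes, based on the example output, that each non-empty line is one JSON object; the function name `read_events` and any file path passed to it are hypothetical.

```python
import gzip
import json


def read_events(path):
    """Yield one dict per non-empty line of a gzipped output file (hypothetical helper)."""
    # "rt" decompresses the file and decodes it as text, one line at a time.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                # Assumption: each line holds exactly one JSON object.
                yield json.loads(line)
```

For example, `for event in read_events("123456789-0.gz"): print(event["event_type"])` would print the event types in a file saved under that (illustrative) name.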
All DSAR endpoints share a budget of 14,400 “cost” units per hour. Each POST request costs 8, and each GET request costs 1. Requests that exceed the budget receive a 429 response code.

In general, each POST produces about one output file per month, per project in which the user has events. For example, if you fetch 13 months of data for a user with data in two projects, expect about 26 files.

If you need to get data for 40 users per hour, you can spend 14400 / 40 = 360 cost per request. Conservatively allocating 52 GETs for output files (twice the expected 26) and 8 for the initial POST, you can poll for the status of the request 360 - 8 - 52 = 300 times.

Given the 3 day SLA for results (4,320 minutes), this allows for checking the status every 4320 / 300 = 14.4 minutes, or roughly every 15 minutes over 3 days.
A practical use might be to have a service which runs every 20 minutes, posting 20 new requests and checking on the status of all outstanding requests.
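The budget arithmetic above can be sketched in Python so you can try other workloads. The costs, hourly budget, and 3 day SLA come from this page; the function name `polling_plan`, its parameters, and the `safety_factor` default of 2 (the "twice the computed amount" allowance) are assumptions made for illustration.

```python
HOURLY_BUDGET = 14_400      # shared cost budget per hour
POST_COST = 8               # cost of one POST request
GET_COST = 1                # cost of one GET request
SLA_MINUTES = 3 * 24 * 60   # 3 day SLA, in minutes (4,320)


def polling_plan(users_per_hour, months, projects, safety_factor=2):
    """Estimate how often one request can be polled (hypothetical helper)."""
    cost_per_request = HOURLY_BUDGET // users_per_hour
    # Roughly one output file per month per project, padded by the safety factor.
    download_gets = months * projects * safety_factor
    status_polls = (cost_per_request - POST_COST - download_gets * GET_COST) // GET_COST
    if status_polls <= 0:
        raise ValueError("not enough budget left for status polling")
    return {
        "cost_per_request": cost_per_request,
        "download_gets": download_gets,
        "status_polls": status_polls,
        "poll_interval_minutes": SLA_MINUTES / status_polls,
    }


# The worked example above: 40 users per hour, 13 months, 2 projects.
print(polling_plan(users_per_hour=40, months=13, projects=2))
```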
```python
import sys
import time

import requests
from requests.auth import HTTPBasicAuth

# Placeholder values: replace with your own.
API_KEY = 'YOUR_API_KEY'
SECRET_KEY = 'YOUR_SECRET_KEY'
AMPLITUDE_ID = 123456789
OUTPUT_DIR = '.'
POLL_DELAY = 60      # seconds to wait before the first status check
POLL_INTERVAL = 900  # seconds between status checks

base_url = 'https://amplitude.com/api/2/dsar/requests'
payload = {
    "amplitudeId": AMPLITUDE_ID,
    "startDate": "2019-03-01",
    "endDate": "2020-04-01"
}
headers = {
    'Accept': 'application/json',
    'Content-Type': 'application/json'
}
auth = HTTPBasicAuth(API_KEY, SECRET_KEY)

# Create the data request; json= sends the payload as a JSON body.
r = requests.post(base_url, headers=headers, auth=auth, json=payload)
request_id = r.json().get('requestId')

# Poll until the job finishes or fails.
time.sleep(POLL_DELAY)
while True:
    r = requests.get(f'{base_url}/{request_id}', auth=auth, headers=headers)
    response = r.json()
    if response.get('status') == 'failed':
        sys.exit(1)
    if response.get('status') == 'done':
        break
    time.sleep(POLL_INTERVAL)

# Download each output file.
for url in response.get('urls'):
    r = requests.get(url, headers=headers, auth=auth, allow_redirects=True)
    index = url.split('/')[-1]
    filename = f'{AMPLITUDE_ID}-{index}.gz'
    with open(f'{OUTPUT_DIR}/{filename}', 'wb') as f:
        f.write(r.content)
```
```bash
curl --location --request POST 'https://amplitude.com/api/2/dsar/requests' \
--header 'Accept: application/json' \
--header 'Content-Type: application/json' \
-u '{org-api-key}:{org-secret_key}' \
--data-raw '{
"userId": 12345,
"startDate": "2020-04-24",
"endDate": "2022-02-20"
}'
```

```
POST /api/2/dsar/requests HTTP/1.1
Host: amplitude.com
Accept: application/json
Content-Type: application/json
Authorization: Basic {org-api-key}:{org-secret_key} # credentials must be base64 encoded

{
  "userId": 12345,
  "startDate": "2020-04-24",
  "endDate": "2022-02-20"
}
```
Example: Create a request by user ID

This example creates a request by user ID `12345`, between the dates of April 24, 2020 and February 20, 2022.

Example: Create a request by Amplitude ID

This example creates a request by Amplitude ID `90102919293`, between the dates of April 24, 2020 and February 20, 2022.
Name | Description |
---|---|
`userId` | Required if `amplitudeId` isn't set. The user ID of the user to request data for. |
`amplitudeId` | Required if `userId` isn't set. Integer. The Amplitude ID of the user to request data for. |
`startDate` | Required. Date. The start date for the data request. |
`endDate` | Required. Date. The end date for the data request. |
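The parameter rules in this table can be sketched as a small Python helper that builds the request body and rejects a call with neither identifier. The function name `build_dsar_payload` and its argument names are hypothetical; the `YYYY-MM-DD` date format is inferred from the request examples on this page.

```python
from datetime import date


def build_dsar_payload(start_date, end_date, user_id=None, amplitude_id=None):
    """Build a request body from the documented parameters (hypothetical helper)."""
    # At least one identifier is needed: userId or amplitudeId.
    if user_id is None and amplitude_id is None:
        raise ValueError("set user_id or amplitude_id")
    payload = {
        # Dates are sent as YYYY-MM-DD strings, as in the request examples.
        "startDate": start_date.isoformat(),
        "endDate": end_date.isoformat(),
    }
    if user_id is not None:
        payload["userId"] = user_id
    if amplitude_id is not None:
        payload["amplitudeId"] = amplitude_id
    return payload


print(build_dsar_payload(date(2020, 4, 24), date(2022, 2, 20), user_id=12345))
```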
When successful, the call returns a `202 Accepted` response and a `requestId`. Use the `requestId` to poll the job status.

```json
{
  "requestId": 53367
}
```
Poll the data request job to get its status.
```bash
curl --location --request GET 'https://amplitude.com/api/2/dsar/requests/:requestId' \
--header 'Accept: application/json' \
--header 'Authorization: Basic {org-api-key}:{org-secret_key}' # credentials must be base64 encoded
```

```
GET /api/2/dsar/requests/:requestId HTTP/1.1
Host: amplitude.com
Accept: application/json
Authorization: Basic {org-api-key}:{org-secret_key} # credentials must be base64 encoded
```
Example: Poll a specific request

This example polls request `53367`.
Name | Description |
---|---|
`requestId` | Required. The request ID retrieved with the create data request call. |
Name | Description |
---|---|
`requestId` | Integer. The ID of the request. |
`userId` | String. The user ID of the user to request data for. |
`amplitudeId` | Integer. The Amplitude ID of the user to request data for. |
`startDate` | Date. The start date for the data request. |
`endDate` | Date. The end date for the data request. |
`status` | One of `staging` (not started), `submitted` (in progress), `done` (job completed and download URLs populated), or `failed` (job failed, may need to retry). |
`failReason` | String. If the job failed, contains information about the failure. |
`urls` | Array of strings. A list of download URLs for the data. |
`expires` | Date. The date that the output download links expire. |
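As a rough sketch of how a client might act on the status response, the Python function below maps each documented `status` value to a next step. The function name `next_action` and the action labels `download`, `retry`, and `wait` are invented for illustration; only the status values and field names come from the table above.

```python
def next_action(status_response):
    """Decide what to do with a parsed status response (hypothetical helper)."""
    status = status_response.get("status")
    if status == "done":
        # The job finished: the download URLs are populated.
        return ("download", status_response.get("urls", []))
    if status == "failed":
        # The job failed: surface the reason so the caller can decide to retry.
        return ("retry", status_response.get("failReason"))
    if status in ("staging", "submitted"):
        # Not started or still in progress: poll again later.
        return ("wait", None)
    raise ValueError(f"unexpected status: {status!r}")


print(next_action({"requestId": 53367, "status": "submitted"}))
```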
Download a returned output file.
The download link is valid for two days. Most clients used to send API requests automatically download the data from the S3 link. If your API client doesn't automatically download the file from the link, access it manually using your org API key as the username and your org secret key as the password.
```bash
curl --location --request GET 'https://analytics.amplitude.com/api/2/dsar/requests/:request_id/outputs/:output_id' \
-u '{org-api-key}:{org-secret_key}'
```

```
GET /api/2/dsar/requests/:request_id/outputs/:output_id HTTP/1.1
Host: analytics.amplitude.com
Authorization: Basic {org-api-key}:{org-secret_key} # credentials must be base64 encoded
```
Example: Get the output for a specific request ID

This example gets output with ID `0` for request `53367`.
Name | Description |
---|---|
`request_id` | Required. Integer. The ID of the request. Returned by the create data request call. |
`output_id` | Required. Integer. The ID of the output to download. An integer at the end of the URL returned in the status response after the job finishes. |
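Because the `output_id` is the integer at the end of each download URL, both path parameters can be recovered from a URL in the status response. The sketch below assumes the URL path ends in `/requests/<request_id>/outputs/<output_id>`, matching the endpoint shown above; the function name `parse_output_url` is hypothetical.

```python
from urllib.parse import urlparse


def parse_output_url(url):
    """Return (request_id, output_id) from a download URL (hypothetical helper)."""
    parts = urlparse(url).path.rstrip("/").split("/")
    # Assumed path tail: .../requests/<request_id>/outputs/<output_id>
    if len(parts) < 4 or parts[-4] != "requests" or parts[-2] != "outputs":
        raise ValueError(f"unexpected download URL: {url}")
    return int(parts[-3]), int(parts[-1])


print(parse_output_url(
    "https://analytics.amplitude.com/api/2/dsar/requests/53367/outputs/0"
))
```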
May 16th, 2024