Amazon S3 (Cohort)
Set up this integration to send cohorts to an Amazon S3 bucket. This exports groups of users out of Amplitude so you can synchronize them with other databases or stored procedures built off your Amazon S3 bucket. You can then use Amplitude cohorts in internal analytics dashboards and personalization engines.
Amplitude updated its security model to better align with AWS best practices. New instances of this destination default to IAM role-based access to improve security and prevent over-permissioning. Existing S3 destinations remain unchanged to avoid disruption.
To update your existing destinations to IAM roles, delete the existing connection and reconfigure the cohort sync destination to use the updated security model.
Prerequisites
- From your Amazon S3 console, find the S3 bucket you want Amplitude to sync with. Copy its name, path, and region.
Set up the integration
Amplitude setup
- In Amplitude Data, click Catalog and select the Destinations tab.
- In the Cohort section, click Amazon S3 (Cohorts). Don't select Amazon S3 for this integration.
- Enter the bucket name, select a region, enter a bucket path (optional), and enter a name for the destination. Amplitude uses the name when syncing a cohort.
- Select the Amplitude user property to match users between Amazon S3 and Amplitude.
- Click Copy Bucket Policy.
Amazon S3 setup
- In the Amazon S3 console, go to the S3 bucket and navigate to Permissions → Bucket Policy. Paste the Amplitude bucket policy into the Amazon S3 console.
- Optionally, set the following two parameters for your buckets:
- Require suffix: When set, allows users to append a string at the end of every file exported to S3.
- User property: Select a single user property to sync along with each user as an extra column in each file exported.
Send a cohort
After you connect the S3 bucket to Amplitude, you can sync any cohort to that bucket:
- From the Cohorts page in Amplitude, click the cohort to send, or create a cohort.
- Click Sync.
- Select Amazon S3, then click Next.
- Select the S3 location. This is the destination name you entered when setting up the integration.
- (Optional) Set the following parameters:
- User Property: Append a user property to each user exported in this cohort. The user property appears as a column in the exported CSV file.
- Routing Key: Enter a string to append to the end of the cohort file name in S3.
- Choose a sync cadence.
- When finished, click Sync.
Cohorts in S3
Amplitude syncs your cohort as a CSV to the bucket you specified. Within the folder, there is a list of CSV files.
Each sync generates three CSV files:
- One with users who entered the cohort since the last sync.
- One with users who exited the cohort since the last sync.
- One containing the users that existed in the cohort at the time of the last sync. This way, you always have a complete historical log of S3 cohort membership.
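As a rough illustration of how a downstream system might consume the entering and exiting files, the sketch below applies one sync's changes to a locally tracked membership set. The function name `apply_sync` and the idea of tracking membership as a set of user IDs are assumptions for illustration, not part of the integration.

```python
from typing import Iterable, Set


def apply_sync(previous: Iterable[str],
               entering: Iterable[str],
               exiting: Iterable[str]) -> Set[str]:
    """Return the membership after one sync (hypothetical helper).

    previous: user IDs you tracked before this sync
    entering: user IDs from the "entering" file
    exiting:  user IDs from the "exiting" file
    """
    # Add users who entered, then drop users who exited.
    return (set(previous) | set(entering)) - set(exiting)
```

If you keep your own copy of the cohort, comparing the result against the full list from the same sync is one way to check that no sync was missed.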
The CSV files all use this naming convention:
path/projectID_cohortID_YYYY-MM-DDTHH:SS_difftype_routingkey.csv
Where:
- path: The optional folder prefix on the path where the file should be written.
- projectID: Identifies which Amplitude project the cohort belongs to.
- cohortID: The unique identifier for your cohort. You can find this number in the URL of your cohort in Amplitude.
- YYYY-MM-DDTHH-SS: The timestamp when the cohort was synced.
- difftype: Describes which of the three user groups the CSV file contains. Acceptable values are entering, exiting, or existing.
- routingkey: The optional string suffix entered before.
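The naming convention can be split back into its parts when you process files from the bucket. The following is a minimal sketch, assuming that the project ID, cohort ID, and timestamp never contain underscores and that the routing key is simply absent when not set. The names `CohortFile` and `parse_cohort_key` are hypothetical, and the timestamp is kept as a raw string rather than parsed.

```python
import posixpath
from typing import NamedTuple, Optional

DIFF_TYPES = {"entering", "exiting", "existing"}


class CohortFile(NamedTuple):
    path: str                   # folder prefix, "" if none
    project_id: str
    cohort_id: str
    timestamp: str              # raw string, not parsed
    diff_type: str              # entering, exiting, or existing
    routing_key: Optional[str]  # None when no routing key is present


def parse_cohort_key(key: str) -> CohortFile:
    """Split an S3 object key into the fields of the naming convention."""
    folder, filename = posixpath.split(key)
    if not filename.endswith(".csv"):
        raise ValueError(f"not a CSV file: {key!r}")
    stem = filename[: -len(".csv")]

    # Split at most 4 times so a routing key may itself contain underscores.
    parts = stem.split("_", 4)
    if len(parts) < 4:
        raise ValueError(f"unexpected file name: {key!r}")

    project_id, cohort_id, timestamp, diff_type = parts[:4]
    if diff_type not in DIFF_TYPES:
        raise ValueError(f"unknown diff type {diff_type!r} in {key!r}")

    routing_key = parts[4] if len(parts) == 5 and parts[4] else None
    return CohortFile(folder, project_id, cohort_id,
                      timestamp, diff_type, routing_key)
```

Splitting with a maximum count rather than a strict pattern keeps the sketch tolerant of routing keys that contain underscores, at the cost of relying on the assumptions listed above.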
The timestamp in the CSV name refers to the date and time the cohort was synced. If you have an hourly or daily scheduled sync, Amplitude creates a new file for every sync with the full list of users who qualify in that cohort at that time. This lets you keep a historical log of audience membership.
Each CSV file contains a list of users, with data broken into the following columns:
- amplitudeID: The internal Amplitude identifier for the user.
- userID: Your unique database identifier for the user.
- userProperty: The value of a user property you added in the optional User Property step of the Send a cohort section; there is one column for each user property. In portfolio projects, there is a separate column for each source app.
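Once a file is downloaded, the columns above can be read with a standard CSV parser. The sketch below assumes the file has a header row whose names match the column names listed (amplitudeID, userID, plus any user property columns); that header layout is an assumption, so check it against a real export. The function names are hypothetical.

```python
import csv
import io
from typing import Dict, List, Set


def read_cohort_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse the text of one cohort CSV into a list of row dictionaries.

    Assumes the first line is a header row naming each column.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def user_ids(rows: List[Dict[str, str]]) -> Set[str]:
    """Collect the non-empty userID values from parsed rows."""
    return {row["userID"] for row in rows if row.get("userID")}
```

Rows with an empty userID are skipped in `user_ids` because some users may only have an Amplitude ID; whether that applies to your data depends on the user property you chose for matching.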