Export Amplitude event data and merged user data to your Google Cloud Storage (GCS) account. Google Cloud's bucket policies let you control how Amplitude programmatically exports this data into your bucket. From the Amplitude UI, you can set up recurring syncs as often as once per hour.

## Prerequisites

Create a GCS service account and set permissions before you configure the integration.

If you haven't already, [create a service account](https://cloud.google.com/iam/docs/creating-managing-service-account-keys) for Amplitude within the Google Cloud console. This service account lets Amplitude export your data to your Google Cloud project.

After you create the service account, generate and download its key file in JSON format. You upload this key file to Amplitude when you set up the integration.

Add this service account as a member of the bucket you want to export data to. Give this member the **Storage Admin** role so Amplitude has permission to export data to your bucket.

You can also create your own role. The export process requires these permissions at minimum:

- `storage.buckets.get`
- `storage.objects.get`
- `storage.objects.create`
- `storage.objects.delete`
- `storage.objects.list`
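
To check a custom role against this list, you can compare the permissions the role grants with the minimum set. The following is a minimal sketch, not part of Amplitude's tooling: the helper name `missing_permissions` is hypothetical, and how you obtain the granted permissions (for example, from the Google Cloud console or a client library) is left to you.

```python
# Minimum permissions the export process requires, per the list above.
REQUIRED_PERMISSIONS = frozenset({
    "storage.buckets.get",
    "storage.objects.get",
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.list",
})


def missing_permissions(granted):
    """Return the required permissions absent from `granted`, sorted by name."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))
```

An empty result means the role covers every permission the export needs.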

## Set up the integration

{% callout type="note" title="" %}
You need admin privileges in Amplitude, and a role that lets you enable resources in GCS.
{% /callout %}


1. In Amplitude Data, click **Catalog** and select the **Destinations** tab.
2. In the Warehouse Destination section, click **Google Cloud Storage**.
3. On the **Getting Started** tab, select the data you want to export. You can *Export events ingested today and moving forward*, *Export all merged Amplitude IDs*, or both. For events, you can also specify filtering conditions to export only events that meet certain criteria.

{% callout type="note" title="" %}
You can export these two data types to separate buckets. Complete the setup flow twice: once for each data type.
{% /callout %}

4. Review the Event table and Merge IDs table schemas and click **Next**.
5. In the *Google Cloud Credentials For Amplitude* section, upload the Service Account Key file. This file must be in JSON format.
6. After you upload the service account key, fill out the Google Cloud bucket details in the *Google Cloud Bucket Details* section.
7. Click **Next**. Amplitude attempts a test upload to check that the credentials work. If the upload succeeds, click **Finish** to complete the GCS destination configuration and activation.

Amplitude automatically sends all future events and merged users to GCS. Amplitude exports files to your GCS account on a best-effort basis. Exports typically run hourly and contain one hour of data, but may run less frequently and contain multiple hours of data.

## Run a manual export

Backfill historical data to GCS by manually exporting data.

1. Go to the Google Cloud Storage export connection page.
2. Go to the **Backfills** tab.
3. Select the date range you want.
4. Click **Start Backfill**.

If the backfill range overlaps with previously exported data, Amplitude de-duplicates overlapping data.

## Exported data format

When you configure a GCS export, you specify a bucket name and an optional folder/prefix. This prefix determines where your exported data appears inside the bucket. If you leave the prefix empty, Amplitude writes objects directly at the bucket root.

### Raw event file and data format

Amplitude exports data as gzip-compressed JSON files (`.json.gz`), partitioned by the hour with one or more files per hour. Each file contains one event JSON object per line.

#### Object key structure

Amplitude organizes exported event files under an `{appId}` directory that matches your project ID. The full object key structure is:

`{gcsPrefix}/{appId}/{filename}`

If you don't configure a prefix, the path simplifies to:

`{appId}/{filename}`

Where:

- `{gcsPrefix}` is the optional folder/prefix you configure in the Amplitude UI.
- `{appId}` is your Amplitude project ID (the same ID that appears in the filename).
- `{filename}` follows the format below.

#### Filename format

File names have the following syntax, where the time represents when Amplitude servers received the data in UTC (the `server_upload_time`):

`projectID_yyyy-MM-dd_H#partitionInteger.json.gz`

For example, the first partition of data uploaded to this project, on Jan 25, 2020, between 5 PM and 6 PM UTC, is in the file:

`187520_2020-01-25_17#1.json.gz`
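
If you process exported files programmatically, you may want to recover the project ID, hour, and partition from a filename. The sketch below assumes that the project ID is numeric and that the hour may have one or two digits. The function name `parse_event_filename` is illustrative.

```python
import re
from datetime import datetime, timezone

# Matches projectID_yyyy-MM-dd_H#partitionInteger.json.gz
_EVENT_FILE = re.compile(
    r"^(?P<project_id>\d+)_(?P<date>\d{4}-\d{2}-\d{2})_(?P<hour>\d{1,2})"
    r"#(?P<partition>\d+)\.json\.gz$"
)


def parse_event_filename(name):
    """Split an event export filename into project ID, hour start, and partition."""
    match = _EVENT_FILE.match(name)
    if match is None:
        raise ValueError(f"not an event export filename: {name!r}")
    day = datetime.strptime(match["date"], "%Y-%m-%d")
    # The time in the filename is in UTC (the server_upload_time hour).
    hour_start = day.replace(hour=int(match["hour"]), tzinfo=timezone.utc)
    return {
        "project_id": int(match["project_id"]),
        "hour_start": hour_start,
        "partition": int(match["partition"]),
    }
```

For the example filename above, `parse_event_filename("187520_2020-01-25_17#1.json.gz")` yields project ID `187520`, partition `1`, and an hour starting at 17:00 UTC on January 25, 2020.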

#### Example

If your bucket is `amplitude-data`, your prefix is `events`, and your project ID is `187520`, the full GCS path for this file is:

`gs://amplitude-data/events/187520/187520_2020-01-25_17#1.json.gz`
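
The same path can be assembled in code. This is a minimal sketch: the function name `event_object_uri` is hypothetical, and stripping slashes from the prefix is an assumption about how you store it, not documented Amplitude behavior.

```python
def event_object_uri(bucket, app_id, filename, prefix=""):
    """Build the gs:// URI for an exported event file.

    Follows {gcsPrefix}/{appId}/{filename}; an empty prefix is omitted.
    """
    parts = [prefix.strip("/"), str(app_id), filename]
    key = "/".join(part for part in parts if part)
    return f"gs://{bucket}/{key}"
```

Because the filename contains `#`, percent-encode it as `%23` if you place the object name in an HTTP URL.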

Exported event JSON objects have the following schema, where each value shows the field's type:

```json
{
  "server_received_time": UTC ISO-8601 timestamp,
  "app": int,
  "device_carrier": string,
  "$schema": int,
  "city": string,
  "user_id": string,
  "uuid": UUID,
  "event_time": UTC ISO-8601 timestamp,
  "platform": string,
  "os_version": string,
  "amplitude_id": long,
  "processed_time": UTC ISO-8601 timestamp,
  "version_name": string,
  "ip_address": string,
  "paying": boolean,
  "dma": string,
  "group_properties": dict,
  "user_properties": dict,
  "client_upload_time": UTC ISO-8601 timestamp,
  "$insert_id": string,
  "event_type": string,
  "library": string,
  "amplitude_attribution_ids": string,
  "device_type": string,
  "device_manufacturer": string,
  "start_version": string,
  "location_lng": float,
  "server_upload_time": UTC ISO-8601 timestamp,
  "event_id": int,
  "location_lat": float,
  "os_name": string,
  "amplitude_event_type": string,
  "device_brand": string,
  "groups": dict,
  "event_properties": dict,
  "data": dict,
  "device_id": string,
  "language": string,
  "device_model": string,
  "country": string,
  "region": string,
  "is_attribution_event": boolean,
  "adid": string,
  "session_id": long,
  "device_family": string,
  "sample_rate": null,
  "idfa": string,
  "client_event_time": UTC ISO-8601 timestamp
}
```
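
Because each file holds one JSON object per line, you can read an export with a line-by-line parser. The following is a minimal sketch that assumes you already downloaded the `.json.gz` file locally and that it's UTF-8 encoded. The function name `iter_exported_objects` is illustrative.

```python
import gzip
import json


def iter_exported_objects(path):
    """Yield one dict per non-empty line of a gzip-compressed export file."""
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, if any
                yield json.loads(line)
```

Iterating lazily keeps memory use flat, which helps when an hourly file contains many events. The same reader works for merged Amplitude ID files, which use the same one-object-per-line layout.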

### Merged Amplitude IDs file and data format

Amplitude exports data as gzip-compressed JSON files (`.json.gz`). Each file contains one merged Amplitude ID JSON object per line.

#### Object key structure

Amplitude organizes merged ID files under a `-{orgId}` directory that matches your organization ID. The full object key structure is:

`{gcsPrefix}/-{orgId}/{filename}`

If you don't configure a prefix, the path simplifies to:

`-{orgId}/{filename}`

Where:

- `{gcsPrefix}` is the optional folder/prefix you configure in the Amplitude UI.
- `-{orgId}` is your Amplitude organization ID with a leading hyphen.
- `{filename}` follows the format below.

#### Filename format

File names have the following syntax, where the time represents when Amplitude servers received the data in UTC (the `server_upload_time`):

`-{orgId}_yyyy-MM-dd_H.json.gz`

For example, for org ID `189524`, Amplitude exports data received on Jan 25, 2020, between 5 PM and 6 PM UTC to the file:

`-189524_2020-01-25_17.json.gz`

#### Example

If your bucket is `amplitude-data`, your prefix is `merged`, and your org ID is `189524`, the full GCS path for this file is:

`gs://amplitude-data/merged/-189524/-189524_2020-01-25_17.json.gz`

{% callout type="note" title="Legacy organizations" %}
Some legacy organizations may see merged ID exports that use the app ID instead of the org ID. In this case, the directory and filename use `{appId}` without the leading `-`. Contact Amplitude Support to confirm which format applies to your organization.
{% /callout %}
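
To locate a merged ID file for a given hour, you can build its object key from the pieces described above. This sketch is illustrative only: the function name `merged_ids_object_key` and the `legacy` flag are hypothetical, and it assumes the hour in the filename isn't zero-padded.

```python
from datetime import datetime, timezone


def merged_ids_object_key(account_id, hour_start, prefix="", legacy=False):
    """Build the object key for one hourly merged ID file.

    `account_id` is the org ID, or the app ID when `legacy` is True.
    """
    # Legacy organizations use the app ID without the leading hyphen.
    directory = str(account_id) if legacy else f"-{account_id}"
    # Assumption: H in the documented pattern is an unpadded hour (0-23).
    filename = f"{directory}_{hour_start:%Y-%m-%d}_{hour_start.hour}.json.gz"
    parts = [prefix.strip("/"), directory, filename]
    return "/".join(part for part in parts if part)
```

For example, `merged_ids_object_key(189524, datetime(2020, 1, 25, 17, tzinfo=timezone.utc), prefix="merged")` produces the key portion of the path shown in the example above.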

Merged ID JSON objects have the following schema:

```json
{
 "scope": int,
 "merge_time": long,
 "merge_server_time": long,
 "amplitude_id": long,
 "merged_amplitude_id": long
}
```
