# Google Cloud Storage

Export Amplitude event data and merged user data to your Google Cloud Storage (GCS) account. Google Cloud's bucket policies let you manage and programmatically export this data into a Google Cloud bucket. From the Amplitude UI, you can set up recurring syncs as often as once per hour.

## Prerequisites

Create a GCS service account and set permissions before you configure the integration.

If you haven't already, [create a service account](https://cloud.google.com/iam/docs/creating-managing-service-account-keys) for Amplitude within the Google Cloud console. This service account lets Amplitude export your data to your Google Cloud project.

After you create the service account, generate a service account key in JSON format and download the key file. You upload this file to Amplitude when you set up the integration.
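Before you upload the key file, a quick local check can catch a truncated download or a key exported in the wrong format. The following is a minimal sketch, not part of Amplitude's tooling: the function name `check_key_file` and the choice of which fields to check are assumptions made for illustration.

```python
import json

# Fields found in a Google Cloud service account key file. Checking these
# three is an illustrative choice, not an Amplitude requirement.
EXPECTED_FIELDS = ("type", "client_email", "private_key")


def check_key_file(text: str) -> list[str]:
    """Return a list of problems found in the key file text (empty if none)."""
    try:
        key = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level value is not a JSON object"]
    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS if name not in key]
    # A key for another credential type would carry a different "type" value.
    if key.get("type") not in (None, "service_account"):
        problems.append(f"unexpected type: {key['type']}")
    return problems
```

An empty list means the file parsed as JSON and looks like a service account key. It doesn't prove that the key is still valid in Google Cloud.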

Add this service account as a member of the bucket you want to export data to. Give this member the **Storage Admin** role so Amplitude has the permissions it needs to export data to your bucket.

You can also create your own role. The export process requires these permissions at minimum:

- `storage.buckets.get`
- `storage.objects.get`
- `storage.objects.create`
- `storage.objects.delete`
- `storage.objects.list`
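If you build a custom role, it can help to compare the permissions it grants against the list above. The sketch below assumes you already have the granted permissions as a list of strings (for example, from an IAM permissions test against the bucket, such as `Bucket.test_iam_permissions` in the `google-cloud-storage` client). The helper name `missing_permissions` is hypothetical.

```python
# Minimum permissions the export process requires, as listed above.
REQUIRED_PERMISSIONS = frozenset({
    "storage.buckets.get",
    "storage.objects.get",
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.list",
})


def missing_permissions(granted):
    """Return the required permissions absent from `granted`, sorted by name."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))
```

An empty result means the role covers the minimum set.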

## Set up the integration

{% callout type="note" title="" %}
You need admin privileges in Amplitude, and a role that lets you enable resources in GCS.
{% /callout %}


1. In Amplitude Data, click **Catalog** and select the **Destinations** tab.
2. In the Warehouse Destination section, click **Google Cloud Storage**.
3. On the **Getting Started** tab, select the data you want to export. You can *Export events ingested today and moving forward*, *Export all merged Amplitude IDs*, or both. For events, you can also specify filtering conditions to export only events that meet certain criteria.

{% callout type="note" title="" %}
You can export these two data types to separate buckets. Complete the setup flow twice: once for each data type.
{% /callout %}

4. Review the Event table and Merge IDs table schemas and click **Next**.
5. In the *Google Cloud Credentials For Amplitude* section, upload the Service Account Key file. This file must be in JSON format.
6. After you upload the service account key, fill out the Google Cloud bucket details in the *Google Cloud Bucket Details* section.
7. Click **Next**. Amplitude attempts a test upload to check that the credentials work. If the upload succeeds, click **Finish** to complete the GCS destination configuration and activation.

Amplitude automatically sends all future events and merged users to GCS. Amplitude exports files to your GCS account on a best-effort basis. Exports typically run hourly and contain one hour of data, but may run less frequently and contain multiple hours of data.

## Run a manual export

Backfill historical data to GCS by manually exporting data.

1. Go to the Google Cloud Storage export connection page.
2. Go to the **Backfills** tab.
3. Select the date range you want.
4. Click **Start Backfill**.

If the backfill range overlaps with previously exported data, Amplitude de-duplicates overlapping data.

## Exported data format

When you configure a GCS export, you specify a bucket name and an optional folder/prefix. This prefix determines where your exported data appears inside the bucket. If you leave the prefix empty, Amplitude writes objects directly at the bucket root.

### Raw event file and data format

Amplitude exports data as gzip-compressed JSON files (`.json.gz`), partitioned by the hour with one or more files per hour. Each file contains one event JSON object per line.

#### Object key structure

Amplitude organizes exported event files under an `{appId}` directory that matches your project ID. The full object key structure is:

`{gcsPrefix}/{appId}/{filename}`

If you don't configure a prefix, the path simplifies to:

`{appId}/{filename}`

Where:

- `{gcsPrefix}` is the optional folder/prefix you configure in the Amplitude UI.
- `{appId}` is your Amplitude project ID (the same ID that appears in the filename).
- `{filename}` follows the format below.
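The key structure above can be sketched as a small helper. This is an illustration only: the function name `event_object_key` is hypothetical, and trimming slashes from the prefix is an assumption about how you might normalize your own input, not documented Amplitude behavior.

```python
def event_object_key(app_id: int, filename: str, gcs_prefix: str = "") -> str:
    """Join the optional prefix, the project ID directory, and the filename."""
    # Assumption: leading and trailing slashes on the prefix are trimmed.
    parts = [gcs_prefix.strip("/"), str(app_id), filename]
    # Drop the prefix segment when it's empty so the key starts at {appId}.
    return "/".join(part for part in parts if part)
```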

#### Filename format

File names have the following syntax, where the time represents when Amplitude servers received the data in UTC (the `server_upload_time`):

`projectID_yyyy-MM-dd_H#partitionInteger.json.gz`

For example, the first partition of data uploaded to project `187520` on Jan 25, 2020, between 5 PM and 6 PM UTC, is in the file:

`187520_2020-01-25_17#1.json.gz`
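When you process exported files downstream, you may want to recover the project ID, hour, and partition from a filename. The following sketch parses the format above with a regular expression. The function name is hypothetical, and it assumes the hour (`H`) is a 24-hour value that may or may not be zero-padded.

```python
import re
from datetime import datetime, timezone

# projectID_yyyy-MM-dd_H#partitionInteger.json.gz
EVENT_FILE = re.compile(
    r"(?P<project>\d+)_(?P<date>\d{4}-\d{2}-\d{2})_(?P<hour>\d{1,2})#(?P<part>\d+)\.json\.gz"
)


def parse_event_filename(name: str):
    """Return (project_id, hour_start_utc, partition) or raise ValueError."""
    match = EVENT_FILE.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected event filename: {name!r}")
    day = datetime.strptime(match["date"], "%Y-%m-%d")
    # The filename time is in UTC, so attach the UTC timezone explicitly.
    hour_start = day.replace(hour=int(match["hour"]), tzinfo=timezone.utc)
    return int(match["project"]), hour_start, int(match["part"])
```

The function expects the bare filename, so strip the prefix and `{appId}` directory from the object key before you call it.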

#### Example

If your bucket is `amplitude-data`, your prefix is `events`, and your project ID is `187520`, the full GCS path for this file is:

`gs://amplitude-data/events/187520/187520_2020-01-25_17#1.json.gz`

Each exported event JSON object has the following schema:

```json
{
  "server_received_time": UTC ISO-8601 timestamp,
  "app": int,
  "device_carrier": string,
  "$schema": int,
  "city": string,
  "user_id": string,
  "uuid": UUID,
  "event_time": UTC ISO-8601 timestamp,
  "platform": string,
  "os_version": string,
  "amplitude_id": long,
  "processed_time": UTC ISO-8601 timestamp,
  "version_name": string,
  "ip_address": string,
  "paying": boolean,
  "dma": string,
  "group_properties": dict,
  "user_properties": dict,
  "client_upload_time": UTC ISO-8601 timestamp,
  "$insert_id": string,
  "event_type": string,
  "library": string,
  "amplitude_attribution_ids": string,
  "device_type": string,
  "device_manufacturer": string,
  "start_version": string,
  "location_lng": float,
  "server_upload_time": UTC ISO-8601 timestamp,
  "event_id": int,
  "location_lat": float,
  "os_name": string,
  "amplitude_event_type": string,
  "device_brand": string,
  "groups": dict,
  "event_properties": dict,
  "data": dict,
  "device_id": string,
  "language": string,
  "device_model": string,
  "country": string,
  "region": string,
  "is_attribution_event": boolean,
  "adid": string,
  "session_id": long,
  "device_family": string,
  "sample_rate": null,
  "idfa": string,
  "client_event_time": UTC ISO-8601 timestamp
}
```
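Because each exported file holds one event JSON object per line, you can stream it without loading the whole file into memory. The sketch below reads a downloaded `.json.gz` file with Python's standard library. The function names are hypothetical, and UTF-8 encoding is an assumption.

```python
import gzip
import json
from collections import Counter


def iter_events(path):
    """Yield one event dict per non-empty line of a gzip-compressed export file."""
    # Assumption: the file is UTF-8 encoded text.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def count_event_types(path):
    """Count events by their `event_type` field."""
    return Counter(event.get("event_type") for event in iter_events(path))
```

For example, `count_event_types("187520_2020-01-25_17#1.json.gz")` would return a `Counter` keyed by event type for that hour's first partition.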

### Merged Amplitude IDs file and data format

Amplitude exports data as gzip-compressed JSON files (`.json.gz`). Each file contains one merged Amplitude ID JSON object per line.

#### Object key structure

Amplitude organizes merged ID files under a `-{orgId}` directory that matches your organization ID. The full object key structure is:

`{gcsPrefix}/-{orgId}/{filename}`

If you don't configure a prefix, the path simplifies to:

`-{orgId}/{filename}`

Where:

- `{gcsPrefix}` is the optional folder/prefix you configure in the Amplitude UI.
- `-{orgId}` is your Amplitude organization ID with a leading hyphen.
- `{filename}` follows the format below.

#### Filename format

File names have the following syntax, where the time represents when Amplitude servers received the data in UTC (the `server_upload_time`):

`-{orgId}_yyyy-MM-dd_H.json.gz`

For example, for org ID `189524`, Amplitude exports data received on Jan 25, 2020, between 5 PM and 6 PM UTC to the file:

`-189524_2020-01-25_17.json.gz`

#### Example

If your bucket is `amplitude-data`, your prefix is `merged`, and your org ID is `189524`, the full GCS path for this file is:

`gs://amplitude-data/merged/-189524/-189524_2020-01-25_17.json.gz`

{% callout type="note" title="Legacy organizations" %}
Some legacy organizations may see merged ID exports that use the app ID instead of the org ID. In this case, the directory and filename use `{appId}` without the leading `-`. Contact Amplitude Support to confirm which format applies to your organization.
{% /callout %}
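To locate the merged ID file for a given hour, you can combine the key structure and filename format into one helper. This sketch covers the org ID format only, not the legacy app ID format. The function name is hypothetical, `hour_utc` is assumed to already be in UTC, and `H` is assumed to be an unpadded 24-hour value.

```python
from datetime import datetime


def merged_ids_uri(bucket: str, org_id: int, hour_utc: datetime, gcs_prefix: str = "") -> str:
    """Build the gs:// URI for the merged ID file covering `hour_utc`."""
    # The directory and the filename both start with a hyphen and the org ID.
    directory = f"-{org_id}"
    filename = f"{directory}_{hour_utc:%Y-%m-%d}_{hour_utc.hour}.json.gz"
    parts = [bucket, gcs_prefix.strip("/"), directory, filename]
    # Skip the prefix segment when no prefix is configured.
    return "gs://" + "/".join(part for part in parts if part)
```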

Merged ID JSON objects have the following schema:

```json
{
 "scope": int,
 "merge_time": long,
 "merge_server_time": long,
 "amplitude_id": long,
 "merged_amplitude_id": long
}
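When you load merged ID files, a typed record makes missing or unexpected fields easier to spot. The following is a minimal sketch: the `MergedIdRecord` class and `parse_merged_id_line` function are hypothetical names, and `long` values from the schema map to Python's `int`.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MergedIdRecord:
    scope: int
    merge_time: int
    merge_server_time: int
    amplitude_id: int
    merged_amplitude_id: int


def parse_merged_id_line(line: str) -> MergedIdRecord:
    """Parse one line of a merged ID export file into a typed record."""
    raw = json.loads(line)
    # Keep only the documented fields; a missing one raises KeyError.
    return MergedIdRecord(**{f.name: raw[f.name] for f in fields(MergedIdRecord)})
```

The parser ignores any fields that aren't in the schema above, so extra keys in a line don't cause an error.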
```