Load your Amplitude raw event data into your Databricks workspace through a Unity Catalog Delta table. The connection authenticates with OAuth Workload Identity Federation (WIF), so Amplitude never holds a long-lived client secret for your workspace.

## Considerations

- Amplitude delivers exports on a best-effort basis. Expect data to land in your target Delta table within about 20 minutes of Amplitude receiving the events, though timing can vary with load and volume.
- You can create multiple Databricks exports from the same Amplitude project. Each export has its own event filter and destination table, so a single project can fan out into purpose-built tables.

## Prerequisites

You need admin or manager privileges in Amplitude, and a role in Databricks that can manage service principals, create Unity Catalog grants, and create an account-level Federation Policy.

Set up the following in Databricks before you open the wizard:

- A Databricks workspace with a Unity Catalog catalog and schema. Capture the workspace host (for example, `https://dbc-xxxx.cloud.databricks.com`).
- A SQL warehouse. Capture the warehouse ID from the Databricks UI under **SQL Warehouses** → your warehouse → **View JSON** or the URL.
- A service principal in **Settings** → **Identity and access** → **Service principals**. Don't generate an OAuth secret. Capture both:
    - The **application_id** (a UUID). The wizard calls this the `clientId`, and it's the credential Amplitude uses at run time.
    - The **numeric service principal ID**. The wizard uses this to fill in the Federation Policy CLI command.

{% callout type="tip" heading="Why Workload Identity Federation?" %}
WIF removes the shared client secret that OAuth machine-to-machine (M2M) authentication requires. It's the more secure default for this integration:

- **No shared secret.** Amplitude never receives a credential that authenticates as your service principal outside of a live export run.
- **Nothing for you to rotate.** The credential is a short-lived JWT that Amplitude's infrastructure mints and refreshes automatically. There's no password or key on your side that drifts, expires, or leaks through a backup.
- **One-line revocation.** Deleting the Federation Policy on your service principal ends Amplitude's ability to authenticate immediately, without disturbing other integrations on the same workspace.
{% /callout %}

The wizard generates the Federation Policy command and the Unity Catalog grants SQL for you. You run them in Databricks during setup.

## Set up the integration

The setup wizard has two steps: **Get started** and **Set up**. The wizard generates the Databricks CLI command and grant SQL as you type, so you can run them in Databricks without leaving the page.

1. In Amplitude Data, click **Catalog** and select the **Destinations** tab.
2. In the Warehouse Destinations section, click **Databricks**.
3. On the **Get started** step, select **Export events ingested today and moving forward**. Set how often Amplitude exports data with the frequency picker (the default is every hour). To export only events that meet certain criteria, add a filter. Click **Next**.
4. On the **Set up** step, enter your **Credential (OAuth WIF)** details:
    - **Workspace host**: the full workspace URL including the scheme, for example `https://dbc-xxxx.cloud.databricks.com`.
    - **Service Principal ID (numeric)**: the numeric ID from the Databricks UI under **Settings** → **Identity and access** → **Service principals** → your service principal. The wizard uses this only to fill in the Federation Policy command. Amplitude doesn't store it.
    - **Service Principal application_id (clientId)**: the UUID from the same Databricks page. This is the credential Amplitude uses at run time.

    Copy the **Federation Policy command** on the right and run it with the Databricks CLI authenticated against your Databricks **account**, not a workspace. The command tells Databricks to accept JWTs from Amplitude's environment as proof of identity for the service principal. The wizard sets the OIDC issuer in the command automatically for the Amplitude environment your project runs in.

5. In the same step, enter your **Unity Catalog target**:
    - **Catalog**, **Schema**, **Table name**: the three-part Unity Catalog identifier where Amplitude writes events. Amplitude creates the table on the first run if it doesn't exist.
    - **SQL warehouse ID**: the warehouse Amplitude uses to run the `COPY INTO` statement.

    Copy the **Unity Catalog grants SQL** on the right and run it in a Databricks SQL editor. Then in the Databricks UI, open **SQL Warehouses** → your warehouse → **Permissions** and grant the service principal **CAN USE**. The warehouse permission isn't grantable through SQL.

6. Click **Finish**.

Amplitude creates the credential and the export schedule. The first run executes at the next scheduled tick on the cadence you picked, verifying the credential end to end.

### Revoke Amplitude's access

To stop Amplitude from authenticating as the service principal, delete the Federation Policy you created in step 4:

```bash
databricks account service-principal-federation-policy list <sp-application-id> \
  --profile <your-databricks-account-profile>
databricks account service-principal-federation-policy delete <sp-application-id> <policy-id> \
  --profile <your-databricks-account-profile>
```

Use the same Databricks account profile you used in the **Federation Policy command** during setup. A workspace-scoped profile returns a 401 for these subcommands.

After you delete the policy, the next scheduled run fails with `PERMISSION_DENIED` and Databricks stops issuing workspace tokens for the service principal. To also block data access for any token that's still valid, remove the Unity Catalog grants from step 5.

## Databricks export format

### Data location

Amplitude writes events to the Unity Catalog table you named in the setup wizard. The full identifier is `{catalog}.{schema}.{table_name}`, using the values you entered.

{% callout type="tip" heading="Route different event slices to different tables" %}
Amplitude's Databricks destination lets you create multiple exports from the same project. Each destination can have its own event filter and its own destination table, so you can fan a project's event stream out into purpose-built tables. For example, set up one destination that filters to checkout events and lands them in `analytics.finance.checkout_events`, and a second that filters to engagement events and lands them in `analytics.product.engagement_events`.
{% /callout %}

### Event table schema

The event table uses the following Delta columns:

| Column | Type | Description |
|---|---|---|
| `amplitude_attribution_ids` | VARIANT | Hashed attribution IDs on the event. |
| `amplitude_id` | BIGINT | The original Amplitude ID for the user. Use this field to automatically handle merged users. Example: 2234540891. |
| `app` | BIGINT | Project ID found in your project's Settings page. Example: 123456. |
| `city` | STRING | City. Example: "San Francisco". |
| `client_event_time` | TIMESTAMP | Local timestamp (UTC) of when the device logged the event. Example: `2015-08-10T12:00:00.000000`. |
| `client_upload_time` | TIMESTAMP | The local timestamp (UTC) of when the device uploaded the event. Example: `2015-08-10T12:00:00.000000`. |
| `country` | STRING | Country. Example: "United States". |
| `data` | VARIANT | Dictionary where Amplitude stores certain fields such as `first_event` and `merged_amplitude_id`. |
| `device_carrier` | STRING | Device carrier. Example: Verizon. |
| `device_family` | STRING | Device family. Example: Apple iPhone. |
| `device_id` | STRING | The device-specific identifier. Example: C8F9E604-F01A-4BD9-95C6-8E5357DF265D. |
| `device_type` | STRING | Device type. Example: Apple iPhone 5s. |
| `dma` | STRING | Designated marketing area (DMA). Example: San Francisco-Oakland-San Jose, CA. |
| `event_id` | BIGINT | A counter that distinguishes events. Example: 1. |
| `event_properties` | VARIANT | Event properties as semi-structured JSON. |
| `event_time` | TIMESTAMP | Amplitude timestamp (UTC), which is `client_event_time` adjusted by the difference between `server_received_time` and `client_upload_time`: `event_time` = `client_event_time` + (`server_received_time` - `client_upload_time`). Amplitude uses this timestamp to organize events on Amplitude charts. If the difference between `server_received_time` and `client_upload_time` is less than 60 seconds, Amplitude doesn't adjust `event_time` and it equals `client_event_time`. Example: `2015-08-10T12:00:00.000000`. |
| `event_type` | STRING | Event type. |
| `group_properties` | VARIANT | Group properties as semi-structured JSON. |
| `groups` | VARIANT | Group types. For more information, refer to [Account-level reporting](/docs/analytics/account-level-reporting). |
| `ip_address` | STRING | IP address. Example: "123.11.111.11". |
| `language` | STRING | Language reported by the device. |
| `library` | STRING | The Amplitude SDK that sent the event. |
| `location_lat` | DOUBLE | Latitude. Example: 12.3456789. |
| `location_lng` | DOUBLE | Longitude. Example: -123.4567890. |
| `os_name` | STRING | OS name. Example: ios. |
| `os_version` | STRING | OS version. |
| `paying` | STRING | True if the user has ever logged any revenue, otherwise none. You can modify the property value with the Identify API. Example: true. |
| `platform` | STRING | Platform that sent the event. |
| `processed_time` | TIMESTAMP | Time Amplitude processed the event. |
| `region` | STRING | Region. Example: California. |
| `sample_rate` | BIGINT | Event sampling rate. |
| `server_received_time` | TIMESTAMP | Server time when Amplitude received the event. |
| `server_upload_time` | TIMESTAMP | Amplitude timestamp (UTC) of when Amplitude servers received the event. Example: `2015-08-10T12:00:00.000000`. |
| `session_id` | BIGINT | The session start time in milliseconds since epoch. Example: 1396381378123. |
| `start_version` | STRING | App version when Amplitude first tracked the user. Example: 1.0.0. |
| `user_id` | STRING | A readable ID you specify. Use something that doesn't change. Avoid using the user's email address. |
| `user_properties` | VARIANT | User properties as semi-structured JSON. |

## What's coming next

{% outcomes heading="Coming soon" %}
{% outcome icon="Users" title="Merged Amplitude IDs export" %}
A second table that tracks Amplitude's user-merge ledger, so you can correctly join events back to a user's canonical Amplitude ID.
{% /outcome %}
{% outcome icon="Workflow" title="Data Configuration export" %}
Snapshots of your event transformations and custom event definitions, so warehouse-side queries stay aligned with the semantics you build in Amplitude.
{% /outcome %}
{% outcome icon="Database" title="Historical backfill" %}
Export your existing Amplitude event history into Databricks in a single job, after your recurring export is set up.
{% /outcome %}
{% /outcomes %}
