Databricks

Early Access

This feature is in Early Access. During this time, aspects of the functionality may still be developed, and this documentation may not always be up to date. If you have any questions, contact Amplitude Support.

Load your Amplitude raw event data into your Databricks workspace through a Unity Catalog Delta table. The connection authenticates with OAuth Workload Identity Federation (WIF), so Amplitude never holds a long-lived client secret for your workspace.

Considerations

  • Amplitude delivers exports on a best-effort basis. Expect data to land in your target Delta table within about 20 minutes of Amplitude receiving the events, though timing can vary with load and volume.
  • You can create multiple Databricks exports from the same Amplitude project. Each export has its own event filter and destination table, so a single project can fan out into purpose-built tables.

Prerequisites

You need admin or manager privileges in Amplitude, and a role in Databricks that can manage service principals, create Unity Catalog grants, and create an account-level Federation Policy.

Set up the following in Databricks before you open the wizard:

  • A Databricks workspace with a Unity Catalog catalog and schema. Capture the workspace host (for example, https://dbc-xxxx.cloud.databricks.com).
  • A SQL warehouse. Capture the warehouse ID from the Databricks UI (SQL Warehouses → your warehouse → View JSON) or from the warehouse page URL.
  • A service principal in Settings → Identity and access → Service principals. Don't generate an OAuth secret. Capture both:
    • The application_id (a UUID). The wizard calls this the clientId, and it's the credential Amplitude uses at run time.
    • The numeric service principal ID. The wizard uses this to fill in the Federation Policy CLI command.
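
Before you open the wizard, it can help to sanity-check the values you captured. The following is a minimal sketch, not part of Amplitude's or Databricks' tooling: the function names are hypothetical, and the host check assumes a bare https:// URL with no path, which is stricter than this page requires.

```bash
# Hypothetical pre-flight checks for the values the wizard asks for.
# Each function returns 0 when the value has the expected shape.

# Workspace host: must include the https:// scheme (assumption: no path component).
valid_host() {
  printf '%s\n' "$1" | grep -Eq '^https://[A-Za-z0-9.-]+/?$'
}

# Service principal application_id (clientId): a UUID.
valid_application_id() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}-([0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}$'
}

# Numeric service principal ID: digits only.
valid_numeric_id() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+$'
}
```

For example, `valid_host "https://dbc-xxxx.cloud.databricks.com" && echo "host looks fine"` prints the message only when the scheme is present. A common mix-up is pasting the UUID where the numeric ID belongs, which `valid_numeric_id` rejects.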

Why Workload Identity Federation?

WIF removes the shared client secret that OAuth machine-to-machine (M2M) authentication requires. It's the more secure default for this integration:

  • No shared secret. Amplitude never receives a credential that authenticates as your service principal outside of a live export run.
  • Nothing for you to rotate. The credential is a short-lived JWT that Amplitude's infrastructure mints and refreshes automatically. There's no password or key on your side that drifts, expires, or leaks through a backup.
  • One-line revocation. Deleting the Federation Policy on your service principal ends Amplitude's ability to authenticate immediately, without disturbing other integrations on the same workspace.

The wizard generates the Federation Policy command and the Unity Catalog grants SQL for you. You run them in Databricks during setup.

Set up the integration

The setup wizard has two steps: Get started and Set up. The wizard generates the Databricks CLI command and grant SQL as you type, so you can run them in Databricks without leaving the page.

  1. In Amplitude Data, click Catalog and select the Destinations tab.

  2. In the Warehouse Destinations section, click Databricks.

  3. On the Get started step, select Export events ingested today and moving forward. Set how often Amplitude exports data with the frequency picker (the default is every hour). To export only events that meet certain criteria, add a filter. Click Next.

  4. On the Set up step, enter your Credential (OAuth WIF) details:

    • Workspace host: the full workspace URL including the scheme, for example https://dbc-xxxx.cloud.databricks.com.
    • Service Principal ID (numeric): the numeric ID from the Databricks UI under Settings → Identity and access → Service principals → your service principal. The wizard uses this only to fill in the Federation Policy command. Amplitude doesn't store it.
    • Service Principal application_id (clientId): the UUID from the same Databricks page. This is the credential Amplitude uses at run time.

    Copy the Federation Policy command on the right and run it with the Databricks CLI authenticated against your Databricks account, not a workspace. The command tells Databricks to accept JWTs from Amplitude's environment as proof of identity for the service principal. The wizard sets the OIDC issuer in the command automatically for the Amplitude environment your project runs in.

  5. In the same step, enter your Unity Catalog target:

    • Catalog, Schema, Table name: the three-part Unity Catalog identifier where Amplitude writes events. Amplitude creates the table on the first run if it doesn't exist.
    • SQL warehouse ID: the warehouse Amplitude uses to run the COPY INTO statement.

    Copy the Unity Catalog grants SQL on the right and run it in a Databricks SQL editor. Then in the Databricks UI, open SQL Warehouses → your warehouse → Permissions and grant the service principal CAN USE. The warehouse permission isn't grantable through SQL.

  6. Click Finish.

Amplitude creates the credential and the export schedule. The first run executes at the next scheduled interval for the frequency you picked, and it verifies the credential end to end.
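
For orientation, the Federation Policy command from step 4 has roughly the following shape. This is a sketch under assumptions, not the command the wizard emits: the helper function is hypothetical, the issuer, audience, and subject values are placeholders that the wizard fills in for your Amplitude environment, and the real command may set different or additional fields. The function only prints the command so you can compare it with the wizard's output. It doesn't run anything against Databricks.

```bash
# Prints (does not run) a Databricks CLI command that creates an OIDC
# federation policy on a service principal.
# Arguments: <sp-numeric-id> <issuer> <audience> <subject> <account-profile>
# All argument values are placeholders; copy the real command from the wizard.
print_federation_policy_command() {
  cat <<EOF
databricks account service-principal-federation-policy create ${1} \\
  --profile ${5} \\
  --json '{"oidc_policy": {"issuer": "${2}", "audiences": ["${3}"], "subject": "${4}"}}'
EOF
}

print_federation_policy_command "<sp-numeric-id>" "<issuer-url>" "<audience>" "<subject>" \
  "<your-databricks-account-profile>"
```

The first argument is the numeric service principal ID, not the application_id, and the profile must be authenticated against the Databricks account rather than a workspace.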

Revoke Amplitude's access

To stop Amplitude from authenticating as the service principal, delete the Federation Policy you created in step 4:

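```bash
# List the service principal's federation policies to find the policy ID.
# <sp-numeric-id> is the numeric service principal ID, not the application_id.
databricks account service-principal-federation-policy list <sp-numeric-id> \
  --profile <your-databricks-account-profile>
# Delete the policy by its ID.
databricks account service-principal-federation-policy delete <sp-numeric-id> <policy-id> \
  --profile <your-databricks-account-profile>
```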

Use the same Databricks account profile you used in the Federation Policy command during setup. A workspace-scoped profile returns a 401 for these subcommands.

After you delete the policy, the next scheduled run fails with PERMISSION_DENIED and Databricks stops issuing workspace tokens for the service principal. To also block data access for any token that's still valid, remove the Unity Catalog grants from step 5.
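
Removing the grants means running a REVOKE for each GRANT in the SQL the wizard generated. The sketch below prints such statements so you can review them before pasting them into a Databricks SQL editor. The helper function is hypothetical, and the privilege lists and object names in the example calls are assumptions for illustration only. Mirror whatever the wizard's grant SQL contained for your export.

```bash
# Prints one Unity Catalog REVOKE statement.
# Arguments: <privileges> <securable-type> <securable-name> <sp-application-id>
revoke_sql() {
  printf 'REVOKE %s ON %s %s FROM `%s`;\n' "$1" "$2" "$3" "$4"
}

# Example calls. The privileges and names below are assumed, not taken
# from the wizard; replace them with the ones in your generated grant SQL.
APP_ID="<sp-application-id>"
revoke_sql "USE CATALOG" CATALOG analytics "$APP_ID"
revoke_sql "USE SCHEMA, CREATE TABLE" SCHEMA analytics.product "$APP_ID"
revoke_sql "SELECT, MODIFY" TABLE analytics.product.engagement_events "$APP_ID"
```

Unity Catalog identifies a service principal in GRANT and REVOKE statements by its application_id, which is why the last argument is the UUID rather than the numeric ID.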

Databricks export format

Data location

Amplitude writes events to the Unity Catalog table you named in the setup wizard. The full identifier is {catalog}.{schema}.{table_name}, using the values you entered.
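
To confirm that rows are landing, you can count them in the target table. The sketch below builds a request body for the Databricks SQL Statement Execution API (POST /api/2.0/sql/statements). The function name is hypothetical, the identifiers are placeholders, and the query assumes nothing about the table's columns.

```bash
# Prints a JSON request body that counts rows in the export table.
# Arguments: <warehouse-id> <catalog> <schema> <table_name>
row_count_request() {
  printf '{"warehouse_id": "%s", "statement": "SELECT COUNT(*) AS row_count FROM %s.%s.%s"}\n' \
    "$1" "$2" "$3" "$4"
}

# Example with placeholder values.
row_count_request "<warehouse-id>" "<catalog>" "<schema>" "<table_name>"
```

One way to send it, assuming a workspace-scoped CLI profile whose identity can read the table and use the warehouse, is to save the output to a file and pass it to `databricks api post /api/2.0/sql/statements --json @request.json`. Running the same SELECT in the Databricks SQL editor works as well.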

Route different event slices to different tables

Amplitude's Databricks destination lets you create multiple exports from the same project. Each destination can have its own event filter and its own destination table, so you can fan a project's event stream out into purpose-built tables. For example, set up one destination that filters to checkout events and lands them in analytics.finance.checkout_events, and a second that filters to engagement events and lands them in analytics.product.engagement_events.

Event table schema

The event table uses the following Delta columns:

What's coming next

Merged Amplitude IDs export

A second table that tracks Amplitude's user-merge ledger, so you can correctly join events back to a user's canonical Amplitude ID.

Data Configuration export

Snapshots of your event transformations and custom event definitions, so warehouse-side queries stay aligned with the semantics you build in Amplitude.

Historical backfill

Export your existing Amplitude event history into Databricks in a single job, after your recurring export is set up.
