Commit 9e35656

Merge pull request #949 from dlt-hub/sthor/databricks-workspace-docs
Databricks workspace setup docs
2 parents d2bf8b5 + 7343fff commit 9e35656

2 files changed

+78
-2
lines changed

docs/website/docs/dlt-ecosystem/destinations/databricks.md

Lines changed: 77 additions & 2 deletions
@@ -15,7 +15,80 @@ keywords: [Databricks, destination, data warehouse]
1515
pip install dlt[databricks]
1616
```
1717

18-
## Setup Guide
18+
## Set up your Databricks workspace
19+
20+
To use the Databricks destination, you need:
21+
22+
* A Databricks workspace with a Unity Catalog metastore connected
23+
* An Azure Data Lake Storage (ADLS) Gen 2 storage account and container
24+
25+
If your Databricks workspace is already set up, you can skip to the [Loader setup guide](#loader-setup-guide).
26+
27+
### 1. Create a Databricks workspace in Azure
28+
29+
1. Create a Databricks workspace in Azure
30+
31+
In the Azure Portal, search for "Databricks" and create a new workspace. In the "Pricing Tier" section, select "Premium" so that you can use Unity Catalog.
32+
33+
2. Create an ADLS Gen 2 storage account
34+
35+
Search for "Storage accounts" in the Azure Portal and create a new storage account.
36+
Make sure it's a Data Lake Storage Gen 2 account. You do this by enabling "hierarchical namespace" when creating the account. Refer to the [Azure documentation](https://learn.microsoft.com/en-us/azure/storage/blobs/create-data-lake-storage-account) for further info.
37+
38+
3. Create a container in the storage account
39+
40+
In the storage account, create a new container. This will be used as a datastore for your Databricks catalog.
41+
42+
4. Create an Access Connector for Azure Databricks
43+
44+
This will allow Databricks to access your storage account.
45+
In the Azure Portal search for "Access Connector for Azure Databricks" and create a new connector.
46+
47+
5. Grant access to your storage container
48+
49+
Navigate to the storage container you created before and select "Access control (IAM)" in the left-hand menu.
50+
51+
Add a new role assignment and select "Storage Blob Data Contributor" as the role. Under "Members" select "Managed Identity" and add the Databricks Access Connector you created in the previous step.
52+
53+
### 2. Set up a metastore and Unity Catalog and get your access token
54+
55+
1. Now go to your Databricks workspace
56+
57+
To get there from the Azure Portal, search for "Databricks", select your Databricks workspace, and click "Launch Workspace".
58+
59+
2. In the top right corner, click on your email address and go to "Manage Account"
60+
61+
3. Go to "Data" and click on "Create Metastore"
62+
63+
Name your metastore and select a region.
64+
If you'd like to set up a storage container for the whole metastore, you can add your ADLS URL and Access Connector ID here. You can also do this at a more granular level when creating the catalog.
65+
66+
In the next step, assign your metastore to your workspace.
67+
68+
4. Go back to your workspace and click on "Catalog" in the left-hand menu
69+
70+
5. Click "+ Add" and select "Add Storage Credential"
71+
72+
Enter a name and paste in the resource ID of the Databricks Access Connector from the Azure Portal.
73+
It will look something like this: `/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Databricks/accessConnectors/<connector_name>`
74+
75+
76+
6. Click "+ Add" again and select "Add external location"
77+
78+
Set the URL of your storage container. This should be in the form: `abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path>`
79+
80+
Once it is created, you can test the connection to make sure the container is accessible from Databricks.
81+
82+
7. Now you can create a catalog
83+
84+
Go to "Catalog" and click "Create Catalog". Name your catalog and select the storage location you created in the previous step.
85+
86+
8. Create your access token
87+
88+
Click your email in the top right corner and go to "User Settings". Go to "Developer" -> "Access Tokens".
89+
Generate a new token and save it. You will use it in your `dlt` configuration.
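
The two identifiers used in the steps above (the Access Connector resource ID and the `abfss://` container URL) are easy to mistype. A minimal Python sketch that assembles them from their parts, following the two patterns shown above, might look like this. The function names and the sample values are hypothetical and are not part of `dlt`, Databricks, or Azure tooling.

```python
def access_connector_resource_id(
    subscription_id: str, resource_group: str, connector_name: str
) -> str:
    """Build the resource ID pasted into "Add Storage Credential".

    Hypothetical helper: it only fills in the pattern shown in the guide.
    """
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Databricks/accessConnectors/{connector_name}"
    )


def abfss_url(container_name: str, storage_account_name: str, path: str = "") -> str:
    """Build the URL entered under "Add external location".

    Hypothetical helper: leading and trailing slashes in `path` are dropped
    so the result never contains a double slash after the host name.
    """
    base = f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net"
    path = path.strip("/")
    return f"{base}/{path}" if path else base


if __name__ == "__main__":
    # Sample values are placeholders; substitute your own Azure names.
    print(access_connector_resource_id("<subscription_id>", "my-rg", "my-connector"))
    print(abfss_url("dlt-data", "mystorageacct", "catalogs/dev"))
```

Building the strings in one place means the same values can be reused when you configure the storage credential and the external location.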
90+
91+
## Loader setup guide
1992

2093
**1. Initialize a project with a pipeline that loads to Databricks by running**
2194
```
@@ -32,7 +105,9 @@ This will install dlt with **databricks** extra which contains Databricks Python
32105

33106
This should have your connection parameters and your personal access token.
34107

35-
It should now look like:
108+
You will find your server hostname and HTTP path in the Databricks workspace dashboard. Go to "SQL Warehouses", select your warehouse (the default one is called "Starter Warehouse"), and go to "Connection details".
109+
110+
Example:
36111

37112
```toml
38113
[destination.databricks.credentials]

docs/website/sidebars.js

Lines changed: 1 addition & 0 deletions
@@ -97,6 +97,7 @@ const sidebars = {
9797
'dlt-ecosystem/destinations/motherduck',
9898
'dlt-ecosystem/destinations/weaviate',
9999
'dlt-ecosystem/destinations/qdrant',
100+
'dlt-ecosystem/destinations/databricks',
100101
]
101102
},
102103
],
