pip install dlt[databricks]
```

## Set up your Databricks workspace

To use the Databricks destination, you need:

* A Databricks workspace with a Unity Catalog metastore connected
* An Azure Data Lake Storage (ADLS) Gen 2 storage account and container

If you already have your Databricks workspace set up, you can skip to the [Loader setup Guide](#loader-setup-guide).

### 1. Create a Databricks workspace and storage account in Azure

1. Create a Databricks workspace in Azure

   In the Azure Portal, search for "Databricks" and create a new workspace. In the "Pricing Tier" section, select "Premium"; Unity Catalog requires the Premium tier.

2. Create an ADLS Gen 2 storage account

   Search for "Storage accounts" in the Azure Portal and create a new storage account. Make sure it's a Data Lake Storage Gen 2 account: enable "hierarchical namespace" when creating the account. Refer to the [Azure documentation](https://learn.microsoft.com/en-us/azure/storage/blobs/create-data-lake-storage-account) for further info.

3. Create a container in the storage account

   In the storage account, create a new container. It will serve as the data store for your Databricks catalog.

4. Create an Access Connector for Azure Databricks

   The Access Connector allows Databricks to access your storage account. In the Azure Portal, search for "Access Connector for Azure Databricks" and create a new connector.

5. Grant access to your storage container

   Navigate to the storage container you created and select "Access control (IAM)" in the left-hand menu.

   Add a new role assignment and select "Storage Blob Data Contributor" as the role. Under "Members", select "Managed Identity" and add the Databricks Access Connector you created in the previous step.

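Azure rejects storage account and container names that break its naming rules, so it can save a round trip to check names before creating the resources above. The sketch below encodes the rules as Azure documents them at the time of writing: storage account names are 3 to 24 lowercase letters and digits, and container names are 3 to 63 characters of lowercase letters, digits, and hyphens, starting and ending with a letter or digit, with no consecutive hyphens. Both functions are hypothetical helpers, not part of `dlt` or any Azure SDK.

```python
import re

# Hypothetical helpers (not part of dlt or any Azure SDK) that check names
# against Azure's documented naming rules before you create the resources.

# Storage account: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")

# Container: lowercase letters, digits and hyphens; starts and ends with a
# letter or digit, and every hyphen sits between two letters/digits
# (which also rules out consecutive hyphens). Length is checked separately.
_CONTAINER_RE = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if `name` satisfies the storage account naming rules."""
    return _ACCOUNT_RE.fullmatch(name) is not None


def is_valid_container_name(name: str) -> bool:
    """Return True if `name` satisfies the blob container naming rules."""
    return 3 <= len(name) <= 63 and _CONTAINER_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for account in ("mydatalake01", "My_DataLake"):
        print(account, is_valid_storage_account_name(account))
    for container in ("dlt-data", "dlt--data", "-dlt"):
        print(container, is_valid_container_name(container))
```

Storage account names must also be globally unique across Azure, which a local check like this cannot verify; the portal reports that when you create the account.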
### 2. Set up a metastore and Unity Catalog, and get your access token

1. Go to your Databricks workspace

   To get there from the Azure Portal, search for "Databricks", select your Databricks workspace, and click "Launch Workspace".

2. In the top right corner, click your email address and go to "Manage Account"

3. Go to "Data" and click "Create Metastore"

   Name your metastore and select a region. If you'd like to use one storage container for the whole metastore, you can add your ADLS URL and Access Connector ID here. You can also configure storage at a more granular level when creating the catalog.

   In the next step, assign your metastore to your workspace.

4. Go back to your workspace and click "Catalog" in the left-hand menu

5. Click "+ Add" and select "Add Storage Credential"

   Choose a name and paste in the resource ID of the Databricks Access Connector from the Azure Portal. It will look something like this: `/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Databricks/accessConnectors/<connector_name>`

6. Click "+ Add" again and select "Add external location"

   Set the URL of your storage container and select the storage credential you created in the previous step. The URL should be in the form: `abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path>`

   Once the external location is created, you can test the connection to make sure the container is accessible from Databricks.

7. Now you can create a catalog

   Go to "Catalog" and click "Create Catalog". Name your catalog and select the storage location you created in the previous step.

8. Create your access token

   Click your email in the top right corner and go to "User Settings". Go to "Developer" -> "Access Tokens". Generate a new token and save it. You will use it in your `dlt` configuration.

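Steps 5 and 6 of the metastore setup ask you to paste two identifiers whose format is easy to get wrong: the Access Connector's resource ID and the `abfss://` URL of the container. The sketch below assembles both from their parts, following exactly the formats quoted in those steps. `access_connector_resource_id` and `abfss_url` are hypothetical helpers, not part of `dlt` or any Azure or Databricks SDK, and the example values are placeholders.

```python
from urllib.parse import urlparse

# Hypothetical helpers (not part of dlt or any Azure/Databricks SDK) that
# assemble the two identifiers pasted into Databricks in steps 5 and 6.


def access_connector_resource_id(subscription_id: str, resource_group: str, connector_name: str) -> str:
    """Return the Azure resource ID of a Databricks Access Connector."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Databricks/accessConnectors/{connector_name}"
    )


def abfss_url(container: str, storage_account: str, path: str = "") -> str:
    """Return the abfss:// URL of a path inside an ADLS Gen 2 container."""
    # Drop leading slashes so the result never contains "//" after the host.
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path.lstrip('/')}"


if __name__ == "__main__":
    # All values below are placeholders.
    print(access_connector_resource_id("<subscription_id>", "my-rg", "my-connector"))
    url = abfss_url("dlt-data", "mydatalake01", "catalogs/dlt")
    print(url)
    # Parsing the URL back shows where each piece ends up: the container is
    # the "user" part of the URL and the storage account is in the hostname.
    parsed = urlparse(url)
    print(parsed.username, parsed.hostname, parsed.path)
```

Note the order in the URL: the container name comes before the `@` and the storage account name after it, which is the reverse of how the two resources are created in Azure.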
## Loader setup Guide

**1. Initialize a project with a pipeline that loads to Databricks by running**
21
94
```
This should have your connection parameters and your personal access token.

You can find your server hostname and HTTP path in the Databricks workspace: go to "SQL Warehouses", select your warehouse (the default one is called "Starter Warehouse"), and open "Connection details".
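The server hostname, HTTP path, and access token, plus the catalog you created, make up your Databricks credentials. As a minimal sketch, assuming dlt's convention of reading configuration from upper-case environment variables whose sections are joined by double underscores (so `[destination.databricks.credentials]` maps to the `DESTINATION__DATABRICKS__CREDENTIALS__` prefix), the helper below builds those variables. `databricks_env` is a hypothetical function, and every value passed to it is a placeholder.

```python
import os

# Hypothetical helper: map Databricks connection details to environment
# variable names following dlt's SECTION__SUBSECTION__KEY convention,
# assumed here to correspond to [destination.databricks.credentials].
PREFIX = "DESTINATION__DATABRICKS__CREDENTIALS__"


def databricks_env(server_hostname: str, http_path: str, access_token: str, catalog: str) -> dict[str, str]:
    """Return a dict of environment variables holding the credentials."""
    values = {
        "server_hostname": server_hostname,
        "http_path": http_path,
        "access_token": access_token,
        "catalog": catalog,
    }
    # Upper-case each key and prepend the section prefix.
    return {PREFIX + key.upper(): value for key, value in values.items()}


if __name__ == "__main__":
    env = databricks_env(
        server_hostname="<server_hostname>",  # from "Connection details"
        http_path="<http_path>",  # from "Connection details"
        access_token="<access_token>",  # the token generated earlier; keep it secret
        catalog="<catalog_name>",  # the catalog created in Unity Catalog
    )
    os.environ.update(env)
    for name in sorted(env):
        print(name)
```

If these variables are present when the pipeline runs, dlt should resolve them in place of the matching `secrets.toml` entries; check the dlt configuration documentation for the exact lookup order. For local work, `.dlt/secrets.toml` keeps the token out of your source code.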