Unity Catalog and Labs CLI Support #28

Merged
ravi-databricks merged 38 commits into main from feature/dlt-meta-uc
Jan 5, 2024

@ravi-databricks
Contributor

Unity Catalog Support

  • Added a uc_enabled flag so that existing customers won't face any issues
  • Based on the uc_enabled flag, Onboarding and DataflowPipeline use either paths or UC namespaces
  • Created an integration test framework that can also be used to launch demos
  • Added integration tests for UC
  • Modified existing demos to incorporate Unity Catalog and the integration test framework
  • Added unit test coverage for DataflowPipeline
  • Modified documentation
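
The switch between paths and UC namespaces described above can be pictured with a minimal sketch. This is an illustration only, not DLT-META's actual code: `resolve_target` and all of its parameters are hypothetical names, and it assumes a UC target is addressed by a three-level `catalog.schema.table` name while a non-UC target is addressed by a storage path.

```python
from typing import Optional


def resolve_target(
    table: str,
    uc_enabled: bool,
    catalog: Optional[str] = None,
    schema: Optional[str] = None,
    base_path: Optional[str] = None,
) -> str:
    """Return a UC namespace when uc_enabled is True, otherwise a storage path.

    Hypothetical helper for illustration; not part of dlt-meta.
    """
    if uc_enabled:
        if not catalog or not schema:
            raise ValueError("catalog and schema are required when uc_enabled is True")
        return f"{catalog}.{schema}.{table}"
    if not base_path:
        raise ValueError("base_path is required when uc_enabled is False")
    # Strip a trailing slash so the joined path has exactly one separator.
    return f"{base_path.rstrip('/')}/{table}"


print(resolve_target("customers", True, catalog="dev", schema="bronze"))
print(resolve_target("customers", False, base_path="dbfs:/dlt-meta/bronze/"))
```

Keeping the decision in one place means existing non-UC users keep their path-based behaviour as long as the flag stays off.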

Databricks Labs CLI Support

Added two commands for DLT-META

  • onboard: Captures all onboarding details from the command line and launches an onboarding job in your Databricks workspace
  • deploy: Captures all DLT pipeline details from the command line and launches a DLT pipeline in your Databricks workspace
  • Customers can use onboard and deploy as an API if dlt-meta is installed from PyPI

ravi-db and others added 30 commits July 20, 2023 15:08
1.uc table scheme for dlt reader writer
2.onboarding modification to use table instead of paths
2. Fixed unit tests to include table instead of paths
2. Added databricks SDK in integration tests
3. Added docs for running integration tests with profiles
1. uc_enabled flag for supporting UC + nonUC dlt
2. used isinstance instead of types
1.Dataflowpipeline silver schema read for uc_enabled
2.Fixed integration tests with databricks sdk for cloudFiles
3.Fixed demos for dais demo for uc_enabled=true and tech_summit demo for uc_enabled=False
2.Added readme for running demo
3.Added docs to code
2.integration tests restructuring as per cli changes
1.Labs CLI: onboard command with uc
2.Labs CLI: deploy command with uc
1.CLI: onboard for nonuc
2.CLI: deploy for nonuc
… uploader

2.corrected labs.yaml typo to labs.yml
3.corrected logger to databricks.labs.dltmeta
Added
- dltmeta labs cli feature
- unit test coverage for uc_enabled feature in onboarding and dataflowpipeline
- Added readme instructions for using cli with example
- Added readme instructions for using cli with example
- Added cli to use existing demo folder as default
@codecov

codecov bot commented Dec 15, 2023

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (f5f6c34) 85.00% compared to head (b0e2e31) 90.28%.

| Files | Patch % | Lines |
|---|---|---|
| src/onboard_dataflowspec.py | 94.28% | 4 Missing ⚠️ |
| src/dataflow_pipeline.py | 93.33% | 1 Missing and 2 partials ⚠️ |
| src/__main__.py | 83.33% | 0 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #28      +/-   ##
==========================================
+ Coverage   85.00%   90.28%   +5.28%     
==========================================
  Files           7        8       +1     
  Lines         740      803      +63     
  Branches      128      149      +21     
==========================================
+ Hits          629      725      +96     
+ Misses         61       31      -30     
+ Partials       50       47       -3     
Flag Coverage Δ
unittests 90.28% <93.23%> (+5.28%) ⬆️

1.Readme doc for databricks labs cli
2.Unit tests
Contributor

@howardwu-db howardwu-db left a comment

Looks good

- Added change log
- added version 0.0.5
- Added hugo docs for databricks labs cli option

@neil90-db neil90-db left a comment

Tested UC and CLI support, works great! LGTM!

### Databricks Labs DLT-META CLI lets you run onboard and deploy in interactive python terminal
#### pre-requisites:
- [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/tutorial.html)
- Python 3.8.0 +

Is this the exact version required or the minimum version required? It would be good to clarify this.

Contributor Author

It's the minimum; that's why it says Python 3.8.0+.
I have seen this notation in other projects.

Refer to the [Getting Started](https://databrickslabs.github.io/dlt-meta/getting_started)
### Databricks Labs DLT-META CLI lets you run onboard and deploy in interactive python terminal
#### pre-requisites:
- [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/tutorial.html)

Databricks CLI is still in public preview and releases are happening every month (sometimes multiple times within a month). Unless we guarantee that the CLI integration with dlt-meta is forward compatible with all future databricks-cli releases, we should list the specific version(s) we have tested the integration with.

Contributor Author

It is forward compatible, since we have only used basic Databricks SDK APIs.
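
One way to act on the version concern raised in this thread would be to compare the installed `databricks-sdk` version against a tested range at startup. The sketch below is an assumption, not something this PR implements: `parse_version`, `is_tested`, and the range bounds are hypothetical, and the bounds are placeholders rather than versions anyone has verified.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

# Placeholder bounds for illustration only; NOT versions tested by the project.
HYPOTHETICAL_LOWEST = "0.10.0"
HYPOTHETICAL_HIGHEST = "0.20.0"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.17.0' into (0, 17, 0); a non-numeric suffix such as 'rc1' is ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_tested(installed: str, lowest: str, highest: str) -> bool:
    """True when lowest <= installed <= highest, compared numerically."""
    return parse_version(lowest) <= parse_version(installed) <= parse_version(highest)


try:
    installed = version("databricks-sdk")
    if not is_tested(installed, HYPOTHETICAL_LOWEST, HYPOTHETICAL_HIGHEST):
        print(f"databricks-sdk {installed} is outside the tested range")
except PackageNotFoundError:
    print("databricks-sdk is not installed")
```

A warning rather than a hard failure keeps the "forward compatible" stance while still telling users which versions were exercised.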

- ``` cd dlt-meta ```
- ``` python -m venv .venv ```
- ```source .venv/bin/activate ```
- ``` pip install databricks-sdk ```

Same comment as above with regard to versions.


``` Provide onboarding file path (default: demo/conf/onboarding.template):
Provide onboarding files local directory (default: demo/):
Provide dbfs path (default: dbfs:/dlt-meta_cli_demo):

For HMS, this default makes sense. However, for UC, should we default to a volume instead?

Contributor Author

We are not using volumes for UC. If you look at the code, it defaults to a workspace location.


``` Provide onboarding file path (default: demo/conf/onboarding.template):
Provide onboarding files local directory (default: demo/):
Provide dbfs path (default: dbfs:/dlt-meta_cli_demo):

This command fails and the program exits if there is any Databricks CLI authentication issue. In my case, the default profile was being used, and it had an expired token. I had to explicitly provide --profile <profile_name>, and then it got past this command.

Contributor Author

Yes. Either provide --profile or it will use the default profile.

Provide dataflow spec version (default: v1):
Provide environment name (default: prod): prod
Provide import author name (default: ravi.gawai):
Provide cloud provider name

We should be able to infer this using Databricks SDK. By this time, we should have information about the target environment available to us.

Contributor Author

I did not see any API that returns the cloud provider, either through the CLI or the APIs.

[1] azure
[2] gcp
Enter a number between 0 and 2: 0
Do you want to update ws paths, catalog, schema details to your onboarding file?

It looks like the actual text is "Update workspace/dbfs paths, unity catalog name, bronze/silver schema names in onboarding file?"

Also, do we need to prompt for this option? In what situation would a user not want to update the onboarding file with the path, UC catalog name, etc.?

Contributor Author

Some customers generate their onboarding files and keep them as part of a repo; in that case they do not need to overwrite the paths.

[0] False
[1] True
```
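
On the cloud provider prompt in the excerpt above: one possible fallback, sketched here purely as an assumption, is to guess the provider from the workspace host name and only prompt when the guess fails. `infer_cloud` is a hypothetical helper, not a Databricks SDK or dlt-meta function, and the host suffixes are the commonly seen public ones; other domains deliberately return `None`.

```python
from typing import Optional
from urllib.parse import urlparse


def infer_cloud(host: str) -> Optional[str]:
    """Guess the cloud provider from a workspace URL; None means 'ask the user'.

    Hypothetical heuristic for illustration only.
    """
    # Accept both 'https://host' and a bare 'host'.
    hostname = urlparse(host if "//" in host else f"https://{host}").hostname or ""
    if hostname.endswith(".azuredatabricks.net"):
        return "azure"
    if hostname.endswith(".gcp.databricks.com"):
        return "gcp"
    if hostname.endswith(".cloud.databricks.com"):
        return "aws"
    return None


for url in (
    "https://adb-1234567890.1.azuredatabricks.net",
    "https://1234567890.1.gcp.databricks.com",
    "https://dbc-example.cloud.databricks.com",
    "https://workspace.example.com",
):
    print(url, "->", infer_cloud(url))
```

Returning `None` for unknown hosts keeps the existing prompt as the safe default instead of guessing wrongly.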
- Goto your databricks workspace and located onboarding job under: Workflow->Jobs runs

The databricks labs dlt-meta onboard command seems to exit without any confirmation as shown below:

Update workspace/dbfs paths, unity catalog name, bronze/silver schema names in onboarding file?
[0] False
[1] True
Enter a number between 0 and 1: 1
(.venv) ➜  dlt-meta git:(feature/dlt-meta-uc) 

I couldn't tell whether the command completed successfully. I had to read this README to figure out that I have to check the Workflows UI to find the onboarding job.
We should at least print a success message with the expected job name. It would be great if we could also print the job URL.

Contributor Author

I had prints, but @nfx suggested removing them since the labs CLI will print this as part of its logs.
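
A confirmation of the kind requested in this thread could go through the logger rather than `print`, which fits the point about the labs CLI surfacing logs. The sketch below is hypothetical: `onboarding_confirmation` is not a dlt-meta function, the job name and id are made up, and the `#job/<id>` URL pattern is an assumption that should be checked against a real workspace. The logger name is the one mentioned in the commit list.

```python
import logging

logger = logging.getLogger("databricks.labs.dltmeta")


def onboarding_confirmation(host: str, job_id: int, job_name: str) -> str:
    """Build a one-line success message that includes a link to the job.

    Hypothetical helper for illustration only.
    """
    # Assumed URL pattern; verify it against your workspace before relying on it.
    url = f"{host.rstrip('/')}/#job/{job_id}"
    return f"Onboarding job '{job_name}' (id={job_id}) launched: {url}"


logging.basicConfig(level=logging.INFO)
logger.info(
    onboarding_confirmation(
        "https://example.cloud.databricks.com/", 123, "dlt-meta-onboarding"
    )
)
```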

```
- Goto your databricks workspace and located onboarding job under: Workflow->Jobs runs
- Once onboarding jobs is finished deploy `bronze` and `silver` DLT using below command
- ```databricks labs dlt-meta deploy```

This command also silently exits. Would be great if we can print some confirmation message or even better print the pipeline URL.

```

- Silver DLT
- - ```databricks labs dlt-meta deploy```

Same as above. Would be great if we can print some confirmation message or even better print the pipeline URL.

Collaborator

@nfx nfx left a comment

lgtm for labs.yml


@ganeshchand ganeshchand left a comment

Ran the onboard and deploy workflow. LGTM.

@ravi-databricks ravi-databricks merged commit 2a93dd9 into main Jan 5, 2024
@ravi-databricks ravi-databricks deleted the feature/dlt-meta-uc branch April 30, 2024 21:20

6 participants