Unity Catalog and Labs CLI Support #28
Conversation
2. Fixed unit tests to include table instead of paths
2. Added Databricks SDK in integration tests 3. Added docs for running integration tests with profiles
2. Added readme for running demo 3. Added docs to code
2. Integration tests restructuring as per CLI changes
… uploader 2. Corrected labs.yaml typo to labs.yml 3. Corrected logger to databricks.labs.dltmeta
Added - dlt-meta labs cli feature - unit test coverage for uc_enabled feature in onboarding and dataflowpipeline
- Added readme instructions for using cli with example
- Added readme instructions for using cli with example - Added cli to use existing demo folder as default
- Fixed cli for non-UC flows
- Fixed integration tests for non-UC flows
Codecov Report

```
@@            Coverage Diff             @@
##             main      #28      +/-   ##
==========================================
+ Coverage   85.00%   90.28%   +5.28%
==========================================
  Files           7        8       +1
  Lines         740      803      +63
  Branches      128      149      +21
==========================================
+ Hits          629      725      +96
+ Misses         61       31      -30
+ Partials       50       47       -3
```
Flags with carried forward coverage won't be shown.
Added unit tests and readme
- Added change log - added version 0.0.5
- Added Hugo docs for the Databricks Labs CLI option
neil90-db
left a comment
Tested UC and CLI Supported, works great! LGTM!
> ### Databricks Labs DLT-META CLI lets you run onboard and deploy in interactive python terminal
> #### pre-requisites:
> - [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/tutorial.html)
> - Python 3.8.0 +
Is this the exact version required or the minimum version required? It would be good to clarify this.

It's the minimum; that's why it says Python 3.8.0+. I saw this notation in other projects.
> Refer to the [Getting Started](https://databrickslabs.github.io/dlt-meta/getting_started)
> ### Databricks Labs DLT-META CLI lets you run onboard and deploy in interactive python terminal
> #### pre-requisites:
> - [Databricks CLI](https://docs.databricks.com/en/dev-tools/cli/tutorial.html)
Databricks CLI is still in public preview and releases are happening every month (sometimes multiple times within a month). Unless we provide a guarantee that the CLI integration with dlt-meta is forward compatible with all future databricks-cli releases, we should add the specific version(s) we have tested the integration with.

It's forward-compatible, since we have used only basic Databricks SDK APIs.
> - ```cd dlt-meta```
> - ```python -m venv .venv```
> - ```source .venv/bin/activate```
> - ```pip install databricks-sdk```
Same comment as above with regards to the versions.
> ```
> Provide onboarding file path (default: demo/conf/onboarding.template):
> Provide onboarding files local directory (default: demo/):
> Provide dbfs path (default: dbfs:/dlt-meta_cli_demo):
> ```
For HMS, this default makes sense. However, for UC, should we default to a volume?

Not using volumes for UC. If you look at the code, it defaults to a workspace location.
> ```
> Provide onboarding file path (default: demo/conf/onboarding.template):
> Provide onboarding files local directory (default: demo/):
> Provide dbfs path (default: dbfs:/dlt-meta_cli_demo):
> ```
This command fails and the program exits if there is any Databricks CLI authentication issue. In my case, the default profile was being used, which had an expired token. I had to explicitly provide `--profile <profile_name>` and it got past this command.

Yes, either provide `--profile` or it will use the default profile.
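To make this failure mode clearer, the tool could check which profiles exist before attempting any API call. The sketch below is hypothetical (the helper name and fallback behavior are assumptions, not dlt-meta code); it only parses the standard `~/.databrickscfg` INI file that the CLI and SDK read:

```python
import configparser
import os

def available_profiles(cfg_path: str = "~/.databrickscfg") -> list:
    """Hypothetical helper: list auth profiles from the Databricks CLI config.

    The real CLI/SDK do their own profile resolution; this only illustrates
    why a missing or expired DEFAULT profile makes the command fail unless
    --profile is passed explicitly.
    """
    parser = configparser.ConfigParser()
    if not parser.read(os.path.expanduser(cfg_path)):
        return []  # no config file found
    profiles = list(parser.sections())  # named profiles, e.g. [dev]
    if parser.defaults():               # the special [DEFAULT] section
        profiles.insert(0, "DEFAULT")
    return profiles
```

If this came back empty, the command could fail fast with a hint to run `databricks configure` or pass `--profile`, instead of exiting on an auth error.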
> ```
> Provide dataflow spec version (default: v1):
> Provide environment name (default: prod): prod
> Provide import author name (default: ravi.gawai):
> Provide cloud provider name
> ```
We should be able to infer this using the Databricks SDK. By this time, we should have information about the target environment available to us.

I did not see any API which returns the cloud provider, either via the CLI or the SDK.
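Even without a dedicated API, the workspace hostname encodes the cloud, so a best-effort guess is possible. A hedged sketch (the function and the fallback-to-AWS default are assumptions, not part of dlt-meta or the SDK):

```python
def infer_cloud(workspace_host: str) -> str:
    """Best-effort guess of the cloud provider from the workspace URL.

    Assumption: Azure workspaces live under azuredatabricks.net and GCP
    workspaces under gcp.databricks.com; anything else is treated as AWS.
    """
    host = workspace_host.lower()
    if "azuredatabricks.net" in host:
        return "azure"
    if "gcp.databricks.com" in host:
        return "gcp"
    return "aws"
```

The prompt could then only ask for confirmation when the guess is ambiguous, rather than always requiring manual input.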
> ```
> [1] azure
> [2] gcp
> Enter a number between 0 and 2: 0
> Do you want to update ws paths, catalog, schema details to your onboarding file?
> ```
It looks like the actual text is "Update workspace/dbfs paths, unity catalog name, bronze/silver schema names in onboarding file?"

Also, do we need to prompt for this option? In what situation would a user not want to update the onboarding file with paths, UC catalog name, etc.?

Some customers generate onboarding files that are part of a repo; in that case they do not need to overwrite the paths.
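That explains the prompt: when the onboarding file is committed to a repo, rewriting it would churn the diff. A hypothetical sketch of the conditional update (the function and field names are illustrative, not the actual dlt-meta implementation):

```python
import json

def maybe_update_onboarding(path: str, updates: dict, overwrite: bool) -> dict:
    """Rewrite paths/catalog fields in an onboarding file only when asked.

    Hypothetical: customers who generate onboarding files and commit them
    to a repo answer False, keeping their committed paths untouched.
    """
    with open(path) as f:
        spec = json.load(f)
    if overwrite:
        spec.update(updates)
        with open(path, "w") as f:
            json.dump(spec, f, indent=2)
    return spec
```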
> ```
> [0] False
> [1] True
> ```
> - Goto your databricks workspace and locate the onboarding job under: Workflow->Jobs runs
The `databricks labs dlt-meta onboard` command seems to exit without any confirmation, as shown below:

```
Update workspace/dbfs paths, unity catalog name, bronze/silver schema names in onboarding file?
[0] False
[1] True
Enter a number between 0 and 1: 1
(.venv) ➜  dlt-meta git:(feature/dlt-meta-uc)
```

I didn't know whether the command completed successfully or not. I had to read this README to figure out that I have to check the workflow UI to find the onboarding job. We should at least print a success message with the expected job name. It would be great if we could also print the job URL.
I had prints, but @nfx suggested removing them since the Labs CLI will print this as part of its logs.
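Even with Labs CLI logging, a clickable link would help. The link can be assembled from the workspace host and the job ID; a sketch (the helper is an assumption, not existing dlt-meta code, and `/#job/<id>` is the classic Jobs UI path):

```python
def job_url(workspace_host: str, job_id: int) -> str:
    """Build a clickable Jobs UI link for a freshly created job.

    Illustrative only: assumes the /#job/<id> UI path; logging this after
    job creation would let the user jump straight to the run.
    """
    return f"{workspace_host.rstrip('/')}/#job/{job_id}"
```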
> - Goto your databricks workspace and locate the onboarding job under: Workflow->Jobs runs
> - Once the onboarding job is finished, deploy the `bronze` and `silver` DLT pipelines using the command below
> - ```databricks labs dlt-meta deploy```
This command also silently exits. It would be great if we could print a confirmation message, or even better, the pipeline URL.
> - Silver DLT
>   - ```databricks labs dlt-meta deploy```
Same as above. It would be great if we could print a confirmation message, or even better, the pipeline URL.
ganeshchand
left a comment
Ran the onboard and deploy workflow. LGTM.
Unity Catalog Support
Databricks Labs CLI Support
Added two commands for DLT-META