Improved pipeline attach command and Dashboard launcher extensions #3060
Conversation
    dataset_name: str = None,
    sync_if_missing: bool = False,
    **injection_kwargs: Any,
) -> Pipeline:
I'm not sure about the changes to def attach. Maybe we should have something like attach_remote with a reduced arg set that does this, and leave attach unchanged.
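For illustration, a minimal sketch of what such a helper could look like. attach_remote and its argument set are hypothetical, taken from the suggestion above, not from the dlt API:

from typing import Any, Optional

import dlt
from dlt.pipeline.exceptions import CannotRestorePipelineException


def attach_remote(
    pipeline_name: str,
    destination: Optional[Any] = None,
    dataset_name: Optional[str] = None,
) -> dlt.Pipeline:
    # hypothetical helper: restore locally if possible, otherwise sync
    # from the destination, leaving dlt.attach() itself unchanged
    try:
        return dlt.attach(pipeline_name=pipeline_name)
    except CannotRestorePipelineException:
        p = dlt.pipeline(
            pipeline_name, destination=destination, dataset_name=dataset_name
        )
        p.sync_destination()
        return p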
dlt/cli/deploy_command_helpers.py
Outdated
if extended_info:
    d_t_node = call_args.arguments.get("destination")
    if d_t_node:
        destination = evaluate_node_literal(d_t_node)
This does not work for any destination that is not a string literal.
Here we could also parse the destination factory, but IMO we should not invest too much in AST parsing right now.
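To make the limitation concrete, a standalone illustration using ast.literal_eval in place of dlt's internal evaluate_node_literal (an assumption; the exact helper may behave differently):

import ast

# a string-literal destination can be evaluated statically,
# a destination factory call cannot
for src in (
    'dlt.pipeline("p", destination="duckdb")',
    'dlt.pipeline("p", destination=dlt.destinations.duckdb())',
):
    call = ast.parse(src, mode="eval").body
    dest_node = call.keywords[0].value
    try:
        print(ast.literal_eval(dest_node))  # prints "duckdb" for the literal
    except ValueError:
        print("not a literal:", ast.dump(dest_node))  # factory call fails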
dlt/cli/command_wrappers.py
Outdated
pipeline_name = pipeline_info["pipeline_name"]
pipelines_dir = pipeline_info["pipelines_dir"]

dlt.attach(
What can happen here is that a user wants to open the pipeline as defined in the script, but the state of some other pipeline with the same name already exists on the local machine, and that one is opened instead.
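A hedged sketch of that collision scenario; the pipeline and dataset names are made up for illustration:

import dlt

# an earlier run of some unrelated script left local state behind
# under the same pipeline name
dlt.pipeline("my_pipeline", destination="duckdb", dataset_name="dataset_a")

# the dashboard launcher now attaches by name only and silently picks up
# that unrelated state instead of the current script's definition
p = dlt.attach("my_pipeline")
print(p.dataset_name)  # "dataset_a", not what the current script declares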
dlt/cli/command_wrappers.py
Outdated
@utils.track_command("dashboard", True)
def dashboard_command_wrapper(pipelines_dir: Optional[str], edit: bool) -> None:
def dashboard_command_wrapper(
Alternatively to changing this top-level dashboard command, we could also allow the pipeline command to take a script file instead of a pipeline name. But that is probably quite confusing.
dlt/cli/deploy_command_helpers.py
Outdated
if d_t_node:
    destination = evaluate_node_literal(d_t_node)
    if destination is None:
        raise CliCommandInnerException(
Just warn here to be backward compatible.
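A minimal self-contained sketch of the warn-instead-of-raise behavior, with ast.literal_eval standing in for evaluate_node_literal (an assumption; dlt's helper may differ):

import ast
import warnings
from typing import Any, Optional


def destination_from_node(d_t_node: ast.expr) -> Optional[Any]:
    try:
        return ast.literal_eval(d_t_node)
    except ValueError:
        # previously a CliCommandInnerException was raised here; a warning
        # keeps scripts with non-literal destinations deployable as before
        warnings.warn("could not evaluate the destination argument; ignoring it")
        return None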
dlt/cli/command_wrappers.py
Outdated
run_dashboard(pipelines_dir=pipelines_dir, edit=edit)
# if a pipeline script path is provided, we need to parse out pipeline info from script and sync it
pipeline_name: str = None
I get what you do here, but this code should be executed by the deploy command (or not at all, because the deploy command has access to pipeline state and trace). I was possibly not specific when we discussed this:
- the workspace dashboard and the report notebook should attach in the same way; dlt.attach(pipeline_name) should be enough in both cases
- it is the task of the deployment script to generate the additional information (from AST/state/trace) and add it to the job package. Here it may just emit env variables (see the sketch below):
PIPELINES__<pipeline_name>__DESTINATION_TYPE=...
PIPELINES__<pipeline_name>__DESTINATION_NAME=...
PIPELINES__<pipeline_name>__DATASET_NAME=...
...
attach will see them automatically even without those parameters being passed; look at the code.
Now the big question is how we gather these parameters. If you are against runtime information like using state or trace, then we'll invest in AST parsing, but IMO it will never be as good.
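A sketch of that handoff, using the section names from the comment above; whether attach resolves them without explicit parameters needs to be verified against dlt's config resolution:

import os

import dlt

pipeline_name = "my_pipeline"

# the deployment script emits these into the job package environment
# (assuming dlt's usual convention of upper-cased env var sections
# separated by double underscores)
prefix = f"PIPELINES__{pipeline_name.upper()}"
os.environ[f"{prefix}__DESTINATION_TYPE"] = "duckdb"
os.environ[f"{prefix}__DESTINATION_NAME"] = "duckdb"
os.environ[f"{prefix}__DATASET_NAME"] = "my_dataset"

# dashboard and report notebook then attach the same way, by name only
p = dlt.attach(pipeline_name)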
dlt/pipeline/__init__.py
Outdated
    destination_name=injection_kwargs.get("staging_name", None),
)

pipeline_kwargs = {
I think there's a cleaner way; will leave a comment separately.
Force-pushed: f1d5a8f to 46edfa0, c9ddb93 to 3e8a486, 3e8a486 to 2c8ed76.
rudolfix left a comment
To me it looks pretty OK. See my comments:
- we need to test the edge cases from my comments
- look where attach is used. AFAIK we use it in the CLI to restore a pipeline:

try:
    if verbosity > 0:
        fmt.echo("Attaching to pipeline %s" % fmt.bold(pipeline_name))
    p = dlt.attach(pipeline_name=pipeline_name, pipelines_dir=pipelines_dir)
except CannotRestorePipelineException as e:
    if operation not in {"sync", "drop"}:
        raise
    fmt.warning(str(e))
    if not fmt.confirm(
        "Do you want to attempt to restore the pipeline state from destination?",
        default=False,
    ):
        return
    destination = destination or fmt.text_input(
        f"Enter destination name for pipeline {fmt.bold(pipeline_name)}"
    )
    dataset_name = dataset_name or fmt.text_input(
        f"Enter dataset name for pipeline {fmt.bold(pipeline_name)}"
    )
    p = dlt.pipeline(
        pipeline_name,
        pipelines_dir,
        destination=destination,
        dataset_name=dataset_name,
    )
    p.sync_destination()
    if p.first_run:
        # remote state was not found
        p._wipe_working_folder()
        fmt.error(
            f"Pipeline {pipeline_name} was not found in dataset {dataset_name} in {destination}"
        )
        return
    if operation == "sync":
        return  # No need to sync again

which looks like what you already implemented :)
    # set it as current pipeline
    p.activate()
    return p
try:
Please allow for an explicit dataset name in the args.
dlt/pipeline/__init__.py
Outdated
    return p
except CannotRestorePipelineException:
    # we can try to sync a pipeline with the given name
    p = pipeline(pipeline_name, pipelines_dir, destination=destination, staging=staging)
We can attempt a destination sync only if the destination is set; otherwise re-raise the exception.
except CannotRestorePipelineException:
    # we can try to sync a pipeline with the given name
    p = pipeline(pipeline_name, pipelines_dir, destination=destination, staging=staging)
    p.sync_destination()
Note: this can raise PipelineStepFailed if the destination state is broken for some reason. This is OK to do.
If p.first_run is True, it means there's no remote state. In that case you should wipe the pipeline working dir that got created by dlt.pipeline and re-raise the original exception.
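Putting the last three comments together, a hedged sketch of the fallback path; the argument names follow the diff above, and this is not the final implementation:

import dlt
from dlt.pipeline.exceptions import CannotRestorePipelineException


def attach_with_sync(
    pipeline_name, pipelines_dir=None, destination=None, staging=None, dataset_name=None
):
    try:
        return dlt.attach(pipeline_name=pipeline_name, pipelines_dir=pipelines_dir)
    except CannotRestorePipelineException:
        if destination is None:
            # nothing to sync from: re-raise instead of guessing
            raise
        p = dlt.pipeline(
            pipeline_name,
            pipelines_dir,
            destination=destination,
            staging=staging,
            dataset_name=dataset_name,  # explicit dataset name from args
        )
        # may raise PipelineStepFailed if the remote state is broken; that's OK
        p.sync_destination()
        if p.first_run:
            # no remote state found: wipe the working dir dlt.pipeline created
            # and re-raise the original restore exception
            p._wipe_working_folder()
            raise
        return p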
See my review message; it looks like this is already implemented in pipeline_command.
rudolfix left a comment
LGTM!
Description
This PR