Closed
Description
Prework
- I understand and agree to help guide.
- I understand and agree to contributing guide.
- New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.
Proposal
Hi!
I was super excited to use the error = "trim" option, as I think it makes a lot of sense. However, when it is enabled, the pipeline simply hangs after encountering an error. It never quits, even when there are no more targets that could reasonably run, which makes it quite hard to debug, because no error messages seem to be logged in the usual places.
It seems to be an issue only when using crew workers, and it happens both with and without the Slurm plugin.
Here is my setup, using a controller group:
low_req_ctrlr <- crew::crew_controller_local(name = "low_req_ctrlr_name", workers = 100, seconds_timeout = 600)
mid_req_ctrlr <- crew::crew_controller_local(name = "mid_req_ctrlr_name", workers = config_n_workers, seconds_timeout = 600)
high_req_ctrlr <- crew::crew_controller_local(name = "high_req_ctrlr_name", workers = 2, seconds_timeout = 600)
targets::tar_option_set(
  storage = "worker", retrieval = "worker", deployment = "worker",
  controller = crew::crew_controller_group(low_req_ctrlr, mid_req_ctrlr, high_req_ctrlr),
  resources = targets::tar_resources(crew = targets::tar_resources_crew(controller = "mid_req_ctrlr_name"))
)
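For anyone trying to reproduce this, a stripped-down `_targets.R` along the following lines might be enough. This is only a sketch: it assumes the option in question is the `error = "trim"` setting of `tar_option_set()`, it uses a single local controller instead of my controller group, and the controller name and all target names are made up for illustration.

```r
# _targets.R: hypothetical minimal pipeline, not my actual project
library(targets)

tar_option_set(
  # Assumed setting: with "trim", targets downstream of a failure are skipped,
  # while unaffected targets are still allowed to run.
  error = "trim",
  controller = crew::crew_controller_local(
    name = "example_ctrlr",  # made-up name
    workers = 2,
    seconds_idle = 60
  )
)

list(
  tar_target(ok, 1),
  tar_target(fails, stop("deliberate error")),  # triggers the error path
  tar_target(downstream, fails + 1),            # depends on the failed target
  tar_target(independent, ok + 1)               # unrelated to the failure
)
```

Running `targets::tar_make()` against this file should show whether the process returns after `fails` errors or keeps waiting.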