Ideas to speed up the beginning and end of tar_make() #1482

@wlandau

Ideas:

  • meta$deduplicate_storage() is fast. database$preprocess() calls data.table::fwrite() to deduplicate storage, which is slower. I hope targets can use meta$deduplicate_storage() instead, but I'll have to look deeper and see if there are other reasons for the current choice.

  • self$ensure_meta() has apparent bottlenecks in database$set_data() and database$get_row(). It does not look feasible to eliminate them.

  • active$end() calls tar_assert_objects_files(self$meta$store), which only needs to be enforced for targets that just reran. Each target's output can be checked separately after the target finishes.
  • Before the pipeline actually begins, targets lists all the files in _targets/objects/ and gets all their timestamps. What if we could instead cache a directory timestamp and only check a target's timestamp if needed? Unfortunately, a directory's modification timestamp apparently doesn't change in a way that would be useful here. But we could check only the targets mentioned in the metadata instead of listing the entire directory. At the very least, this would avoid a costly list.files() call.

  • pattern_begin_initial(), especially pattern_insert_branches(), is silently slow when resolving dynamic branch definitions. It is most likely not feasible to speed this up in targets because the sources of the bottlenecks are fundamental to dynamic branching.
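
A rough sketch of the timestamp idea, to make it concrete. This is not how targets does it today: object_times() is a hypothetical helper, and it assumes each target's output lives at <store>/objects/<name>. It asks the file system only about the names that appear in the metadata, so list.files() is never called on the whole directory.

```r
# Hypothetical helper (not part of targets): look up modification times
# only for the target names found in the metadata.
# Assumes each target's file is stored at <store>/objects/<name>.
object_times <- function(names, store = "_targets") {
  paths <- file.path(store, "objects", names)
  # file.info() returns NA fields for paths that do not exist,
  # so missing outputs show up as exists = FALSE instead of erroring.
  info <- file.info(paths, extra_cols = FALSE)
  data.frame(
    name = names,
    exists = !is.na(info$mtime),
    time = info$mtime,
    stringsAsFactors = FALSE
  )
}

# Usage with a throwaway data store in a temporary directory.
store <- file.path(tempdir(), "_targets")
dir.create(file.path(store, "objects"), recursive = TRUE, showWarnings = FALSE)
saveRDS(1, file.path(store, "objects", "x"))
saveRDS(2, file.path(store, "objects", "stray")) # not in the metadata, never inspected
times <- object_times(c("x", "y"), store = store)
```

The cost scales with the number of targets in the metadata rather than the number of files in the directory, which matters most when _targets/objects/ has accumulated many stale files.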
