Releases: ropensci/targets

Initialization speed and pre-processing progress messages

08 May 16:56

targets 1.11.3

Bug fixes

Other changes

  • Call suppressPackageStartupMessages() once for the whole pipeline. Repeated target-specific calls may be slow, and the messages themselves are cumbersome. This is an appropriate tradeoff.
  • Ensure the progress bar from the balanced reporter does not chop up messages from tar_debug_instructions().
  • Remove ANSI escape sequences from warnings and error messages.
  • Use cli::cli_text() instead of cli::cli_progress_output() (#1478, @dipterix).
  • Minor speedups in the beginning and end of tar_make() (#1482).
  • Cache _targets/objects/ time stamps only for local builders mentioned in the metadata, as opposed to everything in that directory (#1482).
  • Instrument pre-processing overhead with progress bars (#1482).

Bug fixes

11 Apr 17:56
a8ebfa3

targets 1.11.2

  • Documentation fix: if format is "file" and repository is not "local", then the local file is no longer deleted after upload (#1467).
  • Improve legend labels in graphs.
  • Repair mermaid.js graphs with disconnected edges (#1472).

Terse reporter and bugfix

10 Apr 20:31
b761a4d

targets 1.11.1

  • Bugfix: rstudio_available() returns FALSE without error if rstudioapi is not installed.
  • Add a new "terse" reporter, which is the "balanced" reporter without the progress bar. Make "terse" the default reporter.
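
As an illustration of the reporter change, here is a minimal sketch of choosing a reporter explicitly. It assumes a pipeline is already defined in a _targets.R file in the working directory; the reporter names are the ones listed in these notes.

```r
library(targets)

# Assumes _targets.R already exists in the working directory.
# "terse" is described above as "balanced" without the progress bar.
tar_make(reporter = "terse")

# Opt back in to the progress bar with the "balanced" reporter.
tar_make(reporter = "balanced")
```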

Improved speed, default settings, and aesthetics

10 Apr 17:09
8beb9a0

targets 1.11.0

Deprecated features

  • Deprecate the priority argument of tar_target(). Because of #1458, custom priorities no longer have an effect on execution order. However, up-to-date parallelized pipelines with 100000+ targets can now be checked around 10 times faster, so the tradeoff is worth it.

Changes to default behavior

  • Keep format = "file" files on disk even for non-local repositories (#1467).

Changes to default settings

  • In tar_option_get(), set repository_meta to "local" by default, regardless of repository (#1427).
  • In tar_option_get(), set storage = "worker", retrieval = "auto", and memory = "auto" by default (#1426). For memory, "auto" is now equivalent to "transient" most of the time, but it is equivalent to "persistent" for non-dynamic targets that other targets dynamically branch over. For retrieval, the "auto" setting is new. It is equivalent to "worker" for most cases, but it aligns with "main" for dynamic branches that branch over non-dynamic targets. All this is to avoid re-reading the upstream target from disk every time a branch needs to run.
  • Set the new "balanced" reporter to be the default reporter for tar_make() and tar_outdated().
  • Set the default garbage_collection argument of tar_option_get() to 0 (#1464).
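
The new defaults listed above can also be written out explicitly, which may help when comparing behavior across versions. The following _targets.R file is only a sketch: the target names `data` and `squared` are hypothetical, and the option values simply restate the defaults described in this list.

```r
# _targets.R (illustrative sketch, not an official template)
library(targets)

# Spell out the defaults described above so they are visible in the script.
tar_option_set(
  storage = "worker",
  retrieval = "auto",
  memory = "auto",
  repository_meta = "local",
  garbage_collection = 0
)

list(
  # `data` is a non-dynamic target that another target branches over, so per
  # the notes above memory = "auto" should behave like "persistent" here.
  tar_target(data, seq_len(10)),
  # Each branch of `squared` is dynamic, so memory = "auto" should behave
  # like "transient" for the branches.
  tar_target(squared, data^2, pattern = map(data))
)
```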

Efficiency improvements

  • Speed up checking up-to-date targets in large dynamic branching pipelines (#1458, #1460). The speedup is 10-fold or more in some cases.
  • Maintain a persistent text connection when appending to a metadata text file (#1415).
  • Avoid superfluous garbage collection when crew controllers are saturated.
  • Set defaults for storage, retrieval, and memory that balance resource tradeoffs for the most common pipelines (#1426).
  • Garbage collection only runs in targets:::target_run() (#1464). There is no longer a separate gc() call on the main process.
  • Shave off overhead from store_sync_file_meta() in the general case.

Other changes

  • Upload workspaces to the cloud if tar_option_get("repository_meta") is "aws" or "gcp". Download them with tar_workspace_download() and delete them with tar_destroy(destroy = "all") or tar_destroy(destroy = "cloud").
  • Deep-copy settings when resolving format = "auto" (#1425, @paulseamer).
  • Add store_read_path.tar_auto() (#1429, @paulseamer).
  • Improve error message that explains iteration = "group" branching problems.
  • Allow more special characters in recorded warnings and error messages.
  • Call cli::style_reset() at the end of non-silent reporters (#1450, @r2evans).
  • Exclude lists of target definitions from the globals in the dependency graph (#1431).
  • Nomenclature change: drop the term "dynamic file" in favor of "file target".
  • Internally choose a default level separation in the visNetwork graph based on the number of hierarchical levels and the maximum number of vertices per level (#1432).
  • In tar_visnetwork(), choose the colors of the edges based on the origin vertices, not the destination vertices (#1433).
  • In the "verbose" and "timestamp" reporters, print "dispatched pattern" messages, and print the total computation and storage summed over all the branches.
  • Create a new "balanced" reporter with a cli progress bar (#1442).
  • Deprecate reporters "forecast", "forecast_interactive", "verbose_positives", and "timestamp_positives" (#1442).
  • Ensure colors printed to the console are preserved when forwarded from the callr process (#1442).
  • Add tar_option_with() (ropensci/tarchetypes#215, @noamross).
  • Use prettyunits to print elapsed times and file sizes.
  • Shorten and simplify the tar_make() error message.
  • Minor bugfix: add a new on_worker argument to target_run() and builder_unload_value() so the latter only removes the target value if the target was actually run on a worker.
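
For the cloud workspace item in the list above, a debugging session might look like the sketch below. It assumes a pipeline whose repository_meta option is "aws" or "gcp" with a correctly configured bucket, and it uses a hypothetical target name, `analysis`, whose workspace was saved after an error.

```r
library(targets)

# `analysis` is a hypothetical target name. Its workspace is assumed to have
# been uploaded because repository_meta is "aws" or "gcp".
tar_workspace_download(analysis)

# Load the downloaded workspace into the current session for debugging.
tar_workspace(analysis)

# Remove cloud data, which per the note above includes uploaded workspaces.
tar_destroy(destroy = "cloud")
```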

Migrate to {crew} 1.0.0

31 Jan 16:15
fbedea3

targets 1.10.1

  • Restore explicit references to "self" in R6 classes.
  • Perform crew task retries.
  • Try to handle NA buckets in store_delete_objects.tar_aws() and store_delete_objects.tar_gcp().

Speed gains for large pipelines (with many up-to-date targets)

13 Jan 14:14
739276a

targets 1.10.0

Invalidating changes

These changes invalidate certain targets in a pipeline and cause them to rerun on the next tar_make().

  • Exclude function signatures from tar_repository_cas() output strings to reduce the size of pipeline metadata (#1390).
  • Exclude function signatures from tar_format() output strings to reduce the size of pipeline metadata (#1390).

Summary of performance gains

tar_make() and tar_outdated() run much faster in this release. Extensive profiling was done on a real-world simulation pipeline with 66002 up-to-date targets. For tar_make() using all the default settings:

Machine       Before (seconds)   After (seconds)   Speedup
M2 Macbook    413.16             35.538            11.62587
RHEL9         450.66             94.08             4.790

And for tar_outdated() using all the default settings:

Machine       Before (seconds)   After (seconds)   Speedup
M2 Macbook    91.314             16.636            5.48894
RHEL9         167.809            37.395            4.487472

To take advantage of these speed gains for an existing pipeline, you may have to run tar_make() to convert the time stamps and file sizes to a new format. This initial tar_make() is slow, but subsequent tar_make() calls should be much faster than before the upgrade.
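
The speedup figures in the two tables above are the ratio of the old run time to the new one. The base R sketch below recomputes them from the reported timings; the data frame and its column names are only for illustration.

```r
# Timings in seconds, copied from the tables above.
timings <- data.frame(
  call    = c("tar_make", "tar_make", "tar_outdated", "tar_outdated"),
  machine = c("M2 Macbook", "RHEL9", "M2 Macbook", "RHEL9"),
  before  = c(413.16, 450.66, 91.314, 167.809),
  after   = c(35.538, 94.08, 16.636, 37.395)
)

# Speedup is the old run time divided by the new run time.
timings$speedup <- timings$before / timings$after

print(timings)
```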

Other/specific changes

  • Speed up tar_make() and tar_outdated() by avoiding excessive buffering and disk writes for metadata and reporters when the pipeline is just skipping targets.
  • Use a more lookup-efficient data structure for tar_runtime$file_info (#1398).
  • Fall back on vector aggregation without names (#1401, @guglicap).
  • Speed up representation of file sizes in metadata (#1408).
  • Add a new "forecast_interactive" reporter to tar_outdated() to choose "forecast" for interactive sessions and "silent" for non-interactive ones.
  • Add a new seconds_reporter_outdated argument to tar_config_set() with a default of 1 to control the time interval of the reporter of tar_outdated() and other passive algorithm functions.
  • Remove target descriptions from the default labels of graph visualizations.
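
As a small illustration of the seconds_reporter_outdated setting mentioned above, the sketch below changes the interval from its stated default of 1. The value 2 is arbitrary, and the call is assumed to write to the project configuration file like other tar_config_set() settings.

```r
library(targets)

# Refresh the reporter of tar_outdated() every 2 seconds instead of the
# default of 1. The value 2 is an arbitrary choice for illustration.
tar_config_set(seconds_reporter_outdated = 2)
```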

igraph compatibility

04 Dec 11:26
50b8b2e

targets 1.9.1

Bug fixes

  • Allow branch references to contain multi-element path vectors with cloud metadata (#1382, @n8layman).
  • Avoid partial matches in internal code (#1384, @olivroy).
  • Add error handling around calls to ps::ps_disk_partitions() and ps::ps_fs_mount_point().
  • Do not store _targets/objects/ paths in metadata for CAS repositories (#1391).

Compatibility

  • Ensure compatibility with igraph >= 2.1.2.

Memory efficiency

18 Nov 02:59
d6a696a

targets 1.9.0

Improvements

  • Un-break workflows that use format = "file_fast" (#1339, @koefoeden).
  • Fix deadlock in error = "trim" (#1340, @koefoeden).
  • Remove tailored debugging message (#1341, @koefoeden).
  • Store warnings while writing to storage (#1345, @Aariq).
  • Allow garbage_collection to be a non-negative integer to control the frequency of garbage collection in a performant, convenient, unified way (#1351).
  • Deprecate the garbage_collection argument of tar_make(), tar_make_future(), and tar_make_clustermq() (#1351).
  • Instrument target_run(), target_prepare(), and target_conclude() using autometric.
  • Avoid sending problematic error classes such as "vctrs_error_subscript_oob" to rlang::abort() (#1354, @Jiefei-Wang).
  • Reduce memory consumption by ~23% in large pipelines by avoiding the accumulation of promise objects (#1352).
  • Avoid store_assert_format() and store_convert_object() if storage is "none".
  • Add a list() method to tar_repository_cas() to make it easier and more efficient to specify custom CAS repositories (#1366).
  • Improve speed and reduce memory consumption by avoiding deep copies of inner environments of target definition objects (#1368).
  • Reduce memory consumption by storing buds and branches as lightweight references when memory is "transient" (#1364).
  • Replace the memory class with the new lookup class.
  • Implement memory = "auto" to select transient memory for dynamic branches and persistent memory for other targets (#1371).
  • Omit whole pattern targets from branch subpipelines when possible. Should reduce memory consumption in some cases.
  • Omit whole stem targets from branch subpipelines when retrieval is "main" and only a bud is actually used. The same cannot be done with branches because each branch may need to be (un)marshaled individually.
  • Compress branches into references when retrieval is "worker" and the whole pattern is part of the subpipeline.
  • Avoid duplicated branch aggregation: just send the branches over the network.
  • Back-compatibly switch format = "qs" from qs to qs2 (#1373).
  • Add tar_unblock_process().
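
Several items in the list above concern memory settings. The _targets.R file below is a hedged sketch of how they might be combined: the target names are hypothetical, and the comment on garbage_collection = 10 reflects an assumed reading of "frequency" that should be checked against the tar_option_set() help file.

```r
# _targets.R (illustrative sketch)
library(targets)

tar_option_set(
  # A non-negative integer instead of TRUE/FALSE. Assumed here to mean
  # garbage collection roughly once every 10 targets that run.
  garbage_collection = 10,
  # Let targets choose between transient and persistent memory per target.
  memory = "auto",
  # Per the note above, format = "qs" now uses the qs2 package.
  format = "qs"
)

list(
  tar_target(index, seq_len(100)),
  # One dynamic branch per element of `index`.
  tar_target(draw, rnorm(1, mean = index), pattern = map(index))
)
```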

Potentially invalidating changes

  • Add "keepNA" and "keepInteger" to .deparseOpts() (#1375). This may cause existing pipelines to rerun, but it makes add-ons like tarchetypes::tar_map() much easier to use.

Content addressable storage

02 Oct 17:41
3695f06

targets 1.8.0

  • Wrap tar_watch() UI module in bslib::page() (#1302, @kwbyron-lilly).
  • Remove callr_function from the tar_make_as_job() argument list.
  • Ensure storage = "worker" is respected when the process of storing an object generates an error (#1304, @multimeric).
  • Default to the _targets.R pattern in tar_branches() (#1306, @multimeric, @mattwarkentin).
  • Remove superfluous functions and globals from metadata with tar_prune() (#1312, @benzipperer).
  • Change the default workspace_on_error option to TRUE (#1310, @hadley).
  • Enhance and organize the error = "stop" error message.
  • Avoid saving a file in _targets/objects for error = "null". Instead, switch to a special "null" storage format class if error is "null" and the target throws an error. This should allow users to more freely create new formats with tar_format() without worrying about how to handle NULL objects created by error = "null".
  • Implement format = "auto" (#1311, @hadley).
  • Replace pingr dependency with base::socketConnection() for local URL utilities (#1317, #1318, @Adafede).
  • Implement tar_repository_cas(), tar_repository_cas_local(), and tar_repository_cas_local_gc() for content-addressable storage (#1232, #1314, @noamross).
  • Add tar_format_get() to make implementing CAS systems easier.
  • Implement error = "trim" in tar_target() and tar_option_set() (#1310, #1311, @hadley).
  • Use the file system type to decide whether to trust time stamps (#1315, @hadley, @gaborcsardi).
  • Deprecate format = "file_fast" in favor of the above (#1315).
  • Deprecate trust_object_timestamps in favor of the more unified trust_timestamps in tar_option_set() (#1315).
  • Print storage size of each target in verbose reporters (#1337, @psychelzh).
  • Combine help files of tar_target() and tar_target_raw(). Same with tar_load() and tar_load_raw().
  • Add a substitute argument to tar_format() to make it easier to write custom storage formats without metaprogramming.
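
To show how a few of the features above fit together, here is a minimal _targets.R sketch that uses format = "auto", error = "trim", and a local content-addressable storage (CAS) repository. The directory name "cas_store" and the target names are hypothetical, and whether these options suit a given pipeline is an assumption to verify against the package documentation.

```r
# _targets.R (illustrative sketch)
library(targets)

tar_option_set(
  # Store target output in a local CAS folder. "cas_store" is a hypothetical
  # directory name chosen for this example.
  repository = tar_repository_cas_local(path = "cas_store"),
  # error = "trim" is one of the error modes added in this release.
  error = "trim"
)

list(
  # Let targets pick the storage format for this target.
  tar_target(raw, data.frame(x = 1:3), format = "auto"),
  tar_target(total, sum(raw$x))
)
```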

bslib and speed

20 Jun 18:23
42cb4c1

targets 1.7.1

  • Use bslib in tar_watch().
  • Speed up target_upstream_edges() and pipeline_upstream_edges() by avoiding data frames until the last minute (17% speedup for certain kinds of large pipelines).
  • Automatically set as_job to FALSE in tar_make() if rstudioapi and/or RStudio is not available.