Releases: ropensci/targets
Initialization speed and pre-processing progress messages
targets 1.11.3
Bug fixes
- Use `qmethod = "escape"` to avoid Rdatatable/data.table#3509 (#1480, @koefoeden).
- Ensure `error = "trim"` does not hang when the errored target has a long chain of reverse dependencies (#1481, @koefoeden).
- Manually remove class `"rlib_error_package_not_found"` from errors (#1484, @malcolmbarrett). This and #1354 are unfortunate consequences of #997.
Other changes
- Call `suppressPackageStartupMessages()` once for the whole pipeline. Repeated target-specific calls may be slow, and the messages themselves are cumbersome. This is an appropriate tradeoff.
- Ensure the progress bar from the balanced reporter does not chop up messages from `tar_debug_instructions()`.
- Remove ANSI escape sequences from warnings and error messages.
- Use `cli::cli_text()` instead of `cli::cli_progress_output()` (#1478, @dipterix).
- Minor speedups in the beginning and end of `tar_make()` (#1482).
- Cache `_targets/objects/` time stamps only for local builders mentioned in the metadata, as opposed to everything in that directory (#1482).
- Instrument pre-processing overhead with progress bars (#1482).
Bug fixes
Terse reporter and bugfix
targets 1.11.1
- Bugfix: `rstudio_available()` returns `FALSE` without error if `rstudioapi` is not installed.
- Add a new `"terse"` reporter, which is the `"balanced"` reporter without the progress bar. Make `"terse"` the default reporter.
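With `"terse"` as the default, the progress bar becomes opt-in. A minimal sketch of requesting the `"balanced"` reporter instead, assuming a working directory that already contains a `_targets.R` pipeline:

```r
library(targets)

# Record the reporter choice in the project configuration (_targets.yaml)
# so later tar_make() and tar_outdated() calls pick it up.
tar_config_set(reporter_make = "balanced", reporter_outdated = "balanced")

# Or request the progress bar for a single run only.
tar_make(reporter = "balanced")
```

The per-call `reporter` argument suits occasional interactive runs, while `tar_config_set()` suits projects that always want the progress bar.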
Improved speed, default settings, and aesthetics
targets 1.11.0
Deprecated features
- Deprecate the `priority` argument of `tar_target()`. Because of #1458, custom priorities no longer have an effect on execution order. However, up-to-date parallelized pipelines with 100000+ targets can now be checked around 10 times faster, so the tradeoff is worth it.
Changes to default behavior
- Keep `format = "file"` files on disk even for non-local repositories (#1467).
Changes to default settings
- In `tar_option_get()`, set `repository_meta` to `"local"` by default, regardless of `repository` (#1427).
- In `tar_option_get()`, set `storage = "worker"`, `retrieval = "auto"`, and `memory = "auto"` by default (#1426). For `memory`, `"auto"` is now equivalent to `"transient"` most of the time, but it is equivalent to `"persistent"` for non-dynamic targets that other targets dynamically branch over. For `retrieval`, the `"auto"` setting is new. It is equivalent to `"worker"` for most cases, but it aligns with `"main"` for dynamic branches that branch over non-dynamic targets. All this is to avoid re-reading the upstream target from disk every time a branch needs to run.
- Set the new `"balanced"` reporter to be the default reporter for `tar_make()` and `tar_outdated()`.
- Set the default `garbage_collection` argument of `tar_option_get()` to 0 (#1464).
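Pipelines that should not depend on these new defaults can state their settings explicitly. The following `_targets.R` is a minimal sketch; the target names `values` and `total` are hypothetical, and the chosen values are one possible configuration rather than a recommendation:

```r
# _targets.R
library(targets)

tar_option_set(
  storage = "main",          # the main process saves target output
  retrieval = "main",        # the main process loads dependencies
  memory = "persistent",     # keep loaded targets in memory during the run
  repository_meta = "local"  # keep pipeline metadata on the local disk
)

list(
  tar_target(values, c(1, 2, 3)),  # hypothetical example target
  tar_target(total, sum(values))   # hypothetical downstream target
)
```

Individual targets can still override these options through the arguments of the same names in `tar_target()`.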
Efficiency improvements
- Speed up checking up-to-date targets in large dynamic branching pipelines (#1458, #1460). The speedup is 10-fold or more in some cases.
- Maintain a persistent text connection when appending to a metadata text file (#1415).
- Avoid superfluous garbage collection when `crew` controllers are saturated.
- Set defaults for `storage`, `retrieval`, and `memory` that balance resource tradeoffs for the most common pipelines (#1426).
- Garbage collection only runs in `targets:::target_run()` (#1464). There is no longer a separate `gc()` call on the main process.
- Shave off overhead from `store_sync_file_meta()` in the general case.
Other changes
- Upload workspaces to the cloud if `tar_option_get("repository_meta")` is `"aws"` or `"gcp"`. Download them with `tar_workspace_download()` and delete them with `tar_destroy(destroy = "all")` or `tar_destroy(destroy = "cloud")`.
- Deep-copy settings when resolving `format = "auto"` (#1425, @paulseamer).
- Add `store_read_path.tar_auto()` (#1429, @paulseamer).
- Improve error message that explains `iteration = "group"` branching problems.
- Allow more special characters in recorded warnings and error messages.
- Call `cli::style_reset()` at the end of non-silent reporters (#1450, @r2evans).
- Exclude lists of target definitions from the globals in the dependency graph (#1431).
- Nomenclature change: drop the term "dynamic file" in favor of "file target".
- Internally choose a default level separation in the `visNetwork` graph based on the number of hierarchical levels and the maximum number of vertices per level (#1432).
- In `tar_visnetwork()`, choose the colors of the edges based on the origin vertices, not the destination vertices (#1433).
- In the `"verbose"` and `"timestamp"` reporters, print "dispatched pattern" messages, and print the total computation and storage summed over all the branches.
- Create a new `"balanced"` reporter with a `cli` progress bar (#1442).
- Deprecate reporters `"forecast"`, `"forecast_interactive"`, `"verbose_positives"`, and `"timestamp_positives"` (#1442).
- Ensure colors printed to the console are preserved when forwarded from the `callr` process (#1442).
- Add `tar_option_with()` (ropensci/tarchetypes#215, @noamross).
- Use `prettyunits` to print elapsed times and file sizes.
- Shorten and simplify the `tar_make()` error message.
- Minor bugfix: add a new `on_worker` argument to `target_run()` and `builder_unload_value()` so the latter only removes the target value if the target was actually run on a worker.
Migrate to {crew} 1.0.0
targets 1.10.1
- Restore explicit references to "self" in `R6` classes.
- Perform `crew` task retries.
- Try to handle `NA` buckets in `store_delete_objects.tar_aws()` and `store_delete_objects.tar_gcp()`.
Speed gains for large pipelines (with many up-to-date targets)
targets 1.10.0
Invalidating changes
These changes invalidate certain targets in a pipeline and cause them to rerun on the next `tar_make()`.
- Exclude function signatures from `tar_repository_cas()` output strings to reduce the size of pipeline metadata (#1390).
- Exclude function signatures from `tar_format()` output strings to reduce the size of pipeline metadata (#1390).
Summary of performance gains
`tar_make()` and `tar_outdated()` run much faster in this release. Extensive profiling was done on a real-world simulation pipeline with 66002 up-to-date targets. For `tar_make()` using all the default settings:

| Machine | Before (seconds) | After (seconds) | Speedup |
|---|---|---|---|
| M2 Macbook | 413.16 | 35.538 | 11.62587 |
| RHEL9 | 450.66 | 94.08 | 4.790 |

And for `tar_outdated()` using all the default settings:

| Machine | Before (seconds) | After (seconds) | Speedup |
|---|---|---|---|
| M2 Macbook | 91.314 | 16.636 | 5.48894 |
| RHEL9 | 167.809 | 37.395 | 4.487472 |

To take advantage of these speed gains for an existing pipeline, you may have to run `tar_make()` to convert the time stamps and file sizes to a new format. This initial `tar_make()` is slow, but subsequent `tar_make()` calls should be much faster than before the upgrade.
Other/specific changes
- Speed up `tar_make()` and `tar_outdated()` by avoiding excessive buffering and disk writes for metadata and reporters when the pipeline is just skipping targets.
- Use a more lookup-efficient data structure for `tar_runtime$file_info` (#1398).
- Fall back on vector aggregation without names (#1401, @guglicap).
- Speed up representation of file sizes in metadata (#1408).
- Add a new `"forecast_interactive"` reporter to `tar_outdated()` to choose `"forecast"` for interactive sessions and `"silent"` for non-interactive ones.
- Add a new `seconds_reporter_outdated` argument to `tar_config_set()` with a default of 1 to control the time interval of the reporter of `tar_outdated()` and other passive algorithm functions.
- Remove target descriptions from the default labels of graph visualizations.
igraph compatibility
targets 1.9.1
Bug fixes
- Allow branch references to contain multi-element `path` vectors with cloud metadata (#1382, @n8layman).
- Avoid partial matches in internal code (#1384, @olivroy).
- Add error handling around calls to `ps::ps_disk_partitions()` and `ps::ps_fs_mount_point()`.
- Do not store `_targets/objects/` paths in metadata for CAS repositories (#1391).
Compatibility
- Ensure compatibility with `igraph` >= 2.1.2.
Memory efficiency
targets 1.9.0
Improvements
- Un-break workflows that use `format = "file_fast"` (#1339, @koefoeden).
- Fix deadlock in `error = "trim"` (#1340, @koefoeden).
- Remove tailored debugging message (#1341, @koefoeden).
- Store warnings while writing to storage (#1345, @Aariq).
- Allow `garbage_collection` to be a non-negative integer to control the frequency of garbage collection in a performant, convenient, unified way (#1351).
- Deprecate the `garbage_collection` argument of `tar_make()`, `tar_make_future()`, and `tar_make_clustermq()` (#1351).
- Instrument `target_run()`, `target_prepare()`, and `target_conclude()` using `autometric`.
- Avoid sending problematic error classes such as `"vctrs_error_subscript_oob"` to `rlang::abort()` (#1354, @Jiefei-Wang).
- Reduce memory consumption by ~23% in large pipelines by avoiding the accumulation of promise objects (#1352).
- Avoid `store_assert_format()` and `store_convert_object()` if `storage` is `"none"`.
- Add a `list()` method to `tar_repository_cas()` to make it easier and more efficient to specify custom CAS repositories (#1366).
- Improve speed and reduce memory consumption by avoiding deep copies of inner environments of target definition objects (#1368).
- Reduce memory consumption by storing buds and branches as lightweight references when `memory` is `"transient"` (#1364).
- Replace the `memory` class with the new `lookup` class.
- Implement `memory = "auto"` to select transient memory for dynamic branches and persistent memory for other targets (#1371).
- Omit whole pattern targets from branch subpipelines when possible. This should reduce memory consumption in some cases.
- Omit whole stem targets from branch subpipelines when `retrieval` is `"main"` and only a bud is actually used. The same cannot be done with branches because each branch may need to be (un)marshaled individually.
- Compress branches into references when `retrieval` is `"worker"` and the whole pattern is part of the subpipeline.
- Avoid duplicated branch aggregation: just send the branches over the network.
- Back-compatibly switch `format = "qs"` from `qs` to `qs2` (#1373).
- Add `tar_unblock_process()`.
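The integer form of `garbage_collection` and the `memory = "auto"` setting from the list above can both be set once for the whole pipeline. The following `_targets.R` is a minimal sketch; the target names and the interval of 10 are arbitrary illustrative choices, not recommended values:

```r
# _targets.R
library(targets)

tar_option_set(
  # Non-negative integer controlling how often garbage collection runs.
  # 10 is an arbitrary illustrative value.
  garbage_collection = 10,
  # Per the notes above: transient memory for dynamic branches,
  # persistent memory for other targets.
  memory = "auto"
)

list(
  tar_target(sizes, c(1e3, 1e4, 1e5)),                    # hypothetical inputs
  tar_target(draws, rnorm(sizes), pattern = map(sizes)),  # one branch per size
  tar_target(n_draws, length(draws))                      # uses the aggregated branches
)
```

Setting the interval in `tar_option_set()` rather than in `tar_make()` matches the deprecation noted above.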
Potentially invalidating changes
- Add `"keepNA"` and `"keepInteger"` to `.deparseOpts()` (#1375). This may cause existing pipelines to rerun, but it makes add-ons like `tarchetypes::tar_map()` much easier to use.
Content addressable storage
targets 1.8.0
- Wrap `tar_watch()` UI module in `bslib::page()` (#1302, @kwbyron-lilly).
- Remove `callr_function` from the `tar_make_as_job()` argument list.
- Ensure `storage = "worker"` is respected when the process of storing an object generates an error (#1304, @multimeric).
- Default to the `_targets.R` pattern in `tar_branches()` (#1306, @multimeric, @mattwarkentin).
- Remove superfluous functions and globals from metadata with `tar_prune()` (#1312, @benzipperer).
- Change the default `workspace_on_error` option to `TRUE` (#1310, @hadley).
- Enhance and organize the `error = "stop"` error message.
- Avoid saving a file in `_targets/objects` for `error = "null"`. Instead, switch to a special `"null"` storage format class if `error` is `"null"` and the target throws an error. This should allow users to more freely create new formats with `tar_format()` without worrying about how to handle `NULL` objects created by `error = "null"`.
- Implement `format = "auto"` (#1311, @hadley).
- Replace `pingr` dependency with `base::socketConnection()` for local URL utilities (#1317, #1318, @Adafede).
- Implement `tar_repository_cas()`, `tar_repository_cas_local()`, and `tar_repository_cas_local_gc()` for content-addressable storage (#1232, #1314, @noamross).
- Add `tar_format_get()` to make implementing CAS systems easier.
- Implement `error = "trim"` in `tar_target()` and `tar_option_set()` (#1310, #1311, @hadley).
- Use the file system type to decide whether to trust time stamps (#1315, @hadley, @gaborcsardi).
- Deprecate `format = "file_fast"` in favor of the above (#1315).
- Deprecate `trust_object_timestamps` in favor of the more unified `trust_timestamps` in `tar_option_set()` (#1315).
- Print storage size of each target in verbose reporters (#1337, @psychelzh).
- Combine help files of `tar_target()` and `tar_target_raw()`. Same with `tar_load()` and `tar_load_raw()`.
- Add a `substitute` argument to `tar_format()` to make it easier to write custom storage formats without metaprogramming.
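Two of the additions above, `error = "trim"` and the local content-addressable storage (CAS) repository, can be combined in one pipeline. The following `_targets.R` is a minimal sketch; the folder name `"cas"` and the target names are hypothetical:

```r
# _targets.R
library(targets)

# Error handling mode added in this release.
tar_option_set(error = "trim")

# Local CAS repository; "cas" is a hypothetical folder name.
cas <- tar_repository_cas_local("cas")

list(
  tar_target(raw, c(2L, 4L), repository = cas),
  tar_target(doubled, raw * 2L, pattern = map(raw), repository = cas)
)
```

Passing the repository to each `tar_target()` call keeps the choice visible per target, so the pipeline can mix CAS-backed targets with ordinary local ones.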
bslib and speed
targets 1.7.1
- Use `bslib` in `tar_watch()`.
- Speed up `target_upstream_edges()` and `pipeline_upstream_edges()` by avoiding data frames until the last minute (17% speedup for certain kinds of large pipelines).
- Automatically set `as_job` to `FALSE` in `tar_make()` if `rstudioapi` and/or RStudio is not available.