Update real world example in quick start documentation #5325
Also closes #5316
This looks pretty good but the outdated option is still in the new docs.
src/toil/wdl/wdltoil.py
Outdated
@@ -3800,6 +3800,8 @@ def run(self, file_store: AbstractFileStore) -> Promised[WDLBindings]:
"is not yet implemented in the MiniWDL Docker "
"containerization implementation."
)
if runtime_bindings.has_binding("memory") and human2bytes(runtime_bindings.resolve("memory").value) < 4_194_304:
I might write the threshold as human2bytes("4MiB")
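For context on this suggestion: 4_194_304 is 4 × 1024², so the numeric literal and the string "4MiB" describe the same threshold. The sketch below checks that arithmetic with a hypothetical stand-in parser, `parse_binary_size`; it is not Toil's `human2bytes`, and the set of units it accepts is an assumption made only for illustration.

```python
import re

# Hypothetical stand-in for a human-readable size parser (NOT Toil's human2bytes).
# Only binary (power-of-1024) units are handled in this sketch.
_BINARY_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def parse_binary_size(text: str) -> int:
    """Parse strings such as '4MiB' or '4 MiB' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError(f"Cannot parse size: {text!r}")
    number, unit = match.groups()
    if unit not in _BINARY_UNITS:
        raise ValueError(f"Unknown unit: {unit!r}")
    return int(float(number) * _BINARY_UNITS[unit])


# The readable form documents the intent; the literal is easy to mistype.
MIN_MEMORY_BYTES = parse_binary_size("4MiB")
```

Naming the threshold this way keeps the unit visible at the comparison site, which is the point of the review comment.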
docs/gettingStarted/quickStart.rst
Outdated
First, grab the `workflow from Dockstore<https://dockstore.org/workflows/github.com/broadinstitute/gatk/MitochondriaPipeline:master?tab=info>`_::

    (venv) $ wget https://dockstore.org/api/workflows/8801/zip/20321 -O MitochondriaExample.zip && unzip MitochondriaExample.zip

Then grab an example workflow input::

    (venv) $ wget https://toil-datasets.s3.us-west-2.amazonaws.com/MitochondriaInputs.zip && unzip MitochondriaInputs.zip

Move the input files into the scripts directory and change your current working directory to that directory::

    (venv) $ mv MitochondriaInputs/* scripts/mitochondria_m2_wdl/ && cd scripts/mitochondria_m2_wdl/
It would be cooler if we could run this from Dockstore by TRS ID and version. I guess we need to download for the inputs?
docs/wdl/running.rst
Outdated
``--runImportsOnWorkers``: Run file imports on workers. This is useful if the leader is not network optimized
and lots of downloads are necessary. By default, this is false.

``--importWorkersThreshold``: Requires ``--runImportsOnWorkers`` to be true. Specify the target batch size in bytes for batched imports.
This should be ``--importWorkersBatchSize`` now.
docs/wdl/running.rst
Outdated
``--importWorkersDisk``: Requires ``--runImportsOnWorkers`` to be true. Specify the disk size each import worker will get.
This may be necessary when file streaming is not possible. For example, downloading from AWS to a GCE job store.
If specified, this should be set to the largest file size of all files to import. By default, this is 1 MiB.
Usually the user won't have to set this at all, because Toil will be able to read the file size from the source and allocate enough disk. The user only has to touch this when that isn't possible. The documentation should reflect that and should be about increasing the disk allocation for downloads that can't work otherwise.
I don't think Toil allocates the right amount of disk; it's more that Toil uses file streaming to avoid needing much disk space at all. This only needs to be set when an actual file download/transfer has to take place. Right now, the file size sniffing is only used to split the file imports into batches.
Hm, probably we should make some attempt to allocate the right amount of disk at some point.
I think the issue then is having some way to figure out if file streaming is supported or not. I had a previous implementation that would attempt to download on a worker with almost no disk space, then catch a crash and allocate the right amount of disk space, but I think we removed it since it was too hacky. This may also still be needed as an argument if we can't sniff file size out of servers.
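To make the batching discussed in this thread concrete, here is a minimal sketch of greedy grouping of imports by a target batch size in bytes. It is an illustration under stated assumptions, not Toil's implementation: the function name `batch_by_size`, the input mapping of file names to sizes, and the rule that an oversized file gets a batch of its own are all hypothetical.

```python
from typing import Dict, List


def batch_by_size(file_sizes: Dict[str, int], target_batch_bytes: int) -> List[List[str]]:
    """Greedily group files so each batch stays at or under the target size.

    Hypothetical helper for illustration only. A single file larger than
    the target ends up alone in its own batch.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, size in file_sizes.items():
        # Close the current batch if adding this file would exceed the target.
        if current and current_bytes + size > target_batch_bytes:
            batches.append(current)
            current = []
            current_bytes = 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


# Made-up sizes in bytes, as they might be reported by the source servers.
sizes = {"a.bam": 400, "b.bam": 500, "c.fasta": 300, "d.vcf": 2000}
batches = batch_by_size(sizes, target_batch_bytes=1000)

# The documentation under review suggests sizing the import disk to the
# largest single file when streaming is not possible.
largest_file_bytes = max(sizes.values())
```

A sketch like this only works when sizes can be read from the source, which is the limitation raised in the last comment above.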
… use dockstore TRS ID
I think this is good now.
It turns out this fails one of the WDL spec unit tests:
Maybe MiniWDL doesn't quite pass it?
It looks like the test is here: And the possibly relevant MiniWDL 1.13 commit is here: chanzuckerberg/miniwdl@7da1cc0
openwdl/wdl#715 and @stxue1's fixes have all been merged by now so the mainline WDL unit tests should work.
The test passes with MiniWDL 1.13.0 but not 1.12.1
We might actually want to show a copy of the error we get without the
I think this will work. I am a little worried removing all the edge case whitespace tests opens us up to breakage if MiniWDL changes again, but it might not be worth porting them all to e.g. conformance tests.
@@ -1121,122 +1120,7 @@ def make_string_expr(self, to_parse: str) -> WDL.Expr.String:
        for i in range(1, len(parts), 2):
            parts[i] = WDL.Expr.Placeholder(pos, {}, WDL.Expr.Null(pos))

        return WDL.Expr.String(pos, parts)

    def test_remove_common_leading_whitespace(self) -> None:
Since we're removing these tests that exercise a lot of edge cases, are we sure we have enough tests for the overall behavior (i.e. in workflows in the conformance tests or in Toil) that we'll catch breakage if MiniWDL changes its behavior and stops following the spec again?
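As a rough illustration of the kind of overall-behavior check being asked about, the sketch below asserts that common leading whitespace is removed while relative indentation is kept. It uses Python's standard `textwrap.dedent` as a stand-in; the WDL specification's whitespace rules differ in their details, and this is not the MiniWDL or Toil code path. The wrapper name `strip_common_indent` is hypothetical.

```python
import textwrap


def strip_common_indent(command: str) -> str:
    """Remove the indentation shared by all non-blank lines.

    Hypothetical wrapper around textwrap.dedent, used here only to
    illustrate the behavior; it does not implement the WDL spec rules.
    """
    return textwrap.dedent(command)


# Every line shares four leading spaces; the middle line has two extra.
indented = "    echo one\n      echo two\n    echo three\n"
dedented = strip_common_indent(indented)
# dedented == "echo one\n  echo two\necho three\n"
```

A handful of end-to-end cases in this style, run against real workflows, would catch a regression without re-porting every removed edge case.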
Closes #3534
Changelog Entry
To be copied to the draft changelog by merger:
--importWorkersThreshold has been renamed to --importWorkersBatchSize