Skip to content

Update real world example in quick start documentation #5325

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jun 25, 2025

Conversation

stxue1
Copy link
Contributor

@stxue1 stxue1 commented May 30, 2025

Closes #3534

Changelog Entry

To be copied to the draft changelog by merger:

  • toil-wdl-runner will bump container memory if below 4MiB
  • Updated real world example in quick start
  • --importWorkersThreshold has been renamed to --importWorkersBatchsize

Reviewer Checklist

  • Make sure it is coming from issues/XXXX-fix-the-thing in the Toil repo, or from an external repo.
    • If it is coming from an external repo, make sure to pull it in for CI with:
      contrib/admin/test-pr otheruser theirbranchname issues/XXXX-fix-the-thing
      
    • If there is no associated issue, create one.
  • Read through the code changes. Make sure that it doesn't have:
    • Addition of trailing whitespace.
    • New variable or member names in camelCase that want to be in snake_case.
    • New functions without type hints.
    • New functions or classes without informative docstrings.
    • Changes to semantics not reflected in the relevant docstrings.
    • New or changed command line options for Toil workflows that are not reflected in docs/running/{cliOptions,cwl,wdl}.rst
    • New features without tests.
  • Comment on the lines of code where problems exist with a review comment. You can shift-click the line numbers in the diff to select multiple lines.
  • Finish the review with an overall description of your opinion.

Merger Checklist

  • Make sure the PR passed tests, including the Gitlab tests, for the most recent commit in its branch.
  • Make sure the PR has been reviewed. If not, review it. If it has been reviewed and any requested changes seem to have been addressed, proceed.
  • Merge with the Github "Squash and merge" feature.
    • If there are multiple authors' commits, add Co-authored-by to give credit to all contributing authors.
  • Copy its recommended changelog entry to the Draft Changelog.
  • Append the issue number in parentheses to the changelog entry.

@stxue1
Copy link
Contributor Author

stxue1 commented May 30, 2025

Also closes #5316

Copy link
Member

@adamnovak adamnovak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good but the outdated option is still in the new docs.

@@ -3800,6 +3800,8 @@ def run(self, file_store: AbstractFileStore) -> Promised[WDLBindings]:
"is not yet implemented in the MiniWDL Docker "
"containerization implementation."
)
if runtime_bindings.has_binding("memory") and human2bytes(runtime_bindings.resolve("memory").value) < 4_194_304:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might write the threshold as human2bytes("4MiB")

Comment on lines 160 to 170
First, grab the `workflow from Dockstore<https://dockstore.org/workflows/github.com/broadinstitute/gatk/MitochondriaPipeline:master?tab=info>`_::

(venv) $ wget https://dockstore.org/api/workflows/8801/zip/20321 -O MitochondriaExample.zip && unzip MitochondriaExample.zip

Then grab an example workflow input::

(venv) $ wget https://toil-datasets.s3.us-west-2.amazonaws.com/MitochondriaInputs.zip && unzip MitochondriaInputs.zip

Move the input files into the scripts directory and change your current working directory to that directory::

(venv) $ mv MitochondriaInputs/* scripts/mitochondria_m2_wdl/ && cd scripts/mitochondria_m2_wdl/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be cooler if we could run this from Dockstore by TRS ID and version. I guess we need to download for the inputs?

``--runImportsOnWorkers``: Run file imports on workers. This is useful if the leader is not network optimized
and lots of downloads are necessary. By default, this is false.

``--importWorkersThreshold``: Requires ``--runImportsOnWorkers`` to be true. Specify the target batch size in bytes for batched imports.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be `--importWorkersBatchSize`` now.

Comment on lines 101 to 103
``--importWorkersDisk``: Requires ``--runImportsOnWorkers`` to be true. Specify the disk size each import worker will get.
This may be necessary when file streaming is not possible. For example, downloading from AWS to a GCE job store.
If specified, this should be set to the largest file size of all files to import. By default, this is 1 MiB.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually the user won't have to set this at all, because Toil will be able to read the file size from the source and allocate enough disk. The user only has to touch this when that isn't possible. The documentation should reflect that and should be about increasing the disk allocation for downloads that can't work otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think Toil allocates the right amount of disk, it's more that Toil will use file streaming to get around needing much disk space at all. Only when an actual file download/transfer needs to take place will this need to be set. The filesize sniffing right now only separates out the file imports into batches.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, probably we should make some attempt to allocate the right amount of disk at some point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue then is having some way to figure out if file streaming is supported or not. I had a previous implementation that would attempt to download on a worker with almost no disk space, then catch a crash and allocate the right amount of disk space, but I think we removed it since it was too hacky. This may also still be needed as an argument if we can't sniff file size out of servers.

Copy link
Member

@adamnovak adamnovak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good now.

@adamnovak
Copy link
Member

It turns out this fails one of the WDL spec unit tests:

84: FAILED: Unit test for test_round.wdl
Iteration: 1
REASON: Expected and result do not match!
Expected output: False
Actual output: True!

Maybe MiniWDL doesn't quite pass it?

@adamnovak
Copy link
Member

It looks like the test is here:
https://github.com/openwdl/wdl/blob/d5fb794e52d1e75075dde3ca4a05e37c90108977/SPEC.md?plain=1#L6308-L6347

And the possibly relevant MiniWDL 1.13 commit is here: chanzuckerberg/miniwdl@7da1cc0

@stxue1
Copy link
Contributor Author

stxue1 commented Jun 18, 2025

It turns out this fails one of the WDL spec unit tests:

84: FAILED: Unit test for test_round.wdl
Iteration: 1
REASON: Expected and result do not match!
Expected output: False
Actual output: True!

Maybe MiniWDL doesn't quite pass it?

The test passes with MiniWDL 1.13.0 but not 1.12.1

This was referenced Jun 18, 2025
@adamnovak
Copy link
Member

We might actually want to show a copy of the error we get without the --quantCheck=False flag. Because the error doesn't suggest to use the flag (maybe we need to actually parse the errors and suggest the flag), so we'd want the documentation to come up in search when people search the error.

Copy link
Member

@adamnovak adamnovak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will work. I am a little worried removing all the edge case whitespace tests opens us up to breakage if MiniWDL changes again, but it might not be worth porting them all to e.g. conformance tests.

@@ -1121,122 +1120,7 @@ def make_string_expr(self, to_parse: str) -> WDL.Expr.String:
for i in range(1, len(parts), 2):
parts[i] = WDL.Expr.Placeholder(pos, {}, WDL.Expr.Null(pos))

return WDL.Expr.String(pos, parts)

def test_remove_common_leading_whitespace(self) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're removing these test that exercise a lot of edge cases, are we sure we have enough tests for the overall behavior (i.e. in workflows in the conformance tests or in Toil) that we'll catch breakage if MiniWDL changes its behavior and stops following the spec again?

@adamnovak adamnovak merged commit d2a50af into master Jun 25, 2025
3 checks passed
@adamnovak adamnovak deleted the issues/3534-real-world-example branch June 25, 2025 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Automatically increase Docker container memory limit when below Docker resource constraints Real World Example Tutorial is not a Real World Example.
2 participants