Merged
Conversation
for more information, see https://pre-commit.ci
…and redownload files
Collaborator
Author
|
Kicking CI |
Collaborator
Author
@JasonWeill, I updated the "Submit the Create Job form" section of the Users page with a screenshot and text mentioning the "Run job with input folder" option (code, readthedocs preview). I tried different options, such as adding an example: "Use this to, for example, access data files or images from notebook's cells." Ultimately, what I have in the PR now works well and matches the level of detail in the paragraphs around it. I would be interested to hear your opinion on this.
JasonWeill
reviewed
Apr 26, 2024
Co-authored-by: Jason Weill <93281816+JasonWeill@users.noreply.github.com>
for more information, see https://pre-commit.ci
JasonWeill
reviewed
Apr 26, 2024
Co-authored-by: Jason Weill <93281816+JasonWeill@users.noreply.github.com>
JasonWeill
approved these changes
Apr 26, 2024
Collaborator
JasonWeill
left a comment
Change looks good! Thanks for the great work on this.
andrii-i
added a commit
to andrii-i/jupyter-scheduler
that referenced
this pull request
Apr 29, 2024
…ger) (jupyter-server#510)

* package input files and folders (backend)
* package input files and folders (frontend)
* remove "input_dir" from staging_paths dict
* ensure execution context matches the notebook directory
* update snapshots
* copy staging folder to output folder after job runs (SUCESS or FAILURE)
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* copy staging folder and side effects to output after job runs, track and redownload files
* remove staging to output copying logic from executor
* refactor output files creation logic into a separate function for clarity
* Fix job definition data model
* add packaged_files to JobDefinition and DescribeJobDefinition model
* fix existing pytests
* clarify FilesDirectoryLink title
* Dynamically display input folder in the checkbox text
* display packageInputFolder parameter as 'Files included'
* use helper text with input directory for 'include files' checkbox
* Update Playwright Snapshots
* add test side effects accountability test for execution manager
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Use "Run job with input folder" for packageInputFolder checkbox text
* Update Playwright Snapshots
* Use "Ran with input folder" in detail page
* Update src/components/input-folder-checkbox.tsx
  Co-authored-by: Jason Weill <93281816+JasonWeill@users.noreply.github.com>
* fix lint error
* Update Playwright Snapshots
* Update existing screenshots
* Update "Submit the Create Job" section mentioning “Run job with input folder” option
* Update docs/users/index.md
  Co-authored-by: Jason Weill <93281816+JasonWeill@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Update src/components/input-folder-checkbox.tsx
  Co-authored-by: Jason Weill <93281816+JasonWeill@users.noreply.github.com>
* Update Playwright Snapshots
* Describe side effects behavior better

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jason Weill <93281816+JasonWeill@users.noreply.github.com>
This was referenced Apr 29, 2024
andrii-i
added a commit
that referenced
this pull request
Apr 30, 2024
…adManager) (#510) (#512)

* Package input files (no autodownload, no multiprocessing DownloadManager) (#510)
* package input files and folders (backend)
* package input files and folders (frontend)
* remove "input_dir" from staging_paths dict
* ensure execution context matches the notebook directory
* update snapshots
* copy staging folder to output folder after job runs (SUCESS or FAILURE)
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* copy staging folder and side effects to output after job runs, track and redownload files
* remove staging to output copying logic from executor
* refactor output files creation logic into a separate function for clarity
* Fix job definition data model
* add packaged_files to JobDefinition and DescribeJobDefinition model
* fix existing pytests
* clarify FilesDirectoryLink title
* Dynamically display input folder in the checkbox text
* display packageInputFolder parameter as 'Files included'
* use helper text with input directory for 'include files' checkbox
* Update Playwright Snapshots
* add test side effects accountability test for execution manager
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Use "Run job with input folder" for packageInputFolder checkbox text
* Update Playwright Snapshots
* Use "Ran with input folder" in detail page
* Update src/components/input-folder-checkbox.tsx
  Co-authored-by: Jason Weill <93281816+JasonWeill@users.noreply.github.com>
* fix lint error
* Update Playwright Snapshots
* Update existing screenshots
* Update "Submit the Create Job" section mentioning “Run job with input folder” option
* Update docs/users/index.md
  Co-authored-by: Jason Weill <93281816+JasonWeill@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Update src/components/input-folder-checkbox.tsx
  Co-authored-by: Jason Weill <93281816+JasonWeill@users.noreply.github.com>
* Update Playwright Snapshots
* Describe side effects behavior better

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jason Weill <93281816+JasonWeill@users.noreply.github.com>

* Update Playwright snapshots

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jason Weill <93281816+JasonWeill@users.noreply.github.com>
When working with notebooks, users frequently import and use additional files such as datasets, images, and scripts inside their notebook's cells. Supporting the packaging of such files ensures that notebooks have all essential resources available when executed as a job. This makes Jupyter Scheduler more flexible and able to accommodate more types of workflows, providing better value to the people who use it.
This PR adds an option to package the input folder (the folder where the input notebook is located), along with all nested files and sub-folders within it, during job or job definition creation.
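To illustrate what packaging the input folder means in practice, here is a minimal sketch using only the Python standard library. It is not the code from this PR: the function name `package_input_folder` and the `staging_dir` argument are hypothetical, and the sketch assumes the staging directory lives outside the input folder and that the notebook itself is included in the returned list.

```python
import shutil
from pathlib import Path


def package_input_folder(notebook_path: str, staging_dir: str) -> list[str]:
    """Copy the notebook's folder, including nested files and sub-folders,
    into staging_dir and return the copied paths relative to that folder.

    Hypothetical helper for illustration; assumes staging_dir is not
    located inside the input folder.
    """
    input_dir = Path(notebook_path).resolve().parent
    staging = Path(staging_dir)
    packaged = []
    for src in sorted(input_dir.rglob("*")):
        if not src.is_file():
            continue  # directories are created on demand below
        rel = src.relative_to(input_dir)
        dest = staging / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy contents and file metadata
        packaged.append(rel.as_posix())
    return packaged
```

The returned list of relative paths plays the role that a field such as `packaged_files` could play: a record of what was copied, which can later be compared against the output folder.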
In terms of features, this is a subset of PR #500. This PR does not automatically download output files to the output folder when a job runs, and therefore has no need to schedule downloads from multiple processes or to add the components that would manage them (`DownloadRunner` and `DownloadManager` from #500). Besides making this PR more focused in terms of functionality, this makes the changes introduced by this PR non-breaking.

When the package input folder option is active:
- … `Job.packaged_files`; if any of them is deleted from the output folder, the user gets an option to re-download them via the UI (matches existing behavior for the snapshot of the input notebook and output files)
- … `JobDefinition.packaged_files`
- … (`cwd` parameter of the `ExecutePreprocessor`). Additionally, the intended path of the execution context is passed to the preprocessor via the metadata `{"metadata": {"path": notebook_dir}}` argument of the preprocessor call.
- … `Job.packaged_files` and copied to the output folder together with other files

Fixes #407
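For readers unfamiliar with how nbconvert picks its working directory, the following is a rough sketch of passing the execution path through the preprocessor's resources argument. The helper names `execution_resources` and `run_notebook` are hypothetical and the timeout is an arbitrary assumption. Only the `{"metadata": {"path": ...}}` shape comes from the description above, and the scheduler's actual executor may differ.

```python
from pathlib import Path


def execution_resources(notebook_path: str) -> dict:
    """Build the resources dict that points execution at the notebook's folder."""
    notebook_dir = str(Path(notebook_path).resolve().parent)
    return {"metadata": {"path": notebook_dir}}


def run_notebook(notebook_path: str) -> None:
    """Execute a notebook in place, using its own folder as the working directory."""
    # Imported lazily so execution_resources() works without nbconvert installed.
    import nbformat
    from nbconvert.preprocessors import ExecutePreprocessor

    nb = nbformat.read(notebook_path, as_version=4)
    ep = ExecutePreprocessor(timeout=600)  # timeout value is an assumption
    # The second argument is the resources dict; its metadata path sets
    # the directory in which the notebook's cells are executed.
    ep.preprocess(nb, execution_resources(notebook_path))
    nbformat.write(nb, notebook_path)
```

With the path set this way, relative references inside the notebook, such as `open("data/a.csv")`, resolve against the folder that holds the notebook and its packaged files.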
Before:


After:

Re-download option when any of the files is deleted from the output folder:
