Skip to content

Retry and resume functionality for downloader#17

Merged
mtauraso merged 2 commits intomainfrom
downloader-retry
Aug 14, 2024
Merged

Retry and resume functionality for downloader#17
mtauraso merged 2 commits intomainfrom
downloader-retry

Conversation

@mtauraso
Copy link
Copy Markdown
Collaborator

Implementation of retry and resume within downloadCutout.py.

Retry means that when a request fails we will try again (defaults to 3 attempts). This is intended to address connection drops and the like. We use configurable exponential backoff to avoid a thundering herd if load from our client is causing some backend failure.

Resume describes the situation where the download fails for unrecoverable reasons (HSC infra goes down) or is terminated (e.g. downloads only occur at night). This generates a resume_download.toml file in the download directory, which allows the exact same download to resume from the chunk that was in progress when the interruption occurred. This resume functionality is off by default to preserve the download() interface used by downloadCutout.py's CLI which does not support resume.

- Resume has been tested
- Retry has not been tested
@github-actions
Copy link
Copy Markdown

github-actions bot commented Aug 13, 2024

Before [b66d834] After [7aaee8d] Ratio Benchmark (Parameter)
4.12±1s 1.88±1s ~0.46 benchmarks.time_computation
96 2.39k 24.92 benchmarks.mem_list

Click here to view all benchmarks.

@codecov
Copy link
Copy Markdown

codecov bot commented Aug 13, 2024

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 39.45%. Comparing base (b66d834) to head (fff2c17).
Report is 2 commits behind head on main.

Files Patch % Lines
src/fibad/download.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main      #17   +/-   ##
=======================================
  Coverage   39.45%   39.45%           
=======================================
  Files          10       10           
  Lines         185      185           
=======================================
  Hits           73       73           
  Misses        112      112           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mtauraso mtauraso self-assigned this Aug 13, 2024
@mtauraso mtauraso changed the title Initial Downloader with rety and resume functionality. Retry and resume functionality for downloader Aug 13, 2024
@mtauraso mtauraso marked this pull request as ready for review August 13, 2024 20:51
Copy link
Copy Markdown
Collaborator

@aritraghsh09 aritraghsh09 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good to me, and I have no objections. But I have two comments:-

  1. The files that fail even after the specified number of attempts -- can we keep a running list of those object ids somewhere and then dump them as a .npy array or something similar?

  2. Alternatively, I guess a separate check-script can be written that verifies whether an object_id.fits exists for each object_id in the download table; and then makes an array of all the object_ids for which image files were not found.

"""
# Load resume data so we start at the appropriate chunk.
if not os.path.exists(resume_data_filename):
return 0
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this a RuntimeError before? Seems like it makes sense for it to be a raised exception, but curious if there's a good reason for it to be return 0.

Copy link
Copy Markdown
Collaborator Author

@mtauraso mtauraso Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the reason for this is so if we're called with resume=True but there is no resume data, we will just download from the beginning.

This avoids a CLI flag (or other mechanism) in download.py or higher which has to know whether the user intends to resume or not.

Maybe resume should really be called resume_if_possible since that is what it really means.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, ok, I see what you mean here. I don't think I have a strong opinion here other than perhaps cleaning up the docstring and perhaps leaving a comment like "can't find the file, so starting from index 0" or something.

Copy link
Copy Markdown
Collaborator

@drewoldag drewoldag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good. Only one little comment about return 0 vs. raise Exception.

@mtauraso mtauraso merged commit 7bfb7db into main Aug 14, 2024
@mtauraso mtauraso deleted the downloader-retry branch August 14, 2024 22:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants