Make downloads resumable across app sessions #187

Open: wants to merge 2 commits into main

Conversation

DePasqualeOrg
Contributor

This PR adds support for resumable downloads that persist across app sessions. When a download is interrupted, it will resume from where it left off when the app is restarted and the download is initiated. The implementation uses stable file hashing, UserDefaults for state persistence, and HTTP Range requests to efficiently download only the missing portions of files.

@FL33TW00D
Collaborator

@ardaatahan given that I know you guys are using this functionality, do you have any thoughts?

@DePasqualeOrg force-pushed the improve-resumable-downloads branch from 90a189d to 988ee59 on March 23, 2025 10:33

@DePasqualeOrg
Contributor Author

I've resolved the conflicts from the latest merged PR. @FL33TW00D, @pcuenca, do you have any thoughts on this?

The lack of resumable downloads across app sessions is one of the biggest pain points for users of my app Local Chat, and I imagine the same is true for other apps that use swift-transformers. Multi-gigabyte downloads often take several minutes to complete, and if the app goes to the background or is terminated, the download has to start over from zero. This solution uses UserDefaults to store data on downloads in progress. Do you think this behavior should be optional, or is it acceptable for all users of swift-transformers?

@ardaatahan
Contributor

ardaatahan commented Mar 24, 2025

Great work, @DePasqualeOrg! This is a very valuable addition to the repo.

I'd like to suggest considering an approach similar to what the Hugging Face Hub library uses, which has a well-tested implementation for resumable downloads. This approach supports downloading files from where they left off when the app is restarted and the download is reinitiated (cc: @FL33TW00D, @pcuenca):

  1. Create temporary files with an .incomplete suffix during downloads under .cache/huggingface/download
  2. When resuming, detect these files, calculate how much data was already downloaded, and use HTTP Range requests to fetch only the remaining data
  3. Once the download completes, rename/move the file to its final location

This approach is quite elegant and has several advantages:

  1. No need to store persistent state in UserDefaults since the incomplete file itself serves as the state
  2. Resilient to app crashes or unexpected terminations
  3. Works without requiring complex state tracking
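
The three steps above can be sketched roughly as follows. This is a minimal illustration using only Foundation, not the code from this PR or from huggingface_hub; the function names (`bytesAlreadyDownloaded`, `makeResumeRequest`, `finalizeDownload`) are hypothetical, and the actual network transfer is left out.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on Linux
#endif

/// Hypothetical helper: size in bytes of the partial file, or 0 if it does not exist.
func bytesAlreadyDownloaded(at incompleteURL: URL) -> Int64 {
    let attributes = try? FileManager.default.attributesOfItem(atPath: incompleteURL.path)
    return (attributes?[.size] as? NSNumber)?.int64Value ?? 0
}

/// Step 2: build a request that asks the server only for the missing bytes.
func makeResumeRequest(for remoteURL: URL, incompleteURL: URL) -> URLRequest {
    var request = URLRequest(url: remoteURL)
    let offset = bytesAlreadyDownloaded(at: incompleteURL)
    if offset > 0 {
        // Open-ended range: from byte `offset` to the end of the file.
        request.setValue("bytes=\(offset)-", forHTTPHeaderField: "Range")
    }
    return request
}

/// Step 3: once every byte has been written, move the file to its final location.
func finalizeDownload(incompleteURL: URL, destinationURL: URL) throws {
    let fileManager = FileManager.default
    try fileManager.createDirectory(
        at: destinationURL.deletingLastPathComponent(),
        withIntermediateDirectories: true
    )
    if fileManager.fileExists(atPath: destinationURL.path) {
        try fileManager.removeItem(at: destinationURL)
    }
    try fileManager.moveItem(at: incompleteURL, to: destinationURL)
}
```

A caller would append the response body to the `.incomplete` file when the server answers 206 Partial Content. If the server ignores the Range header and answers 200 with the full body, the partial file would need to be truncated first.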

Here are some relevant links to the huggingface_hub implementation:

Let me know what you all think, and don't hesitate to reach out if I can help with anything!

@DePasqualeOrg force-pushed the improve-resumable-downloads branch from 988ee59 to 77df21d on March 25, 2025 09:17

@DePasqualeOrg
Contributor Author

Thanks for your feedback, @ardaatahan! That approach makes a lot more sense, and I should have checked how this is implemented in Python first. I've revised my implementation and am saving the .incomplete files in the target directory. They're renamed once the download is complete. Do you think it's important to store the incomplete files in a separate directory?

@DePasqualeOrg force-pushed the improve-resumable-downloads branch 4 times, most recently from 72958ed to a5a5875 on March 25, 2025 16:10

@ardaatahan
Contributor

@DePasqualeOrg I'd recommend keeping .incomplete files in a separate cache directory (.cache/huggingface/download) rather than the target directory. This follows good separation of concerns - temporary downloads and their metadata belong in the cache, while only complete, valid files should exist in the target directory. The Hugging Face implementation in _download_to_tmp_and_move uses this pattern specifically to maintain this clean separation.
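
To make the layout concrete, here is a small sketch of how a file's path inside a repo could map to its partial copy in the cache. It assumes the `.cache/huggingface/download` folder sits inside the local repo directory and mirrors the repo's relative paths; the function name `incompleteURL(forFile:inRepoDirectory:)` is hypothetical and not part of swift-transformers.

```swift
import Foundation

/// Hypothetical helper: location of the partial copy of `relativePath`.
/// Assumption: the cache folder lives inside `repoDirectory` and mirrors
/// the repo's own directory structure.
func incompleteURL(forFile relativePath: String, inRepoDirectory repoDirectory: URL) -> URL {
    var url = repoDirectory
    for part in [".cache", "huggingface", "download"] {
        url.appendPathComponent(part, isDirectory: true)
    }
    let parts = relativePath.split(separator: "/").map(String.init)
    for (index, part) in parts.enumerated() {
        let isLast = index == parts.count - 1
        // Only the file name itself gets the `.incomplete` suffix.
        url.appendPathComponent(isLast ? part + ".incomplete" : part, isDirectory: !isLast)
    }
    return url
}

/// Final location of the same file once the download has completed.
func finalURL(forFile relativePath: String, inRepoDirectory repoDirectory: URL) -> URL {
    var url = repoDirectory
    for part in relativePath.split(separator: "/") {
        url.appendPathComponent(String(part))
    }
    return url
}
```

With this mapping, nothing under the target paths exists until a download finishes, at the cost of creating (and later cleaning up) mirrored subdirectories in the cache.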

@DePasqualeOrg force-pushed the improve-resumable-downloads branch from a5a5875 to 2b9b873 on March 25, 2025 16:47

@DePasqualeOrg
Contributor Author

Maintaining a parallel directory structure inside a download directory introduces a lot of extra complexity. I just spent a couple hours trying to get that to work and didn't fully succeed, so this is the solution that I'm going to offer. Incomplete downloads are saved with an .incomplete suffix in the target directory. In fact, I believe one of the download methods in the Python library does something similar.

If someone wants to add functionality for a separate download directory, they're welcome to do so, but this is already a huge improvement on the current behavior, which doesn't support resuming downloads across app sessions.

@ardaatahan
Contributor

This is indeed a huge improvement over the current behavior, and I appreciate the work you've put into making resumable downloads better!

I do have reservations about storing .incomplete files in target directories though. This approach carries some risks - applications that iterate through directories might encounter errors with partial files, and there could be unintended side effects if these incomplete files are processed as if they were complete. Could you share more about the specific issues you encountered? I'd be happy to help you solve those issues and implement the separate cache directory approach. The complexity is worth it for the safeguards it provides.

@DePasqualeOrg
Contributor Author

DePasqualeOrg commented Mar 25, 2025

The same directory structure needs to be maintained in the download directory. Some repos have nested directory structures. You'll end up with empty directories in the download directory after downloads complete. If you don't want to leave those empty directories there, you'll need to figure out how to delete them without interfering with other downloads.
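
For illustration, one way such a cleanup might look is sketched below. It is not code from this PR; the helper name `pruneEmptyParents(of:upTo:)` is hypothetical. It relies on the POSIX `rmdir` call, which refuses to delete a directory that still has entries, so a directory holding another download's partial file is left alone.

```swift
import Foundation

/// Hypothetical helper: after a finished file has been moved out of the download
/// directory, remove its now-empty parent directories, walking upward and never
/// touching `root` itself.
func pruneEmptyParents(of fileURL: URL, upTo root: URL) {
    let rootPath = root.standardizedFileURL.path
    var directory = fileURL.deletingLastPathComponent().standardizedFileURL
    // `rmdir` fails (returns non-zero) on a non-empty directory, which ends the walk.
    while directory.path.hasPrefix(rootPath + "/"), rmdir(directory.path) == 0 {
        directory = directory.deletingLastPathComponent().standardizedFileURL
    }
}
```

This does not remove the concern entirely: a concurrent download that has created its directory but not yet written a file into it could still see that directory disappear, so it would have to recreate the directory before writing.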

I think it's unlikely that users are iterating through files in the model directory without checking file extensions. The .incomplete suffix should avoid the (as far as I know, entirely theoretical) problem case that you mentioned. Otherwise, users can adapt their application to be more robust.
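
As a rough sketch of what checking the extension could look like on the application side, the following lists only finished files in a model directory. The function name `completedFiles(in:)` is hypothetical and not part of swift-transformers.

```swift
import Foundation

/// Hypothetical helper: every regular file under `directory`, skipping partial
/// downloads that still carry the `.incomplete` suffix.
func completedFiles(in directory: URL) -> [URL] {
    guard let enumerator = FileManager.default.enumerator(
        at: directory,
        includingPropertiesForKeys: [.isRegularFileKey]
    ) else { return [] }

    var result: [URL] = []
    for case let url as URL in enumerator {
        let values = try? url.resourceValues(forKeys: [.isRegularFileKey])
        let isRegularFile = values?.isRegularFile ?? false
        // "model.bin.incomplete" has the path extension "incomplete".
        if isRegularFile && url.pathExtension != "incomplete" {
            result.append(url)
        }
    }
    return result
}
```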

I think this solution will work fine for the vast majority of users of swift-transformers. I'll let the maintainers of this package decide whether the more complex solution you suggested is necessary, and if so, someone else can implement it. This is the solution that I'm offering, which can either be merged as is or built upon.

@DePasqualeOrg
Contributor Author

@pcuenca, @FL33TW00D, I suggest that if no one wants to implement the more complex solution that @ardaatahan suggested, we review and merge this soon, since it solves one of the biggest pain points that users of this library face.

@ardaatahan
Contributor

@DePasqualeOrg, I'm planning to work on this throughout the weekend to implement the incomplete download mechanism I mentioned. I'll keep you posted.

@pcuenca
Member

pcuenca commented Mar 28, 2025

@ardaatahan Thanks a lot, but no need to work on the weekend in my opinion! Happy to help whenever you get started 🤗

@ardaatahan
Contributor

@DePasqualeOrg, @FL33TW00D, @pcuenca, I've finished adding the Hugging Face Hub-style incomplete download mechanism. I'll do some more testing and clean up the code, and the PR should be ready before the end of this week!

@ardaatahan
Contributor

@DePasqualeOrg FYI — I opened a PR to your local branch with changes that improve incomplete download handling. Could you give it a review?

@DePasqualeOrg
Contributor Author

DePasqualeOrg commented Apr 16, 2025

I'm seeing some weird behavior in my initial tests. I've tried downloading multiple models at the same time as well as interrupting the download before it completes and resuming after relaunching the app. I'm not always seeing the files I expect to see in the temporary download folder. I don't have time to test this in depth, so I'm going to leave it to @pcuenca and others. I'm personally happy with the solution that I offered.

@DePasqualeOrg force-pushed the improve-resumable-downloads branch 2 times, most recently from 92303cb to 4b6ebd4 on April 16, 2025 19:20

@DePasqualeOrg
Contributor Author

DePasqualeOrg commented Apr 16, 2025

Also, it looks like you're not cleaning up the parent directories of the temporary files, so you end up with a lot of empty directories in the download directory. That's the problem that I mentioned above. It would be difficult to figure out which directories to clean up without interfering with other downloads, but my solution is cleaner, since the temporary files are downloaded to their destination.

4 participants