
Use hf transfer as default #2046


Merged · 2 commits merged into pytorch:main on Nov 22, 2024

Conversation

felipemello1 (Contributor)

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)
Enabling hf_transfer speeds up downloads significantly:

pip install huggingface_hub[hf_transfer]
HF_HUB_ENABLE_HF_TRANSFER=1 tune download <your model>

For Llama 8B, download time dropped from 2m12s to 32s. This PR makes hf_transfer the default whenever it is installed.

doc: https://huggingface.co/docs/huggingface_hub/en/guides/download#faster-downloads

Changelog

  • Had to add it to the package init, because adding it to /cli/download.py wouldn't take effect.
  • Users can still disable it by running HF_HUB_ENABLE_HF_TRANSFER=0 tune download <model_config>

Test plan

  • pip installed torchtune
  • HF_HUB_ENABLE_HF_TRANSFER=0 tune download <model_config> → runs without transfer
  • tune download <model_config> → runs with transfer
  • pip uninstall hf_transfer, then tune download <model_config> → runs without transfer

pytorch-bot bot commented Nov 21, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2046

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 3992c79 with merge base e9fd56a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 21, 2024

The comment below was left on this diff hunk:

import hf_transfer  # noqa

if os.environ.get("HF_HUB_ENABLE_HF_TRANSFER") is None:
Collaborator


Does this break if the environment variable is not present?

Suggested change
if os.environ.get("HF_HUB_ENABLE_HF_TRANSFER") is None:
if os.environ.get("HF_HUB_ENABLE_HF_TRANSFER", None) is None:

felipemello1 (Contributor, Author)

I printed it: os.environ.get("HF_HUB_ENABLE_HF_TRANSFER") already returns None when the variable is unset, so None is the default and the explicit default is redundant.
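
For reference, `dict.get` (and therefore `os.environ.get`) defaults to `None`, so the two forms in the suggestion behave identically:

```python
import os

# A variable name that is very unlikely to be set in any environment.
name = "HF_HUB_ENABLE_HF_TRANSFER_DEMO_UNSET"
os.environ.pop(name, None)

# Both calls return None: get() defaults to None when no default is given.
assert os.environ.get(name) is None
assert os.environ.get(name, None) is None
```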

@RdoubleA
Collaborator

It all looks good to me. I'm just curious if you know how stable this is, and whether we can reliably tell users to turn it off if it ever runs into issues (e.g. huggingface/hf_transfer#30 — is it immediately obvious that we need to turn off hf_transfer?). Also, do we still get the same speedup running on a laptop vs. a server?

@felipemello1
Contributor Author

> It all looks good to me. I'm just curious if you know how stable this is and if we can reliably tell users to turn it off if it ever runs into issues (ex: huggingface/hf_transfer#30, is it immediately obvious that we need to turn off hf_transfer). Also, do we still get the same speed up if you run on a laptop vs a server?

Good question! It passes our tests that check for hf_token and gated models. Their website also makes it clear that it is not an experimental feature. I also tested it in multiple scenarios, e.g. with the lib installed/not installed and with/without the flag. I think there is enough evidence that it is safe, but we should keep an eye on it.


@joecummings joecummings changed the title use hf transfer as default Use hf transfer as default Nov 22, 2024
@joecummings joecummings merged commit a9aadf5 into pytorch:main Nov 22, 2024
17 checks passed
@ebsmothers ebsmothers mentioned this pull request Nov 26, 2024
@KLL535

KLL535 commented Dec 14, 2024

How can I disable this permanently? With hf_transfer I can't download any large file at all, and there is no manual way to download the files via a browser instead.

@winglian
Contributor

@KLL535 simply run export HF_HUB_ENABLE_HF_TRANSFER=0 in bash
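
Since the question was about disabling it permanently, note that a plain export only lasts for the current session. A sketch of the usual approaches (the profile path is an assumption about your shell setup; the Windows lines are the standard cmd equivalents):

```shell
# Disable hf_transfer for the current shell session only:
export HF_HUB_ENABLE_HF_TRANSFER=0

# To make it permanent for future sessions, add that export to your
# shell profile, e.g.:
#   echo 'export HF_HUB_ENABLE_HF_TRANSFER=0' >> ~/.bashrc
# On Windows (cmd) the rough equivalents are:
#   set HF_HUB_ENABLE_HF_TRANSFER=0     (current session)
#   setx HF_HUB_ENABLE_HF_TRANSFER 0    (persisted for new sessions)
```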

@KLL535

KLL535 commented Dec 14, 2024

> @KLL535 simply use export HF_HUB_ENABLE_HF_TRANSFER=0 from bash

I tried; it doesn't help.
I also tried pip uninstall hf-transfer, after which the program complains that it can't find hf-transfer and refuses to work at all.
More about my suffering here:
cocktailpeanut/fluxgym#264
cocktailpeanut/fluxgym#46
cocktailpeanut/fluxgym#115

For now, the only way is to download the file in the browser and put it in the right places in the program

@felipemello1
Contributor Author

> How can I disable this permanently? hf transfer means I can't download any large file at all. And there is no manual way to download files via a browser.

@KLL535 at least in torchtune, this is the code:

so if you either:

  1. don't have hf_transfer installed, or
  2. set the environment variable to 0,

we won't use hf_transfer for you.

Can you share a bit of the problem that you are facing? Is it when using torchtune, or some other library?

@KLL535

KLL535 commented Dec 15, 2024

> Can you share a bit of the problem that you are facing? Is it when using torchtune, or some other library?

The problem is with another library, fluxgym, and not only for me; I posted links in the previous comment.
A 1.54 GB file cannot be downloaded. Ever. Yet the same file downloads from the site in a browser without problems.

Usually the download just freezes at a random percentage.
But sometimes, rarely, it crashes with the following traceback, from which you can perhaps draw some conclusions about the components involved in the download:

pytorch_model.bin:   4%|██▍                                                        | 62.9M/1.54G [01:27<34:25, 717kB/s]
Traceback (most recent call last):
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\huggingface_hub\file_download.py", line 523, in http_get
    hf_transfer.download(
Exception: Error while removing corrupted file:  (os error 32)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\transformers\modeling_utils.py", line 3557, in from_pretrained
    resolved_archive_file = cached_file(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\transformers\utils\hub.py", line 402, in cached_file
    resolved_file = hf_hub_download(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\huggingface_hub\utils\_deprecation.py", line 101, in inner_f
    return f(*args, **kwargs)
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\huggingface_hub\file_download.py", line 1240, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\huggingface_hub\file_download.py", line 1389, in _hf_hub_download_to_cache_dir
    _download_to_tmp_and_move(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\huggingface_hub\file_download.py", line 1915, in _download_to_tmp_and_move
    http_get(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\huggingface_hub\file_download.py", line 534, in http_get
    raise RuntimeError(
RuntimeError: An error occurred while downloading using `hf_transfer`. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\gradio\queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\gradio\route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\gradio\blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\gradio\blocks.py", line 1532, in call_function
    prediction = await utils.async_iteration(iterator)
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\gradio\utils.py", line 671, in async_iteration
    return await iterator.__anext__()
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\gradio\utils.py", line 664, in __anext__
    return await anyio.to_thread.run_sync(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 2505, in run_sync_in_worker_thread
    return await future
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 1005, in run
    result = context.run(func, *args)
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\gradio\utils.py", line 647, in run_sync_iterator_async
    return next(iterator)
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\gradio\utils.py", line 809, in gen_wrapper
    response = next(iterator)
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\app.py", line 278, in run_captioning
    model = AutoModelForCausalLM.from_pretrained(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\transformers\models\auto\auto_factory.py", line 559, in from_pretrained
    return model_class.from_pretrained(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\transformers\modeling_utils.py", line 3644, in from_pretrained
    raise EnvironmentError(
OSError: Can't load the model for 'multimodalart/Florence-2-large-no-flash-attn'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'multimodalart/Florence-2-large-no-flash-attn' is the correct path to a directory containing a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.

6 participants