
Use hf transfer as default #2046


Merged · 2 commits merged into pytorch:main on Nov 22, 2024

Conversation

felipemello1 (Contributor)

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)
Enabling hf_transfer speeds up downloads significantly:

pip install huggingface_hub[hf_transfer]
HF_HUB_ENABLE_HF_TRANSFER=1 tune download <your model>

For Llama 8B, download time dropped from 2m12s to 32s. This PR makes hf_transfer the default whenever it is installed.

doc: https://huggingface.co/docs/huggingface_hub/en/guides/download#faster-downloads

Changelog

  • Had to add it to the package init, because adding it to /cli/download.py wouldn't take effect.
  • Users can still disable it by running HF_HUB_ENABLE_HF_TRANSFER=0 tune download <model_config>

Test plan

  • pip installed torchtune
  • HF_HUB_ENABLE_HF_TRANSFER=0 tune download <model_config> → runs without transfer
  • tune download <model_config> → runs with transfer
  • pip uninstall hf_transfer, then tune download <model_config> → runs without transfer

pytorch-bot bot commented Nov 21, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2046

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 3992c79 with merge base e9fd56a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 21, 2024

The comment below was left on this diff hunk:

import hf_transfer  # noqa

if os.environ.get("HF_HUB_ENABLE_HF_TRANSFER") is None:
Collaborator


Does this break if the environment variable is not present?

Suggested change
if os.environ.get("HF_HUB_ENABLE_HF_TRANSFER") is None:
if os.environ.get("HF_HUB_ENABLE_HF_TRANSFER", None) is None:

felipemello1 (Contributor, Author)

I printed it: os.environ.get("HF_HUB_ENABLE_HF_TRANSFER") already returns None when the variable is unset, so None is the default and the explicit default is redundant.
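
For reference, `dict.get` (and therefore `os.environ.get`) defaults to `None`, so the two forms in the suggestion behave identically:

```python
import os

# A variable name that is very unlikely to be set in any environment.
name = "HF_HUB_ENABLE_HF_TRANSFER_DEMO_UNSET"
os.environ.pop(name, None)

# Both calls return None: get() defaults to None when no default is given.
assert os.environ.get(name) is None
assert os.environ.get(name, None) is None
```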

@RdoubleA
Collaborator

It all looks good to me. I'm just curious if you know how stable this is, and whether we can reliably tell users to turn it off if it ever runs into issues (e.g. huggingface/hf_transfer#30 — is it immediately obvious that we need to turn off hf_transfer?). Also, do we still get the same speedup running on a laptop vs. a server?

@felipemello1
Contributor Author

> It all looks good to me. I'm just curious if you know how stable this is and if we can reliably tell users to turn it off if it ever runs into issues (ex: huggingface/hf_transfer#30, is it immediately obvious that we need to turn off hf_transfer). Also, do we still get the same speed up if you run on a laptop vs a server?

Good question! It passes our tests that check for hf_token and gated models. Their website also makes it clear that it is not an experimental feature. I also tested it in multiple scenarios, e.g. with the lib installed/not installed and with/without the flag. I think there is enough evidence that it is safe, but we should keep an eye on it.


@joecummings joecummings changed the title use hf transfer as default Use hf transfer as default Nov 22, 2024
@joecummings joecummings merged commit a9aadf5 into pytorch:main Nov 22, 2024
17 checks passed
@ebsmothers ebsmothers mentioned this pull request Nov 26, 2024
@KLL535

KLL535 commented Dec 14, 2024

How can I disable this permanently? With hf_transfer I can't download any large file at all, and there is no manual way to download the files via a browser instead.

@winglian
Contributor

@KLL535 simply run export HF_HUB_ENABLE_HF_TRANSFER=0 in bash
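
Since the question was about disabling it permanently, note that a plain export only lasts for the current session. A sketch of the usual approaches (the profile path is an assumption about your shell setup; the Windows lines are the standard cmd equivalents):

```shell
# Disable hf_transfer for the current shell session only:
export HF_HUB_ENABLE_HF_TRANSFER=0

# To make it permanent for future sessions, add that export to your
# shell profile, e.g.:
#   echo 'export HF_HUB_ENABLE_HF_TRANSFER=0' >> ~/.bashrc
# On Windows (cmd) the rough equivalents are:
#   set HF_HUB_ENABLE_HF_TRANSFER=0     (current session)
#   setx HF_HUB_ENABLE_HF_TRANSFER 0    (persisted for new sessions)
```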

@KLL535

KLL535 commented Dec 14, 2024

> @KLL535 simply use export HF_HUB_ENABLE_HF_TRANSFER=0 from bash

I tried; it doesn't help.
I also tried pip uninstall hf-transfer, after which the program complains that it can't find hf-transfer and refuses to work at all.
More about my suffering here:
cocktailpeanut/fluxgym#264
cocktailpeanut/fluxgym#46
cocktailpeanut/fluxgym#115

For now, the only way is to download the file in the browser and put it in the right places in the program

@felipemello1
Contributor Author

> How can I disable this permanently? hf transfer means I can't download any large file at all. And there is no manual way to download files via a browser.

@KLL535 at least in torchtune, this is the code:

so if you either:

  1. don't have hf_transfer installed, or
  2. set the environment variable to 0,

we won't use hf_transfer for you.

Can you share a bit of the problem that you are facing? Is it when using torchtune, or some other library?

@KLL535

KLL535 commented Dec 15, 2024

> Can you share a bit of the problem that you are facing? Is it when using torchtune, or some other library?

The problem is with another library, fluxgym, and not only for me; I posted links in the previous comment.
A 1.54 GB file cannot be downloaded. Ever. Yet the same file downloads from the site in a browser without problems.

Usually the download just freezes at a random percentage.
But sometimes, rarely, it crashes with the following traceback, from which you can perhaps draw some conclusions about the components involved in the download:

pytorch_model.bin:   4%|██▍                                                        | 62.9M/1.54G [01:27<34:25, 717kB/s]
Traceback (most recent call last):
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\huggingface_hub\file_download.py", line 523, in http_get
    hf_transfer.download(
Exception: Error while removing corrupted file:  (os error 32)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\transformers\modeling_utils.py", line 3557, in from_pretrained
    resolved_archive_file = cached_file(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\transformers\utils\hub.py", line 402, in cached_file
    resolved_file = hf_hub_download(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\huggingface_hub\utils\_deprecation.py", line 101, in inner_f
    return f(*args, **kwargs)
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\huggingface_hub\file_download.py", line 1240, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\huggingface_hub\file_download.py", line 1389, in _hf_hub_download_to_cache_dir
    _download_to_tmp_and_move(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\huggingface_hub\file_download.py", line 1915, in _download_to_tmp_and_move
    http_get(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\huggingface_hub\file_download.py", line 534, in http_get
    raise RuntimeError(
RuntimeError: An error occurred while downloading using `hf_transfer`. Consider disabling HF_HUB_ENABLE_HF_TRANSFER for better error handling.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\gradio\queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\gradio\route_utils.py", line 322, in call_process_api
    output = await app.get_blocks().process_api(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\gradio\blocks.py", line 1935, in process_api
    result = await self.call_function(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\gradio\blocks.py", line 1532, in call_function
    prediction = await utils.async_iteration(iterator)
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\gradio\utils.py", line 671, in async_iteration
    return await iterator.__anext__()
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\gradio\utils.py", line 664, in __anext__
    return await anyio.to_thread.run_sync(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 2505, in run_sync_in_worker_thread
    return await future
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 1005, in run
    result = context.run(func, *args)
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\gradio\utils.py", line 647, in run_sync_iterator_async
    return next(iterator)
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\gradio\utils.py", line 809, in gen_wrapper
    response = next(iterator)
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\app.py", line 278, in run_captioning
    model = AutoModelForCausalLM.from_pretrained(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\transformers\models\auto\auto_factory.py", line 559, in from_pretrained
    return model_class.from_pretrained(
  File "C:\webui_forge_cu121_torch21\webui\extensions\fluxgym\venv\lib\site-packages\transformers\modeling_utils.py", line 3644, in from_pretrained
    raise EnvironmentError(
OSError: Can't load the model for 'multimodalart/Florence-2-large-no-flash-attn'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'multimodalart/Florence-2-large-no-flash-attn' is the correct path to a directory containing a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.

6 participants