[Bugfix][CI] Retry cached HF tokenizer load after transport failures by AndreasKaratzas · Pull Request #44820 · vllm-project/vllm

AndreasKaratzas · 2026-06-08T02:33:49Z

Some CI builds are failing (on ROCm at least). For example, this quantization test fails during engine startup:

tests/kernels/quantization/test_triton_scaled_mm.py::test_rocm_compressed_tensors_w8a8[10-32-neuralmagic/Llama-3.2-1B-quantized.w8a8]

This is not a kernel issue. Tokenizer construction made a live Hugging Face Hub metadata request and hit:

httpx.RemoteProtocolError: Server disconnected without sending a response.

So this PR ensures that a transient Hub transport failure does not fail startup if the tokenizer files are already complete in the local cache. If the cache is incomplete, startup still fails with a clear error.

cc @kenroche

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

hmellor

I like the idea, but I'd like to suggest a different implementation.

# vllm/transformers_utils/repo_utils.py

@contextmanager
def retry_with_local_files_only_in_ci(
    func: Callable[..., _R],
) -> Iterator[Callable[..., _R]]:
    """
    Wrap a function to retry with `local_files_only=True` if it fails in CI environment.
    """

    def wrapper(*args, **kwargs) -> _R:
        try:
            return func(*args, **kwargs)
        except Exception as e:
            if not os.environ.get("CI"):
                raise
            logger.warning(
                "Call to %s failed in CI; retrying with local_files_only=True: %s",
                getattr(func, "__qualname__", func),
                e,
            )
            kwargs["local_files_only"] = True
            return func(*args, **kwargs)

    yield wrapper

which would be used as follows:

            with retry_with_local_files_only_in_ci(AutoTokenizer.from_pretrained) as from_pretrained:
                tokenizer = from_pretrained(
                    path_or_repo_id,
                    *args,
                    trust_remote_code=trust_remote_code,
                    revision=revision,
                    cache_dir=download_dir,
                    **kwargs,
                )

This could then:

- be reused in other places where these timeouts occur
- only have an effect in CI

hmellor · 2026-06-09T16:00:49Z

Or, even more generally, a retry_with_kwargs_in_ci helper, so that we can use it for various interfaces that may use different kwargs for offline mode.


AndreasKaratzas · 2026-06-11T04:18:41Z

Added a reusable retry_with_kwargs_in_ci helper in repo_utils.py.
Used it in HF tokenizer loading so CI retries once with local_files_only=True after a failed Hub call.
Made MT-Bench opt into checking the latest HF dataset revision and force a redownload when that revision is not present in the local dataset cache (an adjacent problem in CI).
cc @hmellor

hmellor · 2026-06-19T13:01:21Z

+def retry_with_kwargs_in_ci(
+    func: Callable[..., _R],
+    **retry_kwargs: Any,
+) -> Iterator[Callable[..., _R]]:


Why iterator? Does this not just return Callable[..., _R]?

Yep, you're right. I removed the context manager shape and now return the wrapped callable directly.
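A minimal sketch of the shape described above, where the helper returns the wrapped callable directly instead of yielding it from a context manager. The logging setup and the way `retry_kwargs` are merged into the retry call are assumptions made for illustration, not the code merged into vLLM, and the "already applied" check discussed later in this review is left out for brevity.

```python
import functools
import logging
import os
from collections.abc import Callable
from typing import Any, TypeVar

_R = TypeVar("_R")
logger = logging.getLogger(__name__)  # assumption: stand-in for vLLM's logger


def retry_with_kwargs_in_ci(
    func: Callable[..., _R],
    **retry_kwargs: Any,
) -> Callable[..., _R]:
    """Return a wrapper that retries `func` once with `retry_kwargs` in CI."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> _R:
        try:
            return func(*args, **kwargs)
        except Exception as e:
            # Outside CI the original error propagates unchanged.
            if not os.environ.get("CI"):
                raise
            logger.warning(
                "Call to %s failed in CI; retrying with %s: %s",
                getattr(func, "__qualname__", func),
                retry_kwargs,
                e,
            )
            # Assumption: retry kwargs override any caller-supplied values.
            return func(*args, **{**kwargs, **retry_kwargs})

    return wrapper
```

With this shape, the call site would become something like `from_pretrained = retry_with_kwargs_in_ci(AutoTokenizer.from_pretrained, local_files_only=True)` followed by an ordinary call to `from_pretrained(...)`.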

hmellor · 2026-06-19T13:03:29Z

+        except Exception as e:
+            if not os.environ.get("CI"):
+                raise
+            if all(kwargs.get(key) == value for key, value in retry_kwargs.items()):


This will fail if retry_kwargs ever sets anything to None and that key is not in kwargs, because kwargs.get(key) returns None for a missing key and the comparison then matches.

Changed the check to require the key to be present before comparing, so missing keys no longer match None
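The difference between the two checks can be shown in isolation. The function names below are hypothetical and exist only for this illustration; the sketch assumes the check is meant to detect that the original call already used the retry kwargs.

```python
from typing import Any


def matches_with_get(kwargs: dict[str, Any], retry_kwargs: dict[str, Any]) -> bool:
    # Original form: a missing key yields None, which equals a None retry value.
    return all(kwargs.get(key) == value for key, value in retry_kwargs.items())


def matches_with_presence(kwargs: dict[str, Any], retry_kwargs: dict[str, Any]) -> bool:
    # Revised form: the key must be present before its value is compared.
    return all(
        key in kwargs and kwargs[key] == value
        for key, value in retry_kwargs.items()
    )


# The caller passed no kwargs, and the retry wants to set revision=None.
caller_kwargs: dict[str, Any] = {}
retry = {"revision": None}

false_positive = matches_with_get(caller_kwargs, retry)       # True
correct_result = matches_with_presence(caller_kwargs, retry)  # False
```

Both forms agree whenever the key is actually present, so the stricter check only changes behavior for missing keys paired with a `None` retry value.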

hmellor

~~Datasets changes seem unrelated?~~

I hadn't read your comment before reviewing

hmellor · 2026-06-19T13:11:23Z

If you are going to specify the latest revision, there is no need to also pass FORCE_REDOWNLOAD. If you are forcing the latest revision and it is already there, re-downloading just wastes resources and time.

Removed the cache scan and FORCE_REDOWNLOAD, you are right.

hmellor · 2026-06-19T13:15:31Z

+    monkeypatch.delenv("CI", raising=False)
+    calls = 0
+
+    def failing_call():


If the helper retried here, this test would fail because failing_call doesn't accept kwargs, rather than because of too many calls.

Suggested change

-    def failing_call():
+    def failing_call(**kwargs):

Done, simplified the test callback to accept **kwargs directly
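To illustrate why the callback signature matters, here is a small sketch that calls both variants the way a retry would, with an extra keyword argument. `strict_call` and `describe_failure` are hypothetical names used only for this example.

```python
state = {"calls": 0}


def strict_call():
    # Accepts no kwargs, so a retry that injects kwargs never reaches the body.
    state["calls"] += 1
    raise RuntimeError("hub unavailable")


def failing_call(**kwargs):
    # Accepts arbitrary kwargs, so an unwanted retry shows up in the call count.
    state["calls"] += 1
    raise RuntimeError("hub unavailable")


def describe_failure(func) -> str:
    """Call `func` with a retry-style kwarg and report the exception type."""
    try:
        func(local_files_only=True)
    except Exception as e:
        return type(e).__name__
    return "no error"


strict_result = describe_failure(strict_call)    # "TypeError"
calls_after_strict = state["calls"]              # 0: the body never ran
lenient_result = describe_failure(failing_call)  # "RuntimeError"
calls_after_lenient = state["calls"]             # 1
```

With the strict signature, an accidental retry surfaces as a `TypeError` unrelated to the behavior under test; with `**kwargs`, the call counter reflects it and the intended assertion is the one that fails.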


AndreasKaratzas · 2026-06-20T03:28:02Z

I also added a feature for ROCm CI because the original cache fix only covered the MT-Bench dataset path, but the same stale-cache issue can happen for any Hugging Face model or dataset loaded without an explicit revision. The new behavior is opt-in and scoped to AMD CI: run-amd-test.sh enables VLLM_CI_ENSURE_LATEST_HF_REVISION and passes it into the Docker container, while the default remains off in envs.py. The actual revision resolution is in maybe_resolve_latest_hf_revision() in repo_utils.py, and the model path uses it from ModelConfig; dataset loading reuses the same helper. Explicit revisions are preserved, local/offline paths are skipped, and access/not-found Hub errors are not treated as transient cache fallback cases.
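The rules listed above can be sketched roughly as follows. This is not the code in repo_utils.py: the function name `resolve_revision_sketch`, the `HubAccessError` class, the injected `fetch_latest_sha` callable, the `"1"` flag parsing, and the fallback to the unpinned revision on transient errors are all assumptions made so the example runs without network access.

```python
import os
from collections.abc import Callable
from typing import Optional


class HubAccessError(Exception):
    """Hypothetical stand-in for access-denied or not-found Hub errors."""


def resolve_revision_sketch(
    repo_id: str,
    revision: Optional[str],
    fetch_latest_sha: Callable[[str], str],
) -> Optional[str]:
    # Opt-in: do nothing unless the CI flag is set (parsing is an assumption).
    if os.environ.get("VLLM_CI_ENSURE_LATEST_HF_REVISION", "0") != "1":
        return revision
    # Explicit revisions are preserved.
    if revision is not None:
        return revision
    # Local paths and offline mode are skipped.
    if os.path.exists(repo_id) or os.environ.get("HF_HUB_OFFLINE") == "1":
        return revision
    try:
        return fetch_latest_sha(repo_id)
    except HubAccessError:
        # Access and not-found errors are real failures, not cache-fallback cases.
        raise
    except OSError:
        # Assumption: transient transport errors fall back to the unpinned revision.
        return revision
```

In a real setup `fetch_latest_sha` would wrap a Hub metadata lookup; injecting it keeps the decision logic testable on its own.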

@hmellor PTAL

Retry cached HF tokenizer load after transport failures

2f26a89

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

AndreasKaratzas marked this pull request as ready for review June 8, 2026 02:33

AndreasKaratzas requested review from DarkLight1337 and njhill as code owners June 8, 2026 02:33

claude Bot reviewed Jun 8, 2026


mergify Bot added the bug Something isn't working label Jun 8, 2026

DarkLight1337 requested a review from hmellor June 8, 2026 04:05

hmellor requested changes Jun 9, 2026


AndreasKaratzas added 2 commits June 10, 2026 23:14

Merge remote-tracking branch 'origin/main' into akaratza_retry_hf

f5bd975

[Bugfix][CI] Reuse cached Hugging Face assets after CI fetch failures

f8de820

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

mergify Bot added the performance Performance-related issues label Jun 11, 2026

hmellor reviewed Jun 19, 2026


AndreasKaratzas added 2 commits June 19, 2026 18:36

Merge remote-tracking branch 'origin/main' into akaratza_retry_hf

bd83f28

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

[ROCm][CI] Address HF retry helper and dataset revision feedback

a575d46

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

depthfirst-app Bot reviewed Jun 20, 2026


Comment thread vllm/transformers_utils/repo_utils.py

[ROCm][CI] Resolve floating HF revisions

4585545

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

AndreasKaratzas requested review from Harry-Chen, WoosukKwon, houseroad, khluu, mgoin, robertgshaw2-redhat, tlrmchlsmth and youkaichao as code owners June 20, 2026 03:26

AndreasKaratzas requested review from ProExpertProg and yewentao256 as code owners June 20, 2026 03:26

mergify Bot added the ci/build label Jun 20, 2026
