[Bugfix][CI] Retry cached HF tokenizer load after transport failures#44820
[Bugfix][CI] Retry cached HF tokenizer load after transport failures#44820AndreasKaratzas wants to merge 6 commits into
Conversation
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
There was a problem hiding this comment.
I like the idea, but I'd like to suggest a different implementation.
# vllm/transformers_utils/repo_utils.py
@contextmanager
def retry_with_local_files_only_in_ci(
func: Callable[..., _R],
) -> Iterator[Callable[..., _R]]:
"""
Wrap a function to retry with `local_files_only=True` if it fails in CI environment.
"""
def wrapper(*args, **kwargs) -> _R:
try:
return func(*args, **kwargs)
except Exception as e:
if not os.environ.get("CI"):
raise
logger.warning(
"Call to %s failed in CI; retrying with local_files_only=True: %s",
getattr(func, "__qualname__", func),
e,
)
kwargs["local_files_only"] = True
return func(*args, **kwargs)
yield wrapperwhich would be used as follows:
with retry_with_local_files_only_in_ci(AutoTokenizer.from_pretrained) as from_pretrained
tokenizer = from_pretrained(
path_or_repo_id,
*args,
trust_remote_code=trust_remote_code,
revision=revision,
cache_dir=download_dir,
**kwargs,
)this could then:
- be reused in other places where these timeouts occur
- only have an effect in CI
|
Or even more generally: |
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
| def retry_with_kwargs_in_ci( | ||
| func: Callable[..., _R], | ||
| **retry_kwargs: Any, | ||
| ) -> Iterator[Callable[..., _R]]: |
There was a problem hiding this comment.
Why iterator? Does this not just return Callable[..., _R]?
There was a problem hiding this comment.
Yep, you're right. I removed the context manager shape and now return the wrapped callable directly.
| except Exception as e: | ||
| if not os.environ.get("CI"): | ||
| raise | ||
| if all(kwargs.get(key) == value for key, value in retry_kwargs.items()): |
There was a problem hiding this comment.
This will fail if retry_kwargs ever sets anything to None and it's not in kwargs
There was a problem hiding this comment.
Changed the check to require the key to be present before comparing, so missing keys no longer match None
There was a problem hiding this comment.
If you are going to specify the latest revision, there is no need to also pass FORCE_REDOWNLOAD, if you are forcing the latest revision and it's already there re-downloading just wastes resources/time
There was a problem hiding this comment.
Removed the cache scan and FORCE_REDOWNLOAD, you are right.
| monkeypatch.delenv("CI", raising=False) | ||
| calls = 0 | ||
|
|
||
| def failing_call(): |
There was a problem hiding this comment.
This test would fail because it doesn't accept kwargs instead of too many calls
| def failing_call(): | |
| def failing_call(**kwargs): |
There was a problem hiding this comment.
Done, simplified the test callback to accept **kwargs directly
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
I also added a feature for ROCm CI because the original cache fix only covered the MT-Bench dataset path, but the same stale-cache issue can happen for any Hugging Face model or dataset loaded without an explicit revision. The new behavior is opt-in and scoped to AMD CI: @hmellor PTAL |
There are some failed CI builds (in ROCm at least) (example: quantization test during engine startup) where the failure is:
This is not a kernel issue. Tokenizer construction made a live Hugging Face Hub metadata request and hit:
So this PR introduces a transient Hub transport failure should not fail startup if the tokenizer files are already complete in the local cache. If the cache is incomplete, startup still fails with a clear error.
cc @kenroche