
Conversation


@gante gante commented May 2, 2025

What does this PR do?

Our CI is failing due to OOM in some slow tests (see CacheHardIntegrationTest failures here).

This PR replaces the 7B model (which requires ~15GB of VRAM) with a 4B model (~9GB of VRAM) in the affected tests. It also makes a few other minor modifications to ensure a green CI (commented in the diff).
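As a rough sanity check on those VRAM figures (a back-of-the-envelope sketch, not a measurement; the helper below is invented for illustration): at 2 bytes per parameter in bf16/fp16, the weights alone dominate the quoted footprints, with the remainder going to activations and the KV cache.

```python
def bf16_weight_gib(num_params):
    # bf16/fp16 store 2 bytes per parameter; runtime usage is higher
    # once activations and the KV cache are added on top
    return num_params * 2 / 1024**3

print(f"7B weights: ~{bf16_weight_gib(7e9):.1f} GiB")  # ~13.0 GiB
print(f"4B weights: ~{bf16_weight_gib(4e9):.1f} GiB")  # ~7.5 GiB
```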

@gante gante requested a review from ydshieh May 2, 2025 09:31
@github-actions github-actions bot marked this pull request as draft May 2, 2025 09:31

github-actions bot commented May 2, 2025

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@gante gante marked this pull request as ready for review May 2, 2025 09:31
Comment on lines +304 to +312
def setUp(self):
    # Clears memory before each test. Some tests use large models, which might result in suboptimal torch
    # re-allocation if we run multiple tests in a row without clearing memory.
    cleanup(torch_device, gc_collect=True)

@classmethod
def tearDownClass(cls):
    # Clears memory after the last test. See `setUp` for more details.
    cleanup(torch_device, gc_collect=True)
@gante (Contributor Author)

Related to this fix.

On main, we're not reusing GPU memory properly across from_pretrained calls. Until that is sorted out, we have to start each test with a memory reset to prevent flaky tests. Without the reset, the first test in this class might hit memory-related issues left over from earlier runs (see the diff in test_cache_copy, the first test to run in this class).
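For context, here is a minimal sketch of what such a memory-reset helper might look like. The cleanup and torch_device names follow transformers' testing utilities, but this body is an illustrative assumption, not the library's actual implementation (which is backend-agnostic rather than CUDA-only):

```python
import gc


def cleanup(device, gc_collect=False):
    # Illustrative sketch: run a Python-level collection first, then
    # release cached CUDA blocks so the next from_pretrained call starts
    # from a clean allocator state.
    if gc_collect:
        gc.collect()
    try:
        import torch

        if "cuda" in str(device) and torch.cuda.is_available():
            torch.cuda.empty_cache()
            torch.cuda.reset_peak_memory_stats(device)
    except ImportError:
        pass  # no torch installed: nothing device-side to clear


cleanup("cuda:0", gc_collect=True)  # safe no-op on CPU-only machines
```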

Comment on lines +466 to +468
"enriching experience that broadens our horizons and allows us to explore the world beyond our comfort "
"zones. Whether it's a short weekend getaway",
@gante gante (Contributor Author) May 2, 2025


Depending on the device, running this test in isolation (RUN_SLOW=1 py.test tests/utils/test_cache_utils.py -k test_cache_copy) might produce a different output compared to a full test suite run. With the updated memory reset, this is the correct output in all combinations -- see the comment above.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ydshieh ydshieh (Collaborator) left a comment


Thank you!


ydshieh commented May 5, 2025

It would be nice to try run-slow: utils here, maybe?


gante commented May 6, 2025

run-slow: utils


github-actions bot commented May 6, 2025

This comment contains run-slow, running the specified jobs:

models: ['utils']
quantizations: [] ...

@gante gante force-pushed the smaller_model_cache_tests branch from 394e83f to 899cb26 Compare May 6, 2025 09:54

gante commented May 6, 2025

(there are some multi-gpu issues, fixing them)


gante commented May 6, 2025

@ydshieh the multi-gpu issues require extensive changes to the offloaded caches. I'm merging this as-is and will open a new PR to make multi-gpu work!

@gante gante merged commit 9981214 into huggingface:main May 6, 2025
11 checks passed
@gante gante deleted the smaller_model_cache_tests branch May 6, 2025 10:15
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025