Several fixes for Gemma3n #39135

Cyrilvallez · 2025-06-30T17:25:46Z

What does this PR do?

HuggingFaceDocBuilderDev · 2025-06-30T17:41:00Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker

Thanks 🤗

ArthurZucker · 2025-07-01T08:11:53Z

src/transformers/models/gemma3n/modeling_gemma3n.py

-        target_magnitude: torch.Tensor = torch.mean(hidden_states_0**2, dim=-1, keepdim=True) ** 0.5
-        epsilon_tensor = torch.tensor(torch.finfo().min)
+        target_magnitude = torch.mean(hidden_states_0**2, dim=-1, keepdim=True) ** 0.5
+        epsilon_tensor = torch.tensor(1e-5)


Seems a little bit unrelated to me, where does that come from?

Otherwise we can get NaN on those layers because of the line current_hidden_state = current_hidden_state * (target_magnitude / torch.maximum(new_magnitude, epsilon_tensor)). The max op is completely useless if we compare against the minimum possible value for a given dtype, because torch.finfo().min is the most negative finite value, not a small positive one. Given that the variable is named epsilon, I assumed it was a typo and was meant to be a small value for numerical stability.
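A minimal sketch of the failure mode described above, assuming float32 and an all-zero activation row. The `rescale` helper and the tensor shapes are illustrative only and are not the code from modeling_gemma3n.py; only the rescaling expression is taken from the comment.

```python
import torch


def rescale(current_hidden_state, target_magnitude, epsilon_tensor):
    # Hypothetical helper: same rescaling expression as quoted in the comment.
    new_magnitude = torch.mean(current_hidden_state**2, dim=-1, keepdim=True) ** 0.5
    return current_hidden_state * (
        target_magnitude / torch.maximum(new_magnitude, epsilon_tensor)
    )


# An all-zero row gives new_magnitude == 0, the case the epsilon should guard.
current = torch.zeros(1, 4)
target = torch.ones(1, 1)

# finfo.min is the most negative finite float32, so maximum() never clamps
# a non-negative magnitude and the division by zero goes through.
old_epsilon = torch.tensor(torch.finfo(torch.float32).min)
# A small positive epsilon clamps the denominator away from zero.
new_epsilon = torch.tensor(1e-5)

with_old = rescale(current, target, old_epsilon)  # 0 * (1 / 0) = 0 * inf
with_new = rescale(current, target, new_epsilon)  # 0 * (1 / 1e-5)

print(torch.isnan(with_old).all().item())
print(torch.isnan(with_new).any().item())
```

With the old epsilon every element of the result is NaN, while the small positive epsilon keeps the output finite (zeros in this case).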

ydshieh · 2025-07-01T08:21:09Z

run-slow: gemma3n

github-actions · 2025-07-01T08:22:50Z

This comment contains run-slow, running the specified jobs:

models: ['models/gemma3n']
quantizations: [] ...

Cyrilvallez · 2025-07-01T08:34:48Z

All good for the slow tests, @ydshieh! (The IntegrationTests are still skipped; I will check them soon.)

danielhanchen · 2025-07-01T11:48:31Z

Do you guys know why the training loss is exceptionally high? I don't think it's due to the gradient accumulation - it does quickly decrease, but it's very weird

ArthurZucker · 2025-07-01T11:49:35Z

👀 no idea!

danielhanchen · 2025-07-01T12:15:58Z

Wait, I misspoke: gradient accumulation does in fact not work correctly.

But the losses are still suspiciously high


Cyrilvallez added 10 commits June 27, 2025 14:37

remove the skips

514abbb

fix the epsilon to a small value (does not make sense otherwise)

eb2789f

safeguard

3afc557

overload test_eager_matches_sdpa

273985b

Update test_modeling_common.py

e398c75

skip appropriate tests

f3ad228

correct no_split_layer

d3a004c

fix all devices issue

a6a00fe

fix backward

9aecb76

fix

04b8f78

ArthurZucker approved these changes Jul 1, 2025


Cyrilvallez merged commit dbc9832 into main Jul 1, 2025
22 checks passed

Cyrilvallez deleted the fix-gemma3n-tests branch July 1, 2025 08:34

Cyrilvallez added the "for patch" label (Tag issues / labels that should be included in the next patch) Jul 1, 2025
