
Conversation

@damianoamatruda (Contributor) commented Jan 24, 2025

What does this PR do?

This PR fixes the loss computation for XGLM in both PyTorch and TensorFlow implementations.

The labels were shifted left by one and the padding token was appended to fill the final position, causing artificial loss contributions, inconsistencies between non-padded and right-padded sequences, and a potential bias toward predicting padding tokens.

The updated implementations ignore the last logit and do not append the padding token to the labels, aligning with the behavior in GPT-2 and other models.
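For illustration, here's a minimal PyTorch sketch of the corrected, GPT-2-style computation (not the actual modeling code; `logits` and `labels` are assumed tensor names):

```python
import torch.nn.functional as F

# logits: (batch, seq_len, vocab_size); labels: (batch, seq_len).
# Score the prediction at position t against the token at position t + 1:
# drop the last logit instead of appending a padding token to the labels,
# so no artificial <pad> target is ever created.
shift_logits = logits[:, :-1, :].contiguous()
shift_labels = labels[:, 1:].contiguous()
loss = F.cross_entropy(
    shift_logits.view(-1, shift_logits.size(-1)),
    shift_labels.view(-1),
)
```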

This computation was introduced in #22540, where it was ported from the PyTorch implementation to the TensorFlow one for consistency. In this PR I've reverted the TensorFlow implementation to its previous, correct behavior and updated the PyTorch implementation to match it.

I've also added XGLM tests to ensure that the losses of non-padded and padded inputs match.
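Conceptually, the test checks something like this (a hedged sketch with assumed `model` and `tokenizer` objects, not the exact test code):

```python
import torch

# Loss on the unpadded sequence.
inputs = tokenizer("I love my cat", return_tensors="pt")
loss_unpadded = model(inputs.input_ids, labels=inputs.input_ids).loss

# Same sequence right-padded; pad positions are masked with -100 so they
# contribute nothing to the loss.
padded = tokenizer(
    "I love my cat", padding="max_length", max_length=16, return_tensors="pt"
)
labels = padded.input_ids.masked_fill(padded.attention_mask == 0, -100)
loss_padded = model(
    padded.input_ids, attention_mask=padded.attention_mask, labels=labels
).loss

# With the fix, both losses must match.
torch.testing.assert_close(loss_unpadded, loss_padded, rtol=1e-4, atol=1e-4)
```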

This bug was discovered in a joint project while collaborating with @mdrpanwar and @ayushkumartarun.

Who can review?

@Rocketknight1 @gante @ArthurZucker

@Rocketknight1 (Member)

Hi @damianoamatruda, yes, the original code is incorrect! However, a simpler fix would be to change the label padding value to -100, which indicates masked positions for our loss computations. Padding/shifting labels is preferable to shifting logits, because logits are a much larger tensor and can carry gradient, so speed + memory usage are affected if we perform lots of operations on them, especially during training.
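In code terms, the suggestion is roughly this (a sketch, again assuming `logits` and `labels`):

```python
import torch.nn.functional as F

# Shift the labels instead of the logits: labels are a small integer tensor
# that carries no gradient, so manipulating them is cheap.
shift_labels = labels.new_full(labels.shape, -100)  # -100 = ignored position
shift_labels[:, :-1] = labels[:, 1:]
# F.cross_entropy ignores targets equal to -100 by default (ignore_index).
loss = F.cross_entropy(
    logits.view(-1, logits.size(-1)),
    shift_labels.view(-1),
)
```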

@damianoamatruda (Contributor, Author)

Hi @Rocketknight1, thank you for the clear explanation!

I've updated the PR to shift only the labels, as previously done, and replaced the padding token with the mask value -100.

I've also updated the PyTorch test to match the changes introduced in the newly merged PR #35659.

@Rocketknight1 (Member) left a comment

Yes, LGTM now! cc @ArthurZucker for core maintainer review

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Rocketknight1 (Member)

run-slow: xglm

@github-actions (Contributor)

This comment contains run-slow, running the specified jobs: ['models/xglm'] ...

@Rocketknight1 (Member)

Hi @damianoamatruda, I'm seeing some failures in the slow tests for XGLM, can you take a look? You can check the CI logs, or to run the slow tests locally you can do something like `RUN_SLOW=1 pytest tests/models/xglm/test_modeling_xglm.py`.

@damianoamatruda (Contributor, Author)

Hi @Rocketknight1, I took a look at the errors, which were related to XGLM and similar models but weren't connected to the loss computation. I fixed them by taking inspiration from models/t5/modeling_tf_t5.py. Could you give me your opinion?

Now, however, with the latest rebase, there are failing tests that aren't related to XGLM. Could you do something about that?

@Rocketknight1 (Member)

Hi @damianoamatruda, I'm not sure exactly what's causing that! It's likely those tests were just flaky on a past commit - can you try rebasing again? If they still won't go away then I'll see if we can actually fix or skip them on main.

@damianoamatruda (Contributor, Author)

@Rocketknight1, I rebased and the test `tests/models/qwen2_5_vl/test_modeling_qwen2_5_vl.py::Qwen2_5_VLModelTest::test_generate_from_inputs_embeds_1_beam_search` failed again.

@Rocketknight1 (Member)

Yeah, that test is a problem on main right now. Please leave it a couple of days and then check back in - hopefully we'll have resolved it by then!

@Rocketknight1 (Member)

Tests are finally green! Pinging @Cyrilvallez for core maintainer review

@damianoamatruda (Contributor, Author)

Great!

@Cyrilvallez (Member) left a comment

Hey! All LGTM concerning the loss part!

However, I must say that I am skeptical about the change to set/get embeddings. It looks like we are changing the input type of those functions here (the layer vs. the underlying layer data), which may break existing code. Moreover, the failing test explicitly states that it is expected to fail (and it was never fixed).
TL;DR: I'd rather we revert the part on embeddings and keep the loss part 🤗

@damianoamatruda (Contributor, Author)

Hi @Cyrilvallez, done!

The tests now pass without requiring the commits for the embeddings. How did you fix/disable the failing tests?

Thank you for the review.

@Cyrilvallez (Member)

Thanks for reverting! The failing tests were slow tests triggered by GitHub Actions; they are not run by the usual CI, which is why you can't see them now!

@damianoamatruda (Contributor, Author)

Is there anything else to do, or is everything okay?

@ArthurZucker (Collaborator) left a comment

Let's go! @Cyrilvallez has a conference this week hahah sorry 🤗

@ArthurZucker (Collaborator)

BTW you need to resolve conflicts (probably no changes needed on the non-TF modeling side, no?)

@Cyrilvallez (Member) left a comment

Hey @damianoamatruda! Indeed, very sorry, I had a lot going on this week! As you can see, in the meantime XGLM got the loss refactor incorporated, which automatically fixed the issue at hand in PyTorch. The change to the PyTorch modeling should not be needed anymore. Very happy to add the test and the change to the TensorFlow file though!

This updates the expected output string of `test_xglm_sample` for torch 2.0 to the correct one and removes the one for torch 1.13.1 + cu116 (transformers moved to torch 2.0 with PR #35358).
@damianoamatruda (Contributor, Author)

@ArthurZucker, @Cyrilvallez, no problem, I understand your commitments 🤗

@damianoamatruda (Contributor, Author)

Refactor #35875 moved the loss computation for XGLM into a dedicated function `xglm_cross_entropy_loss`, but the invalid logic remains: it still appends the padding token to the labels, causing this PR's test `XGLMModelLanguageGenerationTest::test_loss_with_padding` to fail. Removing the dedicated function fixes the issue.
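A simplified contrast of the two behaviors (a hedged sketch assuming `labels` and `pad_token_id`; not the refactored function itself):

```python
import torch

# What the dedicated function still does: append the pad token id, so the
# final position is trained to predict <pad>.
buggy_labels = torch.cat(
    [labels[:, 1:], labels.new_full((labels.size(0), 1), pad_token_id)], dim=-1
)

# What this PR does instead: append -100, which cross-entropy ignores.
fixed_labels = torch.cat(
    [labels[:, 1:], labels.new_full((labels.size(0), 1), -100)], dim=-1
)
```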

@Cyrilvallez (Member) left a comment

Oh indeed, I did not notice that #35875 created a dedicated function to ensure backward compatibility, but the backward-compatible behavior was wrong!

All right, LGTM! Thanks a lot for the fix!! 🤗

@Cyrilvallez merged commit 4d2de5f into huggingface:main Feb 18, 2025
16 checks passed
@damianoamatruda (Contributor, Author)

Thank you all, it's been a pleasure! 🤗

@damianoamatruda deleted the fix-xglm-loss branch February 18, 2025 17:22