Conversation

@eljandoubi
Contributor

What does this PR do?

Fixes #36744

Models:
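For context on the linked issue (#36744): the Trainer passes `num_items_in_batch` to the model as a keyword argument, and a `forward` that does not declare it raises a `TypeError`. A minimal sketch of that failure mode, using illustrative function names rather than the actual transformers code:

```python
# Sketch of the failure mode fixed by this PR (illustrative names, not the
# actual transformers code). The Trainer forwards ``num_items_in_batch`` as a
# keyword argument; a forward() that does not declare it fails.

def forward_before_fix(pixel_values=None, labels=None):
    """Hypothetical pre-fix forward: num_items_in_batch is not accepted."""
    return {"loss": 0.0}

def forward_after_fix(pixel_values=None, labels=None, num_items_in_batch=None):
    """Hypothetical post-fix forward: the kwarg is accepted and can be
    passed down to the loss computation."""
    return {"loss": 0.0, "num_items_in_batch": num_items_in_batch}

loss_kwargs = {"num_items_in_batch": 7}

try:
    forward_before_fix(labels=None, **loss_kwargs)
except TypeError as exc:
    print(f"before fix: {exc}")

out = forward_after_fix(labels=None, **loss_kwargs)
print(f"after fix: num_items_in_batch={out['num_items_in_batch']}")
```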

@github-actions github-actions bot marked this pull request as draft March 16, 2025 13:50
@github-actions
Contributor

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

@eljandoubi eljandoubi marked this pull request as ready for review March 16, 2025 13:59
Contributor

@qubvel left a comment

Thank you @eljandoubi for opening a PR and fixing the issue! Just a small nit:

Comment on lines 645 to 649
loss = fixed_cross_entropy(
    logits.reshape(-1, self.decoder.config.vocab_size),
    labels.reshape(-1),
    num_items_in_batch=num_items_in_batch,
)
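For context, the point of passing `num_items_in_batch` is to switch the cross-entropy from mean reduction to sum reduction divided by the true token count, so gradient accumulation does not skew the normalization. A pure-Python sketch of that semantics (illustrative; the real transformers helper operates on torch tensors):

```python
import math

def fixed_cross_entropy(logits, labels, num_items_in_batch=None, ignore_index=-100):
    """Sketch of the cross-entropy semantics: with num_items_in_batch given,
    sum per-token losses and divide by that count; otherwise take the mean.

    logits: list of rows (one per token), labels: list of class indices.
    """
    losses = []
    for row, y in zip(logits, labels):
        if y == ignore_index:
            continue  # ignored positions contribute no loss
        m = max(row)
        logsumexp = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(logsumexp - row[y])  # negative log-likelihood
    if num_items_in_batch is not None:
        return sum(losses) / num_items_in_batch  # sum-reduce, renormalize
    return sum(losses) / len(losses)             # default mean reduction
```

When `num_items_in_batch` equals the number of unignored tokens, both branches agree; they diverge exactly when accumulation makes the per-step token count differ from the true batch size.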
Contributor

We can use `self.loss_function` instead; see llama for an example. All reshape/view ops will happen under the hood:

loss = self.loss_function(
    logits=logits,
    labels=labels,
    vocab_size=self.config.vocab_size,
    num_items_in_batch=num_items_in_batch,
)
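The suggestion relies on the loss function doing the flattening internally, so the caller passes unreshaped logits and labels. A rough pure-Python sketch of that contract (simplified and illustrative; the real transformers implementation works on torch tensors and, for causal LM, also shifts labels by one position):

```python
import math

def _cross_entropy(rows, labels, ignore_index=-100):
    """Per-token negative log-likelihoods (plain-Python stand-in for torch)."""
    losses = []
    for row, y in zip(rows, labels):
        if y == ignore_index:
            continue
        m = max(row)
        logsumexp = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(logsumexp - row[y])
    return losses

def loss_function(logits, labels, vocab_size, num_items_in_batch=None):
    """Sketch of the contract: the caller passes (batch, seq, vocab) logits
    and (batch, seq) labels; the reshape to (batch*seq, vocab) happens here."""
    flat_logits = [tok for seq in logits for tok in seq]  # (B*S, V)
    flat_labels = [y for seq in labels for y in seq]      # (B*S,)
    assert all(len(tok) == vocab_size for tok in flat_logits)
    losses = _cross_entropy(flat_logits, flat_labels)
    if num_items_in_batch is not None:
        return sum(losses) / num_items_in_batch  # sum-reduce, renormalize
    return sum(losses) / len(losses)             # default mean reduction
```

This is why the call site stays free of `reshape`/`view`: the model just hands over the raw outputs plus `num_items_in_batch`, and the shared loss helper owns the shape handling.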

Contributor Author

@qubvel, like this?

Contributor

A bit different: there is no need to make a view when passing tensors to the function; see my example above.

Contributor Author

@qubvel I've updated my code; what do you think about it now?

@qubvel qubvel merged commit e7337ee into huggingface:main Mar 20, 2025
12 checks passed
@qubvel
Contributor

qubvel commented Mar 20, 2025

Thanks @eljandoubi

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
* Pass num_items_in_batch directly to loss computation

* use self loss instead

* fix loss kwrgs

* fix vocab size


Development

Successfully merging this pull request may close these issues.

num_items_in_batch unexpected in vision encoder decoder

3 participants