
Conversation

@zucchini-nlp
Member

What does this PR do?

Passing images as a flat list does not give the same logits in mllama as passing them in the nested batch format. To get the model working correctly, one should pass as many images per batch as there are image tokens.

See the reproducer below:

import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
image = Image.open(requests.get(url, stream=True).raw)
texts = ["<|image|><|begin_of_text|>What do you see here?", "<|image|><|begin_of_text|>What do you see here but longer?"]

repo_id = "mv11/11"
processor = AutoProcessor.from_pretrained(repo_id)
model = MllamaForConditionalGeneration.from_pretrained(repo_id, device_map='auto')

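# Flat list: two images passed as [image, image], one per prompt.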
batch = processor(text=texts, images=[image, image], return_tensors="pt", padding=True) # .to(model.device)
with torch.no_grad():
    model_output = model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        pixel_values=batch["pixel_values"],
        aspect_ratio_ids=batch["aspect_ratio_ids"],
        aspect_ratio_mask=batch["aspect_ratio_mask"],
        cross_attention_mask=batch["cross_attention_mask"],
    )


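# Nested list: one sublist of images per prompt, [[image], [image]].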
batch = processor(text=texts, images=[[image], [image]], return_tensors="pt", padding=True) # .to(model.device)
with torch.no_grad():
    model_output_2 = model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        pixel_values=batch["pixel_values"],
        aspect_ratio_ids=batch["aspect_ratio_ids"],
        aspect_ratio_mask=batch["aspect_ratio_mask"],
        cross_attention_mask=batch["cross_attention_mask"],
    )

print(torch.allclose(model_output_2.logits, model_output.logits))
>>> False
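
On versions without this fix, a minimal sketch of a workaround is to nest a flat image list so each prompt gets its own sublist, which is the format the model handles correctly. The nest_images helper and the one-image-per-prompt assumption are illustrative, not part of this PR:

# Hypothetical helper, not from this PR: group a flat image list so each
# prompt gets exactly one sublist (assumes one image per prompt, as above).
def nest_images(images_flat, num_prompts):
    assert len(images_flat) == num_prompts, "expected one image per prompt"
    return [[img] for img in images_flat]

nested = nest_images([image, image], num_prompts=len(texts))  # -> [[image], [image]]
batch = processor(text=texts, images=nested, return_tensors="pt", padding=True)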

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker left a comment

Okay! I think the tests need an update, no?
Otherwise LGTM

@zucchini-nlp
Member Author

@ArthurZucker yeah, updated the test and added a check for this condition. We might need to update the idefics models as well, but I will do that later in another PR.
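
For illustration only, a rough sketch of the kind of equivalence check such a test might perform; the function name, inputs, and use of torch.testing.assert_close are assumptions, not the actual test added in this PR:

import torch

# Hypothetical check: flat and nested image inputs should produce identical logits.
def check_flat_vs_nested_images(model, processor, texts, image):
    flat = processor(text=texts, images=[image, image], return_tensors="pt", padding=True)
    nested = processor(text=texts, images=[[image], [image]], return_tensors="pt", padding=True)
    with torch.no_grad():
        out_flat = model(**flat)
        out_nested = model(**nested)
    torch.testing.assert_close(out_flat.logits, out_nested.logits)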

Collaborator

@ArthurZucker left a comment

Thanks

@zucchini-nlp zucchini-nlp merged commit 97d2f9d into huggingface:main Mar 21, 2025
11 checks passed
zucchini-nlp added a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
* fix mllama

* update test

* fix test