Description
System Info
- transformers version: 4.52.3
- Platform: Linux-6.8.0-59-generic-x86_64-with-glibc2.39
- Python version: 3.10.16
- Huggingface_hub version: 0.32.1
- Safetensors version: 0.5.3
- Accelerate version: 1.7.0
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (GPU?): 2.6.0+cu118 (True)
- Tensorflow version (GPU?): 2.19.0 (True)
- Flax version (CPU?/GPU?/TPU?): 0.10.6 (cpu)
- Jax version: 0.6.0
- JaxLib version: 0.6.0
- Using distributed or parallel set-up in script?:
- Using GPU in script?: yes
- GPU type: NVIDIA RTX 6000 Ada Generation
Who can help?
@ArthurZucker and @itazap
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Summary:
When using PaliGemmaProcessor for multimodal fine-tuning with a suffix, the processor crashes with:
AttributeError: 'list' object has no attribute 'masked_fill'
This happens because return_tensors="pt" is not passed to the tokenizer internally. As a result, the tokenizer returns Python lists for input_ids and token_type_ids, while the processor assumes they are tensors, which leads to a crash at:
`inputs["input_ids"].masked_fill(inputs["token_type_ids"] == 0, -100)`
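To make the failure concrete, here is a minimal sketch in plain Python (no torch needed) of what that masking step is meant to compute. The token IDs and the helper name `mask_prefix_labels` are made up for illustration, and the sketch assumes that token_type_ids == 0 marks the image/prefix positions and 1 marks the suffix. The real processor does this with a tensor method, which a Python list does not have.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def mask_prefix_labels(input_ids, token_type_ids, ignore_index=IGNORE_INDEX):
    """List-based equivalent of input_ids.masked_fill(token_type_ids == 0, ignore_index).

    Hypothetical helper for illustration only; it is not part of transformers.
    """
    return [
        [ignore_index if tt == 0 else tok for tok, tt in zip(ids, tts)]
        for ids, tts in zip(input_ids, token_type_ids)
    ]


# Made-up token IDs; 0 in token_type_ids is assumed to mark prefix tokens, 1 the suffix.
input_ids = [[5, 6, 7, 8], [9, 10, 11, 12]]
token_type_ids = [[0, 0, 1, 1], [0, 1, 1, 1]]

# A plain list has no tensor methods, which is what triggers the AttributeError.
list_has_masked_fill = hasattr(input_ids, "masked_fill")

# Only suffix positions keep their token ID; prefix positions become -100.
labels = mask_prefix_labels(input_ids, token_type_ids)
```

With tensors the same result comes from a single masked_fill call, which is why the processor needs the tokenizer output to already be in tensor form at that point.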
Example:
import PIL.Image  # needed for PIL.Image.new below
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

model_id = 'google/paligemma2-3b-pt-224'
processor = PaliGemmaProcessor.from_pretrained(model_id)

# Two dummy examples with blank images; the suffix is the training target.
examples = [
    {
        "prefix": "caption <loc0412><loc0269><loc0644><loc0546><seg015>",
        "suffix": "RML",
        "image": PIL.Image.new("RGB", (224, 224)),
    },
    {
        "prefix": "detect Left Fourth Rib",
        "suffix": "<loc0234><loc0621><loc0495><loc0796> Left Fourth Rib",
        "image": PIL.Image.new("RGB", (224, 224)),
    },
]

texts = ["<image>" + ex["prefix"] for ex in examples]
labels = [ex["suffix"] for ex in examples]
images = [ex["image"] for ex in examples]

# Passing suffix triggers label creation inside the processor, where the crash occurs.
tokens = processor(
    text=texts,
    images=images,
    suffix=labels,
    return_tensors="pt",
    padding="longest",
)
This raises:
AttributeError: 'list' object has no attribute 'masked_fill'
Proposed Fix:
In the __call__ method of PaliGemmaProcessor, the return_tensors argument is popped from text_kwargs:
return_tensors = output_kwargs["text_kwargs"].pop("return_tensors", None)
…but it is never passed to self.tokenizer(...). Adding this line to the tokenizer call may fix the issue:
return_tensors=return_tensors,
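The pop-without-forward pattern described here can be sketched in isolation. Everything below is a hypothetical stand-in: `stub_tokenizer`, `call_buggy`, and `call_fixed` are illustrative names and do not reproduce the actual PaliGemmaProcessor source. The stub only reports which container type a tokenizer would return.

```python
def stub_tokenizer(texts, return_tensors=None, **kwargs):
    """Hypothetical stand-in for self.tokenizer: lists unless return_tensors is set."""
    container = "list" if return_tensors is None else f"tensor[{return_tensors}]"
    return {"input_ids": container}


def call_buggy(texts, **text_kwargs):
    # return_tensors is popped out of the kwargs...
    return_tensors = text_kwargs.pop("return_tensors", None)
    # ...but never forwarded, so the tokenizer falls back to Python lists.
    return stub_tokenizer(texts, **text_kwargs)


def call_fixed(texts, **text_kwargs):
    return_tensors = text_kwargs.pop("return_tensors", None)
    # Forwarding the popped value restores tensor outputs.
    return stub_tokenizer(texts, return_tensors=return_tensors, **text_kwargs)


buggy = call_buggy(["caption"], return_tensors="pt", padding="longest")
fixed = call_fixed(["caption"], return_tensors="pt", padding="longest")
```

In this sketch `buggy["input_ids"]` is reported as a list while `fixed["input_ids"]` is reported as a tensor, which mirrors the difference the one-line change is expected to make.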
Expected behavior
The processor should correctly pass return_tensors="pt" to the tokenizer so that all fields (e.g., input_ids, token_type_ids) are returned as PyTorch tensors, allowing downstream tensor operations like .masked_fill() to work without errors.