System Info

- `transformers` version: 5.0.0.dev0
- Platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.39
- Python version: 3.12.3
- `huggingface_hub` version: 1.3.2
- `safetensors` version: 0.7.0
- `accelerate` version: 1.12.0
- Accelerate config: not installed
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.9.1+cu128 (CUDA)
- GPU type: NVIDIA L4
- NVIDIA driver version: 550.90.07
- CUDA version: 12.4
Who can help?
@zucchini-nlp (🚨 Delete duplicate code in backbone utils)
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
```python
from transformers import AutoProcessor, OmDetTurboForObjectDetection
from PIL import Image
import requests
import torch

model = OmDetTurboForObjectDetection.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")
processor = AutoProcessor.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")

image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw).convert("RGB")
encoding = processor(images=image, text=["cat", "remote"], task="Detect cat, remote.", return_tensors="pt")

try:
    with torch.no_grad():
        outputs = model(**encoding)
    print(outputs.decoder_coord_logits.shape)
except Exception as e:
    print(e)
```

Current repro output:
OmDet-Turbo inference fails with an AssertionError: the processor produces 640×640 images, but the model expects an input height of 224. Running the official loading and inference code raises `AssertionError: Input height (640) doesn't match model (224)` instead of producing the expected output tensor. The same failure also appears in the official OmDet-Turbo CI run.
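For context, the mismatch can be shown in isolation without downloading the checkpoint. The helper below is a hypothetical sketch (`check_input_size` and `model_image_size` are illustrative names, not actual transformers internals) that mirrors the shape assertion the backbone appears to perform:

```python
def check_input_size(pixel_height: int, pixel_width: int, model_image_size: int) -> None:
    """Hypothetical re-creation of the backbone's input-size assertion.

    `model_image_size` stands in for the backbone config's expected image
    size (224 in the failing run); names here are illustrative only.
    """
    assert pixel_height == model_image_size, (
        f"Input height ({pixel_height}) doesn't match model ({model_image_size})"
    )
    assert pixel_width == model_image_size, (
        f"Input width ({pixel_width}) doesn't match model ({model_image_size})"
    )

# The processor resizes to 640x640, while the backbone expects 224,
# so the first assertion fires with the message seen in the traceback:
try:
    check_input_size(640, 640, 224)
except AssertionError as e:
    print(e)  # Input height (640) doesn't match model (224)
```

Since the checkpoint's processor config legitimately targets 640×640, the fix presumably belongs on the model/backbone side rather than in the processor.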
Expected behavior
`outputs.decoder_coord_logits.shape` should be `torch.Size([1, 900, 4])`; the model should accept the 640×640 images its processor is configured to produce.
