[BUG] OmDet-Turbo processor produces 640px inputs but the model expects 224px #44610

@harshaljanjani

Description

System Info

  • transformers version: 5.0.0.dev0
  • Platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.39
  • Python version: 3.12.3
  • huggingface_hub version: 1.3.2
  • safetensors version: 0.7.0
  • accelerate version: 1.12.0
  • Accelerate config: not installed
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.9.1+cu128 (CUDA)
  • GPU type: NVIDIA L4
  • NVIDIA driver version: 550.90.07
  • CUDA version: 12.4

Who can help?

@zucchini-nlp (🚨 Delete duplicate code in backbone utils)

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoProcessor, OmDetTurboForObjectDetection
from PIL import Image
import requests
import torch

model = OmDetTurboForObjectDetection.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")
processor = AutoProcessor.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")
image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw).convert("RGB")
encoding = processor(images=image, text=["cat", "remote"], task="Detect cat, remote.", return_tensors="pt")
try:
    with torch.no_grad():
        outputs = model(**encoding)
    print(outputs.decoder_coord_logits.shape)
except Exception as e:
    print(e)

Current Repro Output:

(screenshot: AssertionError traceback)

OmDet-Turbo inference fails with an AssertionError: the processor produces 640×640 images while the model expects an input height of 224, so running the official loading and inference code raises `AssertionError: Input height (640) doesn't match model (224)` (see screenshot) instead of returning the expected output tensor. The same mismatch also breaks the official OmDet-Turbo CI run.
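For context, the error message matches the kind of fixed-size check found in vision-backbone patch embeddings. The following is a hypothetical simplification to illustrate the mismatch, not the actual transformers code; `expected_size=224` mirrors the error message:

```python
def patch_embed_size_check(pixel_values_hw, expected_size=224):
    """Mimic a fixed-size assertion in a vision backbone's patch embedding.

    pixel_values_hw: (height, width) of the processed images.
    expected_size: the image size baked into the backbone config.
    """
    height, width = pixel_values_hw
    if height != expected_size:
        raise AssertionError(
            f"Input height ({height}) doesn't match model ({expected_size})."
        )
    return True

# The processor emits 640x640 images, so a 224-expecting check fails:
try:
    patch_embed_size_check((640, 640))
except AssertionError as e:
    print(e)  # Input height (640) doesn't match model (224).
```

This suggests the backbone config's image size and the processor's output size have drifted apart; the fix presumably lies in whichever side no longer matches the checkpoint.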

Expected behavior

outputs.decoder_coord_logits.shape should be torch.Size([1, 900, 4]); the model should accept the 640×640 images the processor is configured to produce.
