
[NPUW] Fix logic to padding VLM 3D Position Ids #31174



Open: wants to merge 3 commits into master

@GuoliangShiIntel (Contributor) commented Jul 2, 2025

Details:

Background:
Regular LLMs use 2D position_ids of shape [BATCH, SEQ_LEN], while Qwen2.5-VL/Omni use 3D position_ids of shape [3, BATCH, SEQ_LEN].
The first dimension (3) holds the three components of the position encoding (time, height, and width), which enables alignment across multimodal inputs such as text, audio, and video.
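To make the 3D layout concrete, here is a minimal C++ sketch. It assumes a contiguous row-major buffer (the usual layout for dense tensors); position_offset and make_dummy_position_ids are hypothetical helpers for illustration, not NPUW code. Each component occupies its own [BATCH, SEQ_LEN] plane, so element (c, b, s) lives at offset (c * BATCH + b) * SEQ_LEN + s.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the PR): offset of element (component, batch, pos)
// in a contiguous row-major buffer of shape [3, batch_size, seq_len].
// Component 0/1/2 = time/height/width.
std::size_t position_offset(std::size_t component, std::size_t batch, std::size_t pos,
                            std::size_t batch_size, std::size_t seq_len) {
    assert(component < 3 && batch < batch_size && pos < seq_len);
    return (component * batch_size + batch) * seq_len + pos;
}

// Hypothetical helper: fill a [3, batch_size, seq_len] buffer with dummy values
// encoding (component, pos) as component * 1000 + pos, so the layout is easy to inspect.
std::vector<std::int64_t> make_dummy_position_ids(std::size_t batch_size, std::size_t seq_len) {
    std::vector<std::int64_t> ids(3 * batch_size * seq_len);
    for (std::size_t c = 0; c < 3; ++c)
        for (std::size_t b = 0; b < batch_size; ++b)
            for (std::size_t p = 0; p < seq_len; ++p)
                ids[position_offset(c, b, p, batch_size, seq_len)] =
                    static_cast<std::int64_t>(c * 1000 + p);
    return ids;
}
```

For batch_size = 1 and seq_len = 4, the time, height, and width values start at flat offsets 0, 4, and 8 respectively, i.e. each component is its own row of the buffer.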

Issue:
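Currently, the position_ids data is always copied as a single contiguous block to the end of the position_ids_padded buffer. This works for 2D shapes, e.g. padding [1, 500] to [1, 1024]. It breaks for 3D shapes such as [3, 1, 500] to [3, 1, 1024]: each of the three components has to be placed at the end of its own row, but a single contiguous copy lets the data straddle row boundaries, so row 0 is left entirely as padding and the components are shifted into the wrong rows.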
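The difference between the two placement strategies can be sketched in C++ as follows. This is a minimal illustration, not the code from the PR: it assumes a row-major std::int64_t buffer, end-of-row (right-aligned) padding as described above, and a hypothetical pad value of 0; pad_contiguous_tail and pad_rows_tail are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pad value; the value actually used by NPUW is not stated in the PR.
constexpr std::int64_t kPad = 0;

// Old behaviour as described in the PR: copy the whole input as one contiguous
// block to the end of the padded buffer. Correct only for a single row,
// e.g. [1, 500] -> [1, 1024].
std::vector<std::int64_t> pad_contiguous_tail(const std::vector<std::int64_t>& src,
                                              std::size_t padded_size) {
    assert(src.size() <= padded_size);
    std::vector<std::int64_t> dst(padded_size, kPad);
    std::copy(src.begin(), src.end(),
              dst.end() - static_cast<std::ptrdiff_t>(src.size()));
    return dst;
}

// Row-wise padding: treat the tensor as `rows` rows (3 * BATCH for 3D position_ids,
// BATCH for 2D) and right-align each row of length src_len inside a row of
// length padded_len.
std::vector<std::int64_t> pad_rows_tail(const std::vector<std::int64_t>& src,
                                        std::size_t rows, std::size_t src_len,
                                        std::size_t padded_len) {
    assert(src.size() == rows * src_len);
    assert(src_len <= padded_len);
    std::vector<std::int64_t> dst(rows * padded_len, kPad);
    for (std::size_t r = 0; r < rows; ++r) {
        auto src_row = src.begin() + static_cast<std::ptrdiff_t>(r * src_len);
        auto dst_row = dst.begin() +
                       static_cast<std::ptrdiff_t>(r * padded_len + (padded_len - src_len));
        std::copy(src_row, src_row + static_cast<std::ptrdiff_t>(src_len), dst_row);
    }
    return dst;
}
```

With a toy input of shape [3, 1, 2] padded to [3, 1, 4], the contiguous copy leaves row 0 entirely padded and moves the time component into the height row, while the row-wise version keeps each component at the end of its own row. For a single row (the 2D case) both strategies produce the same buffer, which is why the old logic only failed for 3D position_ids.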

Tickets:

@GuoliangShiIntel requested review from a team as code owners July 2, 2025 07:25
@github-actions bot added the "category: NPU" (OpenVINO NPU plugin) and "category: NPUW" (NPUW plugin) labels Jul 2, 2025
@sys-openvino-ci added the "ExternalIntelPR" (External contributor from Intel) label Jul 2, 2025
@GuoliangShiIntel changed the title [NPUW] Fix VLM 3D Position Ids [NPUW] Fix logic to padding VLM 3D Position Ids Jul 2, 2025
@GuoliangShiIntel (Contributor Author):

@AsyaPronina @dmatveev Please take a look.

@GuoliangShiIntel (Contributor Author):

@TolyaTalamanov Please take a look.

@dmatveev added this to the 2025.3 milestone Jul 3, 2025

@AlexanderKalistratov (Contributor) left a comment


Thanks for the great catch!
The PR looks good; let's just add an assert.

@GuoliangShiIntel force-pushed the sgl/fix_3d_position_id branch from 0a83da7 to 36b0e45 July 4, 2025 02:44
@AlexanderKalistratov (Contributor):

build_jenkins


4 participants