[Bugfix][NPU] Add _model_forward for ModelRunner #505

Merged
hsliuustc0106 merged 2 commits into vllm-project:main from gcanlin:bugfix-npu-12
Dec 27, 2025

Conversation

gcanlin (Collaborator) commented Dec 27, 2025

Purpose

Test Plan

python end2end.py --output-wav output_audio --query-type use_audio
python end2end.py --output-wav output_audio --query-type use_video --video-path ./sample_demo_3.mp4
python end2end.py --output-wav output_audio --query-type use_image

Test Result

Input audio:

INFO 12-27 15:20:34 [log_utils.py:529] {'type': 'request_level_metrics',
INFO 12-27 15:20:34 [log_utils.py:529]  'request_id': '0_6b56bae8-0186-40e6-9f89-dbcaa48b7708',
INFO 12-27 15:20:34 [log_utils.py:529]  'e2e_time_ms': 63273.81873130798,
INFO 12-27 15:20:34 [log_utils.py:529]  'e2e_tpt': 0.0,
INFO 12-27 15:20:34 [log_utils.py:529]  'num_tokens_out': 0,
INFO 12-27 15:20:34 [log_utils.py:529]  'transfers_total_time_ms': 24.482011795043945,
INFO 12-27 15:20:34 [log_utils.py:529]  'transfers_total_bytes': 9166168,
INFO 12-27 15:20:34 [log_utils.py:529]  'stages': {0: {'stage_gen_time_ms': 14170.833110809326, 'num_tokens_out': 68},
INFO 12-27 15:20:34 [log_utils.py:529]             1: {'stage_gen_time_ms': 26174.392223358154, 'num_tokens_out': 214},
INFO 12-27 15:20:34 [log_utils.py:529]             2: {'stage_gen_time_ms': 21758.036613464355, 'num_tokens_out': 0}}}
INFO 12-27 15:20:34 [omni.py:542] [Summary] {'e2e_requests': 1,
INFO 12-27 15:20:34 [omni.py:542]  'e2e_total_time_ms': 63274.08218383789,
INFO 12-27 15:20:34 [omni.py:542]  'e2e_sum_time_ms': 63273.81873130798,
INFO 12-27 15:20:34 [omni.py:542]  'e2e_total_tokens': 0,
INFO 12-27 15:20:34 [omni.py:542]  'e2e_avg_time_per_request_ms': 63273.81873130798,
INFO 12-27 15:20:34 [omni.py:542]  'e2e_avg_tokens_per_s': 0.0,
INFO 12-27 15:20:34 [omni.py:542]  'wall_time_ms': 63274.08218383789,
INFO 12-27 15:20:34 [omni.py:542]  'final_stage_id': {'0_6b56bae8-0186-40e6-9f89-dbcaa48b7708': 2},
INFO 12-27 15:20:34 [omni.py:542]  'stages': [{'stage_id': 0,
INFO 12-27 15:20:34 [omni.py:542]              'requests': 1,
INFO 12-27 15:20:34 [omni.py:542]              'tokens': 68,
INFO 12-27 15:20:34 [omni.py:542]              'total_time_ms': 14204.21814918518,
INFO 12-27 15:20:34 [omni.py:542]              'avg_time_per_request_ms': 14204.21814918518,
INFO 12-27 15:20:34 [omni.py:542]              'avg_tokens_per_s': 4.787310310627748},
INFO 12-27 15:20:34 [omni.py:542]             {'stage_id': 1,
INFO 12-27 15:20:34 [omni.py:542]              'requests': 1,
INFO 12-27 15:20:34 [omni.py:542]              'tokens': 214,
INFO 12-27 15:20:34 [omni.py:542]              'total_time_ms': 26183.300256729126,
INFO 12-27 15:20:34 [omni.py:542]              'avg_time_per_request_ms': 26183.300256729126,
INFO 12-27 15:20:34 [omni.py:542]              'avg_tokens_per_s': 8.173148453468995},
INFO 12-27 15:20:34 [omni.py:542]             {'stage_id': 2,
INFO 12-27 15:20:34 [omni.py:542]              'requests': 1,
INFO 12-27 15:20:34 [omni.py:542]              'tokens': 0,
INFO 12-27 15:20:34 [omni.py:542]              'total_time_ms': 21768.054485321045,
INFO 12-27 15:20:34 [omni.py:542]              'avg_time_per_request_ms': 21768.054485321045,
INFO 12-27 15:20:34 [omni.py:542]              'avg_tokens_per_s': 0.0}],
INFO 12-27 15:20:34 [omni.py:542]  'transfers': [{'from_stage': 0,
INFO 12-27 15:20:34 [omni.py:542]                 'to_stage': 1,
INFO 12-27 15:20:34 [omni.py:542]                 'samples': 1,
INFO 12-27 15:20:34 [omni.py:542]                 'total_bytes': 8135315,
INFO 12-27 15:20:34 [omni.py:542]                 'total_time_ms': 14.225482940673828,
INFO 12-27 15:20:34 [omni.py:542]                 'tx_mbps': 4575.065765529447,
INFO 12-27 15:20:34 [omni.py:542]                 'rx_samples': 1,
INFO 12-27 15:20:34 [omni.py:542]                 'rx_total_bytes': 8135315,
INFO 12-27 15:20:34 [omni.py:542]                 'rx_total_time_ms': 4.580259323120117,
INFO 12-27 15:20:34 [omni.py:542]                 'rx_mbps': 14209.352660771434,
INFO 12-27 15:20:34 [omni.py:542]                 'total_samples': 1,
INFO 12-27 15:20:34 [omni.py:542]                 'total_transfer_time_ms': 19.59538459777832,
INFO 12-27 15:20:34 [omni.py:542]                 'total_mbps': 3321.318837874655},
INFO 12-27 15:20:34 [omni.py:542]                {'from_stage': 1,
INFO 12-27 15:20:34 [omni.py:542]                 'to_stage': 2,
INFO 12-27 15:20:34 [omni.py:542]                 'samples': 1,
INFO 12-27 15:20:34 [omni.py:542]                 'total_bytes': 1030853,
INFO 12-27 15:20:34 [omni.py:542]                 'total_time_ms': 1.1110305786132812,
INFO 12-27 15:20:34 [omni.py:542]                 'tx_mbps': 7422.679590235193,
INFO 12-27 15:20:34 [omni.py:542]                 'rx_samples': 1,
INFO 12-27 15:20:34 [omni.py:542]                 'rx_total_bytes': 1030853,
INFO 12-27 15:20:34 [omni.py:542]                 'rx_total_time_ms': 3.039121627807617,
INFO 12-27 15:20:34 [omni.py:542]                 'rx_mbps': 2713.555102415941,
INFO 12-27 15:20:34 [omni.py:542]                 'total_samples': 1,
INFO 12-27 15:20:34 [omni.py:542]                 'total_transfer_time_ms': 4.886627197265625,
INFO 12-27 15:20:34 [omni.py:542]                 'total_mbps': 1687.6310934082749}]}
Request ID: 0_6b56bae8-0186-40e6-9f89-dbcaa48b7708, Text saved to output_audio/0_6b56bae8-0186-40e6-9f89-dbcaa48b7708.txt
Request ID: 0_6b56bae8-0186-40e6-9f89-dbcaa48b7708, Saved audio to output_audio/output_0_6b56bae8-0186-40e6-9f89-dbcaa48b7708.wav
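
The derived throughput fields in the log above can be cross-checked against the raw counts and timings. The following is a minimal sketch, assuming that avg_tokens_per_s is tokens divided by total time in seconds and that the *_mbps fields are megabits per second (bytes × 8, divided by seconds, divided by 10^6). These formulas are inferred from the logged numbers, not taken from the vllm-omni source, and the helper names are hypothetical.

```python
import math


def tokens_per_second(tokens: int, total_time_ms: float) -> float:
    """Hypothetical helper: tokens divided by elapsed seconds."""
    if total_time_ms <= 0:
        return 0.0
    return tokens / (total_time_ms / 1000.0)


def megabits_per_second(total_bytes: int, time_ms: float) -> float:
    """Hypothetical helper: (bytes * 8 bits) per second, scaled to 10^6 bits."""
    if time_ms <= 0:
        return 0.0
    return total_bytes * 8 / (time_ms / 1000.0) / 1e6


# Values copied from the audio run's [Summary] block.
stage0_tps = tokens_per_second(68, 14204.21814918518)
stage1_tps = tokens_per_second(214, 26183.300256729126)
tx_0_to_1 = megabits_per_second(8135315, 14.225482940673828)
total_0_to_1 = megabits_per_second(8135315, 19.59538459777832)

# Compare against the logged values with a loose relative tolerance.
print(math.isclose(stage0_tps, 4.787310310627748, rel_tol=1e-6))
print(math.isclose(stage1_tps, 8.173148453468995, rel_tol=1e-6))
print(math.isclose(tx_0_to_1, 4575.065765529447, rel_tol=1e-6))
print(math.isclose(total_0_to_1, 3321.318837874655, rel_tol=1e-6))
```

The same two helpers can be applied to the video and image runs below by substituting their token counts, byte counts, and timings.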

Input video:

INFO 12-27 15:47:02 [log_utils.py:529] {'type': 'request_level_metrics',
INFO 12-27 15:47:02 [log_utils.py:529]  'request_id': '0_b2f5b526-42af-4b62-90b0-b8c36d224097',
INFO 12-27 15:47:02 [log_utils.py:529]  'e2e_time_ms': 65112.905740737915,
INFO 12-27 15:47:02 [log_utils.py:529]  'e2e_tpt': 0.0,
INFO 12-27 15:47:02 [log_utils.py:529]  'num_tokens_out': 0,
INFO 12-27 15:47:02 [log_utils.py:529]  'transfers_total_time_ms': 96.62437438964844,
INFO 12-27 15:47:02 [log_utils.py:529]  'transfers_total_bytes': 54510818,
INFO 12-27 15:47:02 [log_utils.py:529]  'stages': {0: {'stage_gen_time_ms': 9260.12921333313, 'num_tokens_out': 64},
INFO 12-27 15:47:02 [log_utils.py:529]             1: {'stage_gen_time_ms': 32654.93869781494, 'num_tokens_out': 279},
INFO 12-27 15:47:02 [log_utils.py:529]             2: {'stage_gen_time_ms': 21932.645082473755, 'num_tokens_out': 0}}}
INFO 12-27 15:47:02 [omni.py:542] [Summary] {'e2e_requests': 1,
INFO 12-27 15:47:02 [omni.py:542]  'e2e_total_time_ms': 65113.1808757782,
INFO 12-27 15:47:02 [omni.py:542]  'e2e_sum_time_ms': 65112.905740737915,
INFO 12-27 15:47:02 [omni.py:542]  'e2e_total_tokens': 0,
INFO 12-27 15:47:02 [omni.py:542]  'e2e_avg_time_per_request_ms': 65112.905740737915,
INFO 12-27 15:47:02 [omni.py:542]  'e2e_avg_tokens_per_s': 0.0,
INFO 12-27 15:47:02 [omni.py:542]  'wall_time_ms': 65113.1808757782,
INFO 12-27 15:47:02 [omni.py:542]  'final_stage_id': {'0_b2f5b526-42af-4b62-90b0-b8c36d224097': 2},
INFO 12-27 15:47:02 [omni.py:542]  'stages': [{'stage_id': 0,
INFO 12-27 15:47:02 [omni.py:542]              'requests': 1,
INFO 12-27 15:47:02 [omni.py:542]              'tokens': 64,
INFO 12-27 15:47:02 [omni.py:542]              'total_time_ms': 9372.593641281128,
INFO 12-27 15:47:02 [omni.py:542]              'avg_time_per_request_ms': 9372.593641281128,
INFO 12-27 15:47:02 [omni.py:542]              'avg_tokens_per_s': 6.8284193734928555},
INFO 12-27 15:47:02 [omni.py:542]             {'stage_id': 1,
INFO 12-27 15:47:02 [omni.py:542]              'requests': 1,
INFO 12-27 15:47:02 [omni.py:542]              'tokens': 279,
INFO 12-27 15:47:02 [omni.py:542]              'total_time_ms': 32696.101903915405,
INFO 12-27 15:47:02 [omni.py:542]              'avg_time_per_request_ms': 32696.101903915405,
INFO 12-27 15:47:02 [omni.py:542]              'avg_tokens_per_s': 8.533127307343918},
INFO 12-27 15:47:02 [omni.py:542]             {'stage_id': 2,
INFO 12-27 15:47:02 [omni.py:542]              'requests': 1,
INFO 12-27 15:47:02 [omni.py:542]              'tokens': 0,
INFO 12-27 15:47:02 [omni.py:542]              'total_time_ms': 21952.56805419922,
INFO 12-27 15:47:02 [omni.py:542]              'avg_time_per_request_ms': 21952.56805419922,
INFO 12-27 15:47:02 [omni.py:542]              'avg_tokens_per_s': 0.0}],
INFO 12-27 15:47:02 [omni.py:542]  'transfers': [{'from_stage': 0,
INFO 12-27 15:47:02 [omni.py:542]                 'to_stage': 1,
INFO 12-27 15:47:02 [omni.py:542]                 'samples': 1,
INFO 12-27 15:47:02 [omni.py:542]                 'total_bytes': 43438635,
INFO 12-27 15:47:02 [omni.py:542]                 'total_time_ms': 47.38354682922363,
INFO 12-27 15:47:02 [omni.py:542]                 'tx_mbps': 7333.960905300466,
INFO 12-27 15:47:02 [omni.py:542]                 'rx_samples': 1,
INFO 12-27 15:47:02 [omni.py:542]                 'rx_total_bytes': 43438635,
INFO 12-27 15:47:02 [omni.py:542]                 'rx_total_time_ms': 26.821136474609375,
INFO 12-27 15:47:02 [omni.py:542]                 'rx_mbps': 12956.538226073104,
INFO 12-27 15:47:02 [omni.py:542]                 'total_samples': 1,
INFO 12-27 15:47:02 [omni.py:542]                 'total_transfer_time_ms': 75.28448104858398,
INFO 12-27 15:47:02 [omni.py:542]                 'total_mbps': 4615.945745521431},
INFO 12-27 15:47:02 [omni.py:542]                {'from_stage': 1,
INFO 12-27 15:47:02 [omni.py:542]                 'to_stage': 2,
INFO 12-27 15:47:02 [omni.py:542]                 'samples': 1,
INFO 12-27 15:47:02 [omni.py:542]                 'total_bytes': 11072183,
INFO 12-27 15:47:02 [omni.py:542]                 'total_time_ms': 9.931564331054688,
INFO 12-27 15:47:02 [omni.py:542]                 'tx_mbps': 8918.782685928942,
INFO 12-27 15:47:02 [omni.py:542]                 'rx_samples': 1,
INFO 12-27 15:47:02 [omni.py:542]                 'rx_total_bytes': 11072183,
INFO 12-27 15:47:02 [omni.py:542]                 'rx_total_time_ms': 10.60795783996582,
INFO 12-27 15:47:02 [omni.py:542]                 'rx_mbps': 8350.095780573483,
INFO 12-27 15:47:02 [omni.py:542]                 'total_samples': 1,
INFO 12-27 15:47:02 [omni.py:542]                 'total_transfer_time_ms': 21.339893341064453,
INFO 12-27 15:47:02 [omni.py:542]                 'total_mbps': 4150.792254877394}]}
Request ID: 0_b2f5b526-42af-4b62-90b0-b8c36d224097, Text saved to output_audio/0_b2f5b526-42af-4b62-90b0-b8c36d224097.txt
Request ID: 0_b2f5b526-42af-4b62-90b0-b8c36d224097, Saved audio to output_audio/output_0_b2f5b526-42af-4b62-90b0-b8c36d224097.wav

Input image:

INFO 12-27 15:34:14 [log_utils.py:529] {'type': 'request_level_metrics',
INFO 12-27 15:34:14 [log_utils.py:529]  'request_id': '0_10bef2e3-fb58-4f14-aaf1-57929c46f2d8',
INFO 12-27 15:34:14 [log_utils.py:529]  'e2e_time_ms': 130719.03800964355,
INFO 12-27 15:34:14 [log_utils.py:529]  'e2e_tpt': 0.0,
INFO 12-27 15:34:14 [log_utils.py:529]  'num_tokens_out': 0,
INFO 12-27 15:34:14 [log_utils.py:529]  'transfers_total_time_ms': 135.65945625305176,
INFO 12-27 15:34:14 [log_utils.py:529]  'transfers_total_bytes': 54851960,
INFO 12-27 15:34:14 [log_utils.py:529]  'stages': {0: {'stage_gen_time_ms': 19866.636991500854, 'num_tokens_out': 195},
INFO 12-27 15:34:14 [log_utils.py:529]             1: {'stage_gen_time_ms': 86817.47102737427, 'num_tokens_out': 739},
INFO 12-27 15:34:14 [log_utils.py:529]             2: {'stage_gen_time_ms': 22478.3935546875, 'num_tokens_out': 0}}}
INFO 12-27 15:34:14 [omni.py:542] [Summary] {'e2e_requests': 1,
INFO 12-27 15:34:14 [omni.py:542]  'e2e_total_time_ms': 130719.3341255188,
INFO 12-27 15:34:14 [omni.py:542]  'e2e_sum_time_ms': 130719.03800964355,
INFO 12-27 15:34:14 [omni.py:542]  'e2e_total_tokens': 0,
INFO 12-27 15:34:14 [omni.py:542]  'e2e_avg_time_per_request_ms': 130719.03800964355,
INFO 12-27 15:34:14 [omni.py:542]  'e2e_avg_tokens_per_s': 0.0,
INFO 12-27 15:34:14 [omni.py:542]  'wall_time_ms': 130719.3341255188,
INFO 12-27 15:34:14 [omni.py:542]  'final_stage_id': {'0_10bef2e3-fb58-4f14-aaf1-57929c46f2d8': 2},
INFO 12-27 15:34:14 [omni.py:542]  'stages': [{'stage_id': 0,
INFO 12-27 15:34:14 [omni.py:542]              'requests': 1,
INFO 12-27 15:34:14 [omni.py:542]              'tokens': 195,
INFO 12-27 15:34:14 [omni.py:542]              'total_time_ms': 20105.645179748535,
INFO 12-27 15:34:14 [omni.py:542]              'avg_time_per_request_ms': 20105.645179748535,
INFO 12-27 15:34:14 [omni.py:542]              'avg_tokens_per_s': 9.698768592435634},
INFO 12-27 15:34:14 [omni.py:542]             {'stage_id': 1,
INFO 12-27 15:34:14 [omni.py:542]              'requests': 1,
INFO 12-27 15:34:14 [omni.py:542]              'tokens': 739,
INFO 12-27 15:34:14 [omni.py:542]              'total_time_ms': 86876.11937522888,
INFO 12-27 15:34:14 [omni.py:542]              'avg_time_per_request_ms': 86876.11937522888,
INFO 12-27 15:34:14 [omni.py:542]              'avg_tokens_per_s': 8.506365216523612},
INFO 12-27 15:34:14 [omni.py:542]             {'stage_id': 2,
INFO 12-27 15:34:14 [omni.py:542]              'requests': 1,
INFO 12-27 15:34:14 [omni.py:542]              'tokens': 0,
INFO 12-27 15:34:14 [omni.py:542]              'total_time_ms': 22506.65307044983,
INFO 12-27 15:34:14 [omni.py:542]              'avg_time_per_request_ms': 22506.65307044983,
INFO 12-27 15:34:14 [omni.py:542]              'avg_tokens_per_s': 0.0}],
INFO 12-27 15:34:14 [omni.py:542]  'transfers': [{'from_stage': 0,
INFO 12-27 15:34:14 [omni.py:542]                 'to_stage': 1,
INFO 12-27 15:34:14 [omni.py:542]                 'samples': 1,
INFO 12-27 15:34:14 [omni.py:542]                 'total_bytes': 48552802,
INFO 12-27 15:34:14 [omni.py:542]                 'total_time_ms': 77.91948318481445,
INFO 12-27 15:34:14 [omni.py:542]                 'tx_mbps': 4984.920332167947,
INFO 12-27 15:34:14 [omni.py:542]                 'rx_samples': 1,
INFO 12-27 15:34:14 [omni.py:542]                 'rx_total_bytes': 48552802,
INFO 12-27 15:34:14 [omni.py:542]                 'rx_total_time_ms': 35.98523139953613,
INFO 12-27 15:34:14 [omni.py:542]                 'rx_mbps': 10793.939649503183,
INFO 12-27 15:34:14 [omni.py:542]                 'total_samples': 1,
INFO 12-27 15:34:14 [omni.py:542]                 'total_transfer_time_ms': 114.90654945373535,
INFO 12-27 15:34:14 [omni.py:542]                 'total_mbps': 3380.3331302398033},
INFO 12-27 15:34:14 [omni.py:542]                {'from_stage': 1,
INFO 12-27 15:34:14 [omni.py:542]                 'to_stage': 2,
INFO 12-27 15:34:14 [omni.py:542]                 'samples': 1,
INFO 12-27 15:34:14 [omni.py:542]                 'total_bytes': 6299158,
INFO 12-27 15:34:14 [omni.py:542]                 'total_time_ms': 9.206533432006836,
INFO 12-27 15:34:14 [omni.py:542]                 'tx_mbps': 5473.6415581576075,
INFO 12-27 15:34:14 [omni.py:542]                 'rx_samples': 1,
INFO 12-27 15:34:14 [omni.py:542]                 'rx_total_bytes': 6299158,
INFO 12-27 15:34:14 [omni.py:542]                 'rx_total_time_ms': 10.441064834594727,
INFO 12-27 15:34:14 [omni.py:542]                 'rx_mbps': 4826.448719390222,
INFO 12-27 15:34:14 [omni.py:542]                 'total_samples': 1,
INFO 12-27 15:34:14 [omni.py:542]                 'total_transfer_time_ms': 20.752906799316406,
INFO 12-27 15:34:14 [omni.py:542]                 'total_mbps': 2428.250870459262}]}
Request ID: 0_10bef2e3-fb58-4f14-aaf1-57929c46f2d8, Text saved to output_audio/0_10bef2e3-fb58-4f14-aaf1-57929c46f2d8.txt
Request ID: 0_10bef2e3-fb58-4f14-aaf1-57929c46f2d8, Saved audio to output_audio/output_0_10bef2e3-fb58-4f14-aaf1-57929c46f2d8.wav


Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@hsliuustc0106 added the "ready" label (label to trigger buildkite CI) Dec 27, 2025
@hsliuustc0106 merged commit be81443 into vllm-project:main Dec 27, 2025
7 checks passed
yenuo26 pushed a commit to yenuo26/vllm-omni that referenced this pull request Dec 29, 2025
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
lishunyang12 pushed a commit to lishunyang12/vllm-omni that referenced this pull request Dec 29, 2025
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: lishunyang <lishunyang12@163.com>
tzhouam pushed a commit to tzhouam/vllm-omni that referenced this pull request Dec 30, 2025
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
HonestDeng pushed a commit to HonestDeng/vllm-omni that referenced this pull request Dec 30, 2025
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: HonestDeng <2958906959@qq.com>
princepride pushed a commit to princepride/vllm-omni that referenced this pull request Jan 10, 2026
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
ZJY0516 pushed a commit to LawJarp-A/vllm-omni that referenced this pull request Jan 10, 2026
Signed-off-by: gcanlin <canlinguosdu@gmail.com>

Development

Successfully merging this pull request may close these issues.

[Bug]: gpu_generational_model_runner.py calls self.model.forward(**kwargs) should instead call self._model_forward(**kwargs) in _run_generation_model
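
The issue title above describes a dispatch problem: calling self.model.forward(**kwargs) directly skips any override of self._model_forward. The following is a minimal sketch of why that matters, assuming a platform-specific runner customizes the forward call by overriding _model_forward. All class names (ToyModel, BaseRunner, NPURunner) and the run_direct / run_via_hook methods are hypothetical illustrations, not the vllm-omni code.

```python
class ToyModel:
    """Hypothetical stand-in for a model; records which path was used."""

    def forward(self, **kwargs):
        return {"path": "model.forward", **kwargs}


class BaseRunner:
    """Hypothetical runner exposing a _model_forward hook."""

    def __init__(self, model):
        self.model = model

    def _model_forward(self, **kwargs):
        # Default hook: delegate straight to the model.
        return self.model.forward(**kwargs)

    def run_direct(self, **kwargs):
        # Pattern reported in the issue title: bypasses the hook entirely.
        return self.model.forward(**kwargs)

    def run_via_hook(self, **kwargs):
        # Pattern the issue title asks for: subclass overrides take effect.
        return self._model_forward(**kwargs)


class NPURunner(BaseRunner):
    """Hypothetical platform runner that wraps the forward call."""

    def _model_forward(self, **kwargs):
        out = super()._model_forward(**kwargs)
        out["path"] = "npu._model_forward -> " + out["path"]
        return out


runner = NPURunner(ToyModel())
direct = runner.run_direct(x=1)
hooked = runner.run_via_hook(x=1)
print(direct["path"])
print(hooked["path"])
```

In this sketch only run_via_hook picks up the subclass behavior; run_direct silently ignores it, which is the kind of mismatch the issue title points at.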
