Replies: 1 comment
Thank you. Our current streaming implementation focuses on demonstrating the algorithm's computation process, and because speech produces so few tokens it is difficult even to reach the minimum block_size. For that reason we did not include kv-cache reuse in our open-source code. This is still far from an industrial implementation; we will keep following this area and work with the vLLM community to build it together. :)
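To make the block_size point concrete: prefix caching in vLLM-style engines shares KV-cache only at full-block granularity, so a prefix shorter than one block contributes nothing reusable. A minimal sketch (the function name and numbers are illustrative, not vLLM's actual API):

```python
# Illustrative sketch: why short speech-token prefixes defeat
# block-level kv-cache reuse. `reusable_full_blocks` is a
# hypothetical helper, not part of vLLM's API.

def reusable_full_blocks(num_prefix_tokens: int, block_size: int = 16) -> int:
    """Prefix caching operates on full blocks only: any partial
    trailing block (and any prefix shorter than block_size) is
    recomputed rather than shared."""
    return num_prefix_tokens // block_size

# A short speech segment may yield only a handful of tokens,
# so it never fills even one block:
print(reusable_full_blocks(10))   # 0 -- below the minimum block_size
print(reusable_full_blocks(35))   # 2 -- only the full blocks are shareable
```

With a default block_size of 16, a 10-token speech chunk yields zero cacheable blocks, which is exactly the situation described above.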
Congratulations on the exciting model release!
We recently added more native support in vLLM for streaming-input-based requests (see vllm-project/vllm#28973). It might be worth looking at how the streaming transcription APIs could exploit that. We would appreciate any feedback on the functionality, and on any adjustments needed to work with the model.