Your current environment
import requests
url = "http://0.0.0.0:9881/v1/chat/completions"
headers = {"Content-Type": "application/json"}
target_language = "Arabic"
data = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": f"Transcribe the following audio into {target_language}."},
                {
                    "type": "audio_url",
                    "audio_url": {
                        "url": "file:///nas/xyq/Qwen-Asr-POC/5262126349647872/1770072158499.mp3"  # "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"
                    },
                },
            ],
        }
    ]
}
response = requests.post(url, headers=headers, json=data, timeout=300)
response.raise_for_status()
content = response.json()['choices'][0]['message']['content']
print(content)
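# partition() does not raise if the "<asr_text>" marker is missing or appears more than once
language, _, text = content.partition("<asr_text>")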
print(language)
print(text)
(base) ➜ Qwen3-ASR git:(main) ✗ python3 infer_vllm_demo.py
language English<asr_text>I'm a geek. I'm a geek. I'm a geek. Yeah, I'm a geek. I'm a geek. I'm a geek. I'm a geek. I'm a geek.
language English
I'm a geek. I'm a geek. I'm a geek. Yeah, I'm a geek. I'm a geek. I'm a geek. I'm a geek. I'm a geek.
(base) ➜ Qwen3-ASR git:(main) ✗
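I want the output language to be Arabic, but even with target_language = "Arabic" in the prompt the model reports "language English" and returns English text.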
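
To make the mismatch easy to check, here is a minimal sketch that parses the reply and compares the reported language with the requested one. It assumes the reply always has the shape "language <name><asr_text><transcript>" seen in the output above; parse_asr_output and matches_target are hypothetical helper names of mine, not part of vLLM or Qwen3-ASR.

```python
def parse_asr_output(content: str) -> tuple[str, str]:
    """Split 'language <name><asr_text><transcript>' into (language, transcript).

    Assumption: the format is inferred from the single reply shown in this issue.
    """
    head, sep, text = content.partition("<asr_text>")
    if not sep:
        # No marker present: treat the whole string as the transcript.
        return "", content.strip()
    # Drop the leading "language" keyword, keeping only the language name.
    language = head.strip().removeprefix("language").strip()
    return language, text.strip()


def matches_target(content: str, target_language: str) -> bool:
    """Return True if the language reported by the model equals the requested one."""
    language, _ = parse_asr_output(content)
    return language.casefold() == target_language.casefold()


if __name__ == "__main__":
    sample = "language English<asr_text>I'm a geek. I'm a geek."
    language, text = parse_asr_output(sample)
    print(language)                           # English
    print(text)                               # I'm a geek. I'm a geek.
    print(matches_target(sample, "Arabic"))   # False
```

This only detects the mismatch; it does not change what the model returns. It needs Python 3.9+ for str.removeprefix.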
How would you like to use vllm
I want to run inference of a [specific model](put link here). I don't know how to integrate it with vllm.
Before submitting a new issue...