Commit cef6eac

Add asr/tts components for xeon and hpu (#222)
* add asr/tts component for xeon and hpu
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix
* fix
* fix ffmpeg JSONDecode error on HPU
* add tests
* trigger
* try

Signed-off-by: Spycsh <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent fe5f394 · commit cef6eac

Showing 23 changed files with 792 additions and 224 deletions.

comps/asr/Dockerfile

Lines changed: 0 additions & 4 deletions
@@ -5,10 +5,6 @@ FROM python:3.11-slim
 
 ENV LANG C.UTF-8
 
-# Install system dependencies
-RUN apt-get update \
-    && apt-get install -y ffmpeg
-
 COPY comps /home/comps
 
 RUN pip install --no-cache-dir --upgrade pip && \

comps/asr/README.md

Lines changed: 68 additions & 10 deletions
@@ -12,35 +12,93 @@ To start the ASR microservice with Python, you need to first install python packages.
 pip install -r requirements.txt
 ```
 
-## 1.2 Start ASR Service with Python Script
+## 1.2 Start Whisper Service/Test
+
+- Xeon CPU
+
+```bash
+cd whisper/
+nohup python whisper_server.py --device=cpu &
+python check_whisper_server.py
+```
+
+- Gaudi2 HPU
+
+```bash
+pip install optimum[habana]
+
+cd whisper/
+nohup python whisper_server.py --device=hpu &
+python check_whisper_server.py
+```
+
+## 1.3 Start ASR Service/Test
 
 ```bash
 python asr.py
+python check_asr_server.py
 ```
 
 # 🚀2. Start Microservice with Docker (Option 2)
 
 Alternatively, you can also start the ASR microservice with Docker.
 
-## 2.1 Build Docker Image
+## 2.1 Build Images
+
+### 2.1.1 Whisper Server Image
+
+- Xeon CPU
+
+```bash
+cd ../..
+docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/Dockerfile .
+```
+
+- Gaudi2 HPU
+
+```bash
+cd ../..
+docker build -t opea/whisper:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/whisper/Dockerfile_hpu .
+```
+
+### 2.1.2 ASR Service Image
 
 ```bash
-cd ../../
 docker build -t opea/asr:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/asr/Dockerfile .
 ```
 
-## 2.2 Run Docker with CLI
+## 2.2 Start Whisper and ASR Service
+
+### 2.2.1 Start Whisper Server
+
+- Xeon
+
+```bash
+docker run -p 7066:7066 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy opea/whisper:latest
+```
+
+- Gaudi2 HPU
 
 ```bash
-docker run -p 9099:9099 --network=host --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy opea/asr:latest
+docker run -p 7066:7066 --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy opea/whisper:latest
 ```
 
-# 🚀3. Consume ASR Service
+### 2.2.2 Start ASR service
+
+```bash
+ip_address=$(hostname -I | awk '{print $1}')
+
+docker run -d -p 9099:9099 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e ASR_ENDPOINT=http://$ip_address:7066 opea/asr:latest
+```
 
-You can use the following `curl` command to test whether the service is up. Notice that the first request can be slow because it needs to download the models.
+### 2.2.3 Test
 
 ```bash
-curl http://localhost:9099/v1/audio/transcriptions \
-  -H "Content-Type: application/json" \
-  -d '{"url": "https://github.com/intel/intel-extension-for-transformers/raw/main/intel_extension_for_transformers/neural_chat/assets/audio/sample_2.wav"}'
+# Use curl or python
+
+# curl
+http_proxy="" curl http://localhost:9099/v1/audio/transcriptions -XPOST -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}' -H 'Content-Type: application/json'
+
+# python
+python check_asr_server.py
 ```
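
A note on the payload format used in the test above: `byte_str` is just the raw bytes of a WAV file, base64-encoded. Below is a minimal sketch (not part of the commit) of producing such a payload in Python; the one-sample clip is a stand-in for illustration, and any real `.wav` file can be encoded the same way. The inline `UklGRig...` string in the curl example is exactly such a tiny WAV.

```python
import base64
import io
import wave

# Build a tiny one-sample, 16-bit mono WAV in memory and base64-encode it.
# For a real recording: base64.b64encode(open("sample.wav", "rb").read()).
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)       # mono
    w.setsampwidth(2)       # 16-bit PCM
    w.setframerate(44100)   # 44.1 kHz sample rate
    w.writeframes(b"\x01\x00")  # a single sample
byte_str = base64.b64encode(buf.getvalue()).decode("utf-8")
print(byte_str)  # usable as {"byte_str": ...} in the curl body above
```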

comps/asr/asr.py

Lines changed: 23 additions & 83 deletions
@@ -1,78 +1,22 @@
 # Copyright (C) 2024 Intel Corporation
 # SPDX-License-Identifier: Apache-2.0
 
-import contextlib
+import json
 import os
 import time
 
 import numpy as np
-import torch
-from datasets import Audio, Dataset
-from pydub import AudioSegment
-from transformers import WhisperForConditionalGeneration, WhisperProcessor
-
-from comps import Audio2TextDoc, ServiceType, TextDoc, opea_microservices, opea_telemetry, register_microservice
-
-
-@opea_telemetry
-def _audiosegment_to_librosawav(audiosegment):
-    channel_sounds = audiosegment.split_to_mono()[:1]  # only select the first channel
-    samples = [s.get_array_of_samples() for s in channel_sounds]
-
-    fp_arr = np.array(samples).T.astype(np.float32)
-    fp_arr /= np.iinfo(samples[0].typecode).max
-    fp_arr = fp_arr.reshape(-1)
-
-    return fp_arr
-
-
-@opea_telemetry
-def audio2text(
-    audio_path,
-    model_name_or_path="openai/whisper-small",
-    language=None,
-    bf16=False,
-    device="cpu",
-):
-    """Convert audio to text."""
-    start = time.time()
-    model = WhisperForConditionalGeneration.from_pretrained(model_name_or_path).to(device)
-    processor = WhisperProcessor.from_pretrained(model_name_or_path)
-    model.eval()
-    bf16 = bf16
-    if bf16:
-        import intel_extension_for_pytorch as ipex
-
-        model = ipex.optimize(model, dtype=torch.bfloat16)
-    language = language
-
-    try:
-        waveform = AudioSegment.from_file(audio_path).set_frame_rate(16000)
-        waveform = _audiosegment_to_librosawav(waveform)
-    except Exception as e:
-        print(f"[ASR] audiosegment to librosa wave fail: {e}")
-        audio_dataset = Dataset.from_dict({"audio": [audio_path]}).cast_column("audio", Audio(sampling_rate=16000))
-        waveform = audio_dataset[0]["audio"]["array"]
-
-    inputs = processor.feature_extractor(waveform, return_tensors="pt", sampling_rate=16_000).input_features.to(device)
-    with torch.cpu.amp.autocast() if bf16 else contextlib.nullcontext():
-        if language is None:
-            predicted_ids = model.generate(inputs)
-        elif language == "auto":
-            model.config.forced_decoder_ids = None
-            predicted_ids = model.generate(inputs)
-        else:
-            forced_decoder_ids = processor.get_decoder_prompt_ids(language=language, task="transcribe")
-            model.config.forced_decoder_ids = forced_decoder_ids
-            predicted_ids = model.generate(inputs)
-
-    result = processor.tokenizer.batch_decode(predicted_ids, skip_special_tokens=True, normalize=True)[0]
-    if language == "auto" or language == "zh":
-        from zhconv import convert
-
-        result = convert(result, "zh-cn")
-    print(f"generated text in {time.time() - start} seconds, and the result is: {result}")
-    return result
+import requests
+
+from comps import (
+    Base64ByteStrDoc,
+    ServiceType,
+    TextDoc,
+    opea_microservices,
+    register_microservice,
+    register_statistics,
+    statistics_dict,
+)
 
 
 @register_microservice(
@@ -81,26 +25,22 @@ def audio2text(
     endpoint="/v1/audio/transcriptions",
     host="0.0.0.0",
     port=9099,
-    input_datatype=Audio2TextDoc,
+    input_datatype=Base64ByteStrDoc,
     output_datatype=TextDoc,
 )
-@opea_telemetry
-async def audio_to_text(audio: Audio2TextDoc):
-    audio.tensor, audio.frame_rate = audio.url.load()  # AudioNdArray, fr
-    audio_path = f"{audio.id}.wav"
-    audio.tensor.save(audio_path, frame_rate=16000)
+@register_statistics(names=["opea_service@asr"])
+async def audio_to_text(audio: Base64ByteStrDoc):
+    start = time.time()
+    byte_str = audio.byte_str
+    inputs = {"audio": byte_str}
+
+    response = requests.post(url=f"{asr_endpoint}/v1/asr", data=json.dumps(inputs), proxies={"http": None})
 
-    try:
-        asr_result = audio2text(audio_path, model_name_or_path=audio.model_name_or_path, language=audio.language)
-    except Exception as e:
-        print(e)
-        asr_result = e
-    finally:
-        os.remove(audio_path)
-    res = TextDoc(text=asr_result)
-    return res
+    statistics_dict["opea_service@asr"].append_latency(time.time() - start, None)
+    return TextDoc(text=response.json()["asr_result"])
 
 
 if __name__ == "__main__":
+    asr_endpoint = os.getenv("ASR_ENDPOINT", "http://localhost:7066")
    print("[asr - router] ASR initialized.")
    opea_microservices["opea_service@asr"].start()
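
The rewritten router no longer runs Whisper itself; it forwards the base64 payload to a standalone Whisper server reached via `ASR_ENDPOINT`. That server (`whisper_server.py`) is not included in this view of the commit, so the following is only a sketch of the `/v1/asr` contract the router depends on, assuming a FastAPI app; `transcribe()` is a hypothetical placeholder for the actual Whisper pipeline.

```python
# Sketch only: whisper_server.py is not shown in this diff. This illustrates
# the contract asr.py relies on: POST /v1/asr with {"audio": "<base64 wav>"}
# answering {"asr_result": "<text>"}.
import base64
import os
import uuid

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class AudioRequest(BaseModel):
    audio: str  # base64-encoded WAV bytes, as sent by asr.py


def transcribe(file_name: str) -> str:
    """Hypothetical placeholder for the real Whisper pipeline."""
    raise NotImplementedError


@app.post("/v1/asr")
async def asr(request: AudioRequest):
    # Decode the payload to a temporary .wav, transcribe it, and answer with
    # the field name the router unpacks: response.json()["asr_result"].
    file_name = f"{uuid.uuid4()}.wav"
    with open(file_name, "wb") as f:
        f.write(base64.b64decode(request.audio))
    try:
        return {"asr_result": transcribe(file_name)}
    finally:
        os.remove(file_name)
```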

comps/asr/check_asr_server.py

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+import base64
+import json
+import os
+import urllib.request
+import uuid
+from io import BytesIO
+
+import requests
+
+# https://gist.github.com/novwhisky/8a1a0168b94f3b6abfaa
+# test_audio_base64_str = "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"
+
+uid = str(uuid.uuid4())
+file_name = uid + ".wav"
+
+urllib.request.urlretrieve(
+    "https://github.com/intel/intel-extension-for-transformers/raw/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav",
+    file_name,
+)
+
+with open(file_name, "rb") as f:
+    test_audio_base64_str = base64.b64encode(f.read()).decode("utf-8")
+os.remove(file_name)
+
+endpoint = "http://localhost:9099/v1/audio/transcriptions"
+inputs = {"byte_str": test_audio_base64_str}
+response = requests.post(url=endpoint, data=json.dumps(inputs), proxies={"http": None})
+print(response.json())
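
check_asr_server.py needs outbound network access to download `sample.wav` before it can exercise the service. An offline variant, an illustration rather than part of the commit, can reuse the tiny inline clip from the gist referenced in the script:

```python
import json

import requests

# One-sample WAV from https://gist.github.com/novwhisky/8a1a0168b94f3b6abfaa,
# the same string the README's curl example posts.
test_audio_base64_str = "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"

response = requests.post(
    url="http://localhost:9099/v1/audio/transcriptions",
    data=json.dumps({"byte_str": test_audio_base64_str}),
    proxies={"http": None},
)
print(response.json())
```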

comps/asr/requirements.txt

Lines changed: 2 additions & 1 deletion
@@ -1,10 +1,11 @@
 datasets
 docarray[full]
 fastapi
-intel_extension_for_pytorch
 opentelemetry-api
 opentelemetry-exporter-otlp
 opentelemetry-sdk
+optimum[habana]
+pydantic==2.7.2
 pydub
 shortuuid
 torch

comps/asr/whisper/Dockerfile

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+FROM python:3.11-slim
+
+# Set environment variables
+ENV LANG=en_US.UTF-8
+ENV PYTHONPATH=/home/user
+
+# Install system dependencies
+RUN apt-get update \
+    && apt-get install -y ffmpeg
+
+COPY comps /home/comps
+
+RUN pip install --no-cache-dir --upgrade pip && \
+    pip install --no-cache-dir -r /home/comps/asr/requirements.txt
+
+ENV PYTHONPATH=$PYTHONPATH:/home
+
+WORKDIR /home/comps/asr/whisper
+
+ENTRYPOINT ["python", "whisper_server.py", "--device", "cpu"]

comps/asr/whisper/Dockerfile_hpu

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+# HABANA environment
+FROM vault.habana.ai/gaudi-docker/1.14.0/ubuntu22.04/habanalabs/pytorch-installer-2.1.1 AS hpu
+
+# Set environment variables
+ENV LANG=en_US.UTF-8
+ENV PYTHONPATH=/home/user:/usr/lib/habanalabs/:/optimum-habana
+
+# Install system dependencies
+RUN apt-get update \
+    && apt-get install -y ffmpeg
+
+COPY comps /home/comps
+
+# Install requirements and optimum habana
+RUN pip install --no-cache-dir --upgrade pip && \
+    pip install --no-cache-dir -r /home/comps/asr/requirements.txt && \
+    pip install optimum[habana]
+
+ENV PYTHONPATH=$PYTHONPATH:/home
+
+WORKDIR /home/comps/asr/whisper
+
+ENTRYPOINT ["python", "whisper_server.py", "--device", "hpu"]

comps/asr/whisper/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0

comps/asr/whisper/check_whisper_server.py

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+import base64
+import json
+import os
+import urllib.request
+import uuid
+from io import BytesIO
+
+import requests
+
+# https://gist.github.com/novwhisky/8a1a0168b94f3b6abfaa
+# test_audio_base64_str = "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"
+
+uid = str(uuid.uuid4())
+file_name = uid + ".wav"
+
+urllib.request.urlretrieve(
+    "https://github.com/intel/intel-extension-for-transformers/raw/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav",
+    file_name,
+)
+
+with open(file_name, "rb") as f:
+    test_audio_base64_str = base64.b64encode(f.read()).decode("utf-8")
+os.remove(file_name)
+
+endpoint = "http://localhost:7066/v1/asr"
+inputs = {"audio": test_audio_base64_str}
+response = requests.post(url=endpoint, data=json.dumps(inputs), proxies={"http": None})
+print(response.json())
