
Commit 2f03a3a

Align parameters "max_tokens", "repetition_penalty", "presence_penalty", "frequency_penalty" (#726)
Signed-off-by: Xinyao Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Parent: 372d78c
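
In short: "max_new_tokens" becomes "max_tokens" everywhere, and the three penalty parameters are forwarded as separate fields instead of being conflated. A minimal sketch of a request using the aligned names (endpoint, port, and values are illustrative, borrowed from the README diffs below):

```python
# Illustrative only: the unified parameter names this commit converges on.
# Endpoint, port, and values are placeholders from the README examples below.
import requests

payload = {
    "query": "What is Deep Learning?",
    "max_tokens": 17,            # previously "max_new_tokens"
    "repetition_penalty": 1.03,  # now forwarded as its own field
    "presence_penalty": 0.0,     # newly forwarded to the backend
    "frequency_penalty": 0.0,    # no longer overloaded with repetition_penalty
    "streaming": False,
}
resp = requests.post("http://localhost:9000/v1/chat/completions",
                     json=payload, timeout=60)
print(resp.text)
```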

24 files changed: +110 additions, −72 deletions


AudioQnA/docker_compose/intel/cpu/xeon/README.md

Lines changed: 1 addition & 1 deletion

@@ -108,7 +108,7 @@ curl http://${host_ip}:3006/generate \
 # llm microservice
 curl http://${host_ip}:3007/v1/chat/completions\
   -X POST \
-  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
   -H 'Content-Type: application/json'

 # speecht5 service

AudioQnA/docker_compose/intel/hpu/gaudi/README.md

Lines changed: 1 addition & 1 deletion

@@ -108,7 +108,7 @@ curl http://${host_ip}:3006/generate \
 # llm microservice
 curl http://${host_ip}:3007/v1/chat/completions\
   -X POST \
-  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":false}' \
   -H 'Content-Type: application/json'

 # speecht5 service

AudioQnA/tests/test_gmc_on_gaudi.sh

Lines changed: 1 addition & 1 deletion

@@ -34,7 +34,7 @@ function validate_audioqa() {
 export CLIENT_POD=$(kubectl get pod -n $APP_NAMESPACE -l app=client-test -o jsonpath={.items..metadata.name})
 echo "$CLIENT_POD"
 accessUrl=$(kubectl get gmc -n $APP_NAMESPACE -o jsonpath="{.items[?(@.metadata.name=='audioqa')].status.accessUrl}")
-byte_str=$(kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -s -X POST -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "parameters":{"max_new_tokens":64, "do_sample": true, "streaming":false}}' -H 'Content-Type: application/json' | jq .byte_str)
+byte_str=$(kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -s -X POST -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "parameters":{"max_tokens":64, "do_sample": true, "streaming":false}}' -H 'Content-Type: application/json' | jq .byte_str)
 echo "$byte_str" > $LOG_PATH/curl_audioqa.log
 if [ -z "$byte_str" ]; then
 echo "audioqa failed, please check the logs in ${LOG_PATH}!"

AudioQnA/tests/test_gmc_on_xeon.sh

Lines changed: 1 addition & 1 deletion

@@ -34,7 +34,7 @@ function validate_audioqa() {
 export CLIENT_POD=$(kubectl get pod -n $APP_NAMESPACE -l app=client-test -o jsonpath={.items..metadata.name})
 echo "$CLIENT_POD"
 accessUrl=$(kubectl get gmc -n $APP_NAMESPACE -o jsonpath="{.items[?(@.metadata.name=='audioqa')].status.accessUrl}")
-byte_str=$(kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -s -X POST -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "parameters":{"max_new_tokens":64, "do_sample": true, "streaming":false}}' -H 'Content-Type: application/json' | jq .byte_str)
+byte_str=$(kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -s -X POST -d '{"byte_str": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA", "parameters":{"max_tokens":64, "do_sample": true, "streaming":false}}' -H 'Content-Type: application/json' | jq .byte_str)
 echo "$byte_str" > $LOG_PATH/curl_audioqa.log
 if [ -z "$byte_str" ]; then
 echo "audioqa failed, please check the logs in ${LOG_PATH}!"

ChatQnA/benchmark/benchmark.yaml

Lines changed: 1 addition & 1 deletion

@@ -41,7 +41,7 @@ test_cases:
 run_test: false
 service_name: "llm-svc" # Replace with your service name
 parameters:
-  max_new_tokens: 128
+  max_tokens: 128
   temperature: 0.01
   top_k: 10
   top_p: 0.95

ChatQnA/chatqna_no_wrapper.py

Lines changed: 4 additions & 2 deletions

@@ -69,10 +69,12 @@ def align_inputs(self, inputs, cur_node, runtime_graph, llm_parameters_dict, **k
     next_inputs = {}
     next_inputs["model"] = "tgi"  # specifically clarify the fake model to make the format unified
     next_inputs["messages"] = [{"role": "user", "content": inputs["inputs"]}]
-    next_inputs["max_tokens"] = llm_parameters_dict["max_new_tokens"]
+    next_inputs["max_tokens"] = llm_parameters_dict["max_tokens"]
     next_inputs["top_p"] = llm_parameters_dict["top_p"]
     next_inputs["stream"] = inputs["streaming"]
-    next_inputs["frequency_penalty"] = inputs["repetition_penalty"]
+    next_inputs["frequency_penalty"] = inputs["frequency_penalty"]
+    next_inputs["presence_penalty"] = inputs["presence_penalty"]
+    next_inputs["repetition_penalty"] = inputs["repetition_penalty"]
     next_inputs["temperature"] = inputs["temperature"]
     inputs = next_inputs
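
This hunk is the core of the change: each penalty now maps to its own field instead of frequency_penalty being filled from repetition_penalty. A self-contained sketch of the aligned mapping, assuming the same input and parameter dictionaries as the diff (the standalone function name and signature are illustrative, not the repo's):

```python
# Sketch of the aligned parameter mapping; mirrors the hunk above.
def align_llm_inputs(inputs: dict, llm_parameters_dict: dict) -> dict:
    return {
        "model": "tgi",  # fake model name so the request format stays unified
        "messages": [{"role": "user", "content": inputs["inputs"]}],
        "max_tokens": llm_parameters_dict["max_tokens"],  # was max_new_tokens
        "top_p": llm_parameters_dict["top_p"],
        "stream": inputs["streaming"],
        # each penalty is now passed through under its own name
        "frequency_penalty": inputs["frequency_penalty"],
        "presence_penalty": inputs["presence_penalty"],
        "repetition_penalty": inputs["repetition_penalty"],
        "temperature": inputs["temperature"],
    }
```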

ChatQnA/docker_compose/intel/cpu/aipc/README.md

Lines changed: 1 addition & 1 deletion

@@ -229,7 +229,7 @@ OLLAMA_HOST=${host_ip}:11434 ollama run $OLLAMA_MODEL
 ```bash
 curl http://${host_ip}:9000/v1/chat/completions\
   -X POST \
-  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
   -H 'Content-Type: application/json'
 ```

ChatQnA/docker_compose/intel/cpu/xeon/README.md

Lines changed: 17 additions & 4 deletions

@@ -438,18 +438,31 @@ docker compose -f compose_vllm.yaml up -d
 This service depends on the LLM backend service above; on the first startup it can take a long time to become ready.

 ```bash
+# TGI service
 curl http://${host_ip}:9000/v1/chat/completions\
   -X POST \
-  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
   -H 'Content-Type: application/json'
 ```

+For parameters in TGI mode, please refer to the [HuggingFace InferenceClient API](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation) (except that "max_new_tokens" is renamed to "max_tokens").
+
+```bash
+# vLLM service
+curl http://${your_ip}:9000/v1/chat/completions \
+  -X POST \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_p":1,"temperature":0.7,"frequency_penalty":0,"presence_penalty":0, "streaming":false}' \
+  -H 'Content-Type: application/json'
+```
+
+For parameters in vLLM mode, refer to the [LangChain VLLMOpenAI API](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.vllm.VLLMOpenAI.html).
+
 8. MegaService

 ```bash
-curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
-  "messages": "What is the revenue of Nike in 2023?"
-  }'
+curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
+  "messages": "What is the revenue of Nike in 2023?"
+  }'
 ```

 9. Dataprep Microservice (Optional)
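
The two curl examples above take different knob sets: TGI-style sampling parameters versus OpenAI-compatible penalties. A hedged sketch of building the payload per backend, using only the parameter names shown in this README (the function name and defaults are illustrative):

```python
# Build a /v1/chat/completions payload per backend, restricted to the
# parameter sets shown in the README examples above.
def build_payload(query: str, backend: str, streaming: bool = False) -> dict:
    payload = {"query": query, "max_tokens": 17, "streaming": streaming}
    if backend == "tgi":
        # TGI-style sampling knobs (see the HuggingFace InferenceClient docs)
        payload.update(top_k=10, top_p=0.95, typical_p=0.95,
                       temperature=0.01, repetition_penalty=1.03)
    elif backend == "vllm":
        # OpenAI-compatible knobs (see the LangChain VLLMOpenAI docs)
        payload.update(top_p=1, temperature=0.7,
                       frequency_penalty=0, presence_penalty=0)
    else:
        raise ValueError(f"unknown backend: {backend}")
    return payload
```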

ChatQnA/docker_compose/intel/cpu/xeon/README_qdrant.md

Lines changed: 1 addition & 1 deletion

@@ -304,7 +304,7 @@ docker compose -f compose_qdrant.yaml up -d
 ```bash
 curl http://${host_ip}:6047/v1/chat/completions\
   -X POST \
-  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
   -H 'Content-Type: application/json'
 ```

ChatQnA/docker_compose/intel/hpu/gaudi/README.md

Lines changed: 26 additions & 3 deletions

@@ -442,18 +442,41 @@ For validation details, please refer to [how-to-validate_service](./how_to_valid
 7. LLM Microservice

 ```bash
+# TGI service
+curl http://${host_ip}:9000/v1/chat/completions\
+  -X POST \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+  -H 'Content-Type: application/json'
+```
+
+For parameters in TGI mode, please refer to the [HuggingFace InferenceClient API](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation) (except that "max_new_tokens" is renamed to "max_tokens").
+
+```bash
+# vLLM service
 curl http://${host_ip}:9000/v1/chat/completions \
+  -X POST \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"top_p":1,"temperature":0.7,"frequency_penalty":0,"presence_penalty":0, "streaming":false}' \
+  -H 'Content-Type: application/json'
+```
+
+For parameters in vLLM mode, refer to the [LangChain VLLMOpenAI API](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.vllm.VLLMOpenAI.html).
+
+```bash
+# vLLM-on-Ray service
+curl http://${your_ip}:9000/v1/chat/completions \
   -X POST \
-  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
+  -d '{"query":"What is Deep Learning?","max_tokens":17,"presence_penalty":1.03,"streaming":false}' \
   -H 'Content-Type: application/json'
 ```

+For parameters in vLLM-on-Ray mode, refer to the [LangChain ChatOpenAI API](https://python.langchain.com/v0.2/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html).
+
 8. MegaService

 ```bash
 curl http://${host_ip}:8888/v1/chatqna -H "Content-Type: application/json" -d '{
-  "messages": "What is the revenue of Nike in 2023?"
-  }'
+  "messages": "What is the revenue of Nike in 2023?"
+  }'
 ```

 9. Dataprep Microservice (Optional)
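
This README documents three backend flavours behind the same port; a hedged way to exercise them from Python. It assumes whichever backend is actually deployed behind port 9000, so in practice only the matching variant will answer (host and loop are illustrative; parameter sets are copied from the curl examples above):

```python
# Hedged smoke check for the three backend flavours documented above.
import requests

VARIANTS = {
    "tgi": {"max_tokens": 17, "top_k": 10, "top_p": 0.95, "typical_p": 0.95,
            "temperature": 0.01, "repetition_penalty": 1.03, "streaming": True},
    "vllm": {"max_tokens": 17, "top_p": 1, "temperature": 0.7,
             "frequency_penalty": 0, "presence_penalty": 0, "streaming": False},
    "vllm-on-ray": {"max_tokens": 17, "presence_penalty": 1.03,
                    "streaming": False},
}

for name, params in VARIANTS.items():
    body = {"query": "What is Deep Learning?", **params}
    r = requests.post("http://localhost:9000/v1/chat/completions",  # ${host_ip}
                      json=body, timeout=60)
    print(name, r.status_code)
```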
