Running the benchmark script against a llama-3-8b-inst model served on Inferentia 2 (djl-serving) fails with a ZeroDivisionError:
```shell
python3.10 token_benchmark_ray.py \
  --model "openai/llama3-8b-inst" \
  --mean-input-tokens 550 \
  --stddev-input-tokens 150 \
  --mean-output-tokens 150 \
  --stddev-output-tokens 10 \
  --max-num-completed-requests 1 \
  --timeout 600 \
  --num-concurrent-requests 1 \
  --results-dir "result_outputs" \
  --llm-api "openai" \
  --additional-sampling-params '{}'
```
```
Traceback (most recent call last):
  File "/Users/yaron/projects/llmperf/token_benchmark_ray.py", line 456, in <module>
    run_token_benchmark(
  File "/Users/yaron/projects/llmperf/token_benchmark_ray.py", line 297, in run_token_benchmark
    summary, individual_responses = get_token_throughput_latencies(
  File "/Users/yaron/projects/llmperf/token_benchmark_ray.py", line 116, in get_token_throughput_latencies
    request_metrics[common_metrics.REQ_OUTPUT_THROUGHPUT] = num_output_tokens / request_metrics[common_metrics.E2E_LAT]
ZeroDivisionError: division by zero
```
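
The crash means `request_metrics[common_metrics.E2E_LAT]` is 0 for this request, so the throughput division at line 116 has no guard; a zero latency most likely indicates the request failed or was never timed. As a local workaround, a sketch along these lines avoids the exception (the `safe_output_throughput` helper is hypothetical, not part of llmperf):

```python
# Hypothetical guard (illustrative sketch, not part of llmperf): compute
# request throughput without dividing by a zero end-to-end latency.
def safe_output_throughput(num_output_tokens: int, e2e_latency_s: float) -> float:
    """Return output tokens per second, or 0.0 when latency is zero/invalid."""
    if e2e_latency_s <= 0:
        # Zero latency almost always means the request failed or was never
        # timed; report 0 tokens/s instead of raising ZeroDivisionError.
        return 0.0
    return num_output_tokens / e2e_latency_s


# Where line 116 currently divides directly, the call would become:
# request_metrics[common_metrics.REQ_OUTPUT_THROUGHPUT] = safe_output_throughput(
#     num_output_tokens, request_metrics[common_metrics.E2E_LAT]
# )
```

A zero throughput in the results would still flag that the endpoint responded without measurable latency, which is worth investigating on the djl-serving side.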