server : fix incorrect num_tokens_predicted #3480

Merged: 1 commit into ggml-org:master on Oct 5, 2023

Conversation

@jhen0409 (Collaborator) commented Oct 5, 2023

  • Count num_tokens_predicted only during token generation (see the sketch after this list).
  • Remove the assertion in format_timings; it could fire incorrectly because n_eval is not equal to the number of generated tokens. For example, with n_batch = 1, timings.n_eval also includes the prompt tokens.
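
A minimal sketch of the counting behavior described above, assuming illustrative struct and helper names rather than the actual server code: the predicted-token counter is incremented only when a token is sampled during generation, while prompt evaluation updates a separate counter, so a small n_batch cannot inflate num_tokens_predicted.

```cpp
// Illustrative sketch only; completion_state, on_token_generated and
// on_prompt_batch_evaluated are hypothetical names, not the real server symbols.
#include <cstddef>
#include <cstdint>

struct completion_state {
    size_t num_prompt_tokens    = 0; // tokens consumed while evaluating the prompt
    size_t num_tokens_predicted = 0; // tokens actually generated (sampled)
};

// Called once per sampled token during generation.
void on_token_generated(completion_state & st, int32_t /*token*/) {
    st.num_tokens_predicted++;       // count generation only, never prompt eval
}

// Called for each evaluated prompt batch of size n_eval (<= n_batch).
void on_prompt_batch_evaluated(completion_state & st, size_t n_eval) {
    st.num_prompt_tokens += n_eval;  // with n_batch = 1 this runs once per prompt
                                     // token, which is why timings.n_eval can
                                     // exceed the number of generated tokens
}
```

Keeping the two counters separate also makes the removed assertion unnecessary, since n_eval no longer has to match the number of generated tokens.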

@ggerganov merged commit e8b8d32 into ggml-org:master on Oct 5, 2023
joelkuiper added a commit to vortext/llama.cpp that referenced this pull request Oct 6, 2023
…example

* 'master' of github.com:ggerganov/llama.cpp:
  kv cache slot search improvements (ggml-org#3493)
  prompts : fix editorconfig checks after ggml-org#3416
  parallel : add option to load external prompt file (ggml-org#3416)
  server : reuse llama_sample_token common util (ggml-org#3494)
  llama : correct hparams comparison (ggml-org#3446)
  ci : fix xcodebuild destinations (ggml-org#3491)
  convert : update Falcon script for new HF config (ggml-org#3448)
  build : use std::make_tuple() for compatibility with older GCC versions (ggml-org#3488)
  common : process escape sequences in reverse prompts (ggml-org#3461)
  CLBlast: Fix handling of on-device tensor data
  server : fix incorrect num_tokens_predicted (ggml-org#3480)
  swift : disable ACCELERATE_NEW_LAPACK (ggml-org#3481)
  ci : add swift build via xcodebuild (ggml-org#3482)
yusiwen pushed a commit to yusiwen/llama.cpp that referenced this pull request Oct 7, 2023