--Keep -1 defaults to 2, Forgotten prompt. #1790

Closed
TonyWeimmer40 opened this issue Jun 10, 2023 · 3 comments

Comments

TonyWeimmer40 commented Jun 10, 2023

On a Windows system, --keep n does not work.

For context, this behavior was previously reported in #1647; a solution was apparently found there but never shared by the OP. The expectation is that all of the prompt tokens are kept, but as shown below, when --keep is set to -1 it defaults to 2 (visible in the command on the first line and in the n_keep value on the last line of the log).

Z:\llama.cpp>main -m Z:/30-33B/wizardlm-30b-ggml.bin -ins --keep -1 -c 2048 -n 2048 --color -t 6 --mlock
main: build = 635 (5c64a09)
main: seed  = 1686404245
llama.cpp: loading model from Z:/30-33B/wizardlm-30b-ggml.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32001
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 10 (mostly Q2_K)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size =    0.13 MB
llama_model_load_internal: mem required  = 15273.95 MB (+ 3124.00 MB per state)
....................................................................................................
llama_init_from_file: kv self size  = 3120.00 MB

system_info: n_threads = 6 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Instruction:

'
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 512, n_predict = 2048, n_keep = 2
@KerfuffleV2 (Collaborator)

The "prompt" is what you specify with --file or --prompt. Input you add interactively later on doesn't count as "the prompt".

You see 2 because llama.cpp ensures the prompt always starts with a beginning-of-document token plus a single space, 2 tokens in total.
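
For illustration, this is consistent with clamping logic of roughly the following shape (a minimal sketch, not the actual code in the main example; resolve_n_keep is a hypothetical name):

#include <vector>

// Sketch: how --keep -1 ends up reported as n_keep = 2 when no -p/-f prompt
// is supplied. prompt_tokens is the tokenized initial prompt; with no prompt
// it holds only the beginning-of-document token plus the token for a single
// space, i.e. 2 tokens.
int resolve_n_keep(int n_keep_arg, const std::vector<int> & prompt_tokens) {
    if (n_keep_arg < 0 || n_keep_arg > (int) prompt_tokens.size()) {
        // -1 (or any out-of-range value) means "keep the whole initial prompt"
        return (int) prompt_tokens.size();
    }
    return n_keep_arg;
}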


ghost commented Jun 10, 2023

n_keep is explained in technical detail by @SlyEcho in #1647.

n_keep with a value of -1 should keep all of the original prompt (from -p or -f). Without a prompt, llama.cpp defaults to 2 (the beginning-of-document token and a space).
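
For example (an illustrative command only; prompt.txt is a hypothetical file), passing a prompt via -f lets --keep -1 keep all of it:

main -m Z:/30-33B/wizardlm-30b-ggml.bin -ins -f prompt.txt --keep -1 -c 2048 -n 2048 --color -t 6 --mlock

The generate: line should then report n_keep equal to the length of the tokenized prompt rather than 2.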

TonyWeimmer40 (Author) commented Jun 10, 2023

Thank you, makes sense. Closed.
