main : add stop keywords #1387

Closed · wants to merge 4 commits
Conversation

@ejones (Collaborator) commented May 10, 2023

Resurrects #769, which was ready to go but abandoned in favor of #863, which was reverted. #769 was itself a rewrite of #365 by @joshmackwilliams. Fixes #57. I've also simplified the code a bit.

From the original author:

Stop keywords can be specified using the "--stop" parameter. Upon seeing one of these keywords in the generated output, the model will terminate generation immediately. Like reverse prompts, multiple stop keywords can be specified by passing the --stop argument multiple times.

The implementation is heavily based on the reverse prompt implementation...
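The diff itself isn't shown here, but the antiprompt-style check described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code; `ends_with_stop` is a hypothetical helper name:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper mirroring the reverse-prompt check: returns true
// when the generated text so far ends with any of the stop keywords,
// at which point generation would terminate immediately.
static bool ends_with_stop(const std::string & output,
                           const std::vector<std::string> & stops) {
    for (const auto & stop : stops) {
        if (output.size() >= stop.size() &&
            output.compare(output.size() - stop.size(),
                           stop.size(), stop) == 0) {
            return true;
        }
    }
    return false;
}
```

In the generation loop, this check would run after each newly sampled token is appended to the output, with the stop list populated from the repeated `--stop` arguments.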

Testing

Tested with 30B in interactive and non-interactive modes. Note that in interactive mode, --stop still terminates the process. This appears to be the original intent.

Non-interactive, without --stop:

 % ./main -m $LLAMA_30B_Q4_0 -c 1024 -n 32 -p "$(cat prompts/chat-with-bob.txt) Name a color"$'\n'"Bob: "
main: build = 529 (365869d)
main: seed  = 1683689302
...
 Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User: Name a color
Bob: 1,053,464 bytes of memory are being used by the current session of your Windows.
User: What year were you born?
llama_print_timings:        load time = 11568.84 ms
llama_print_timings:      sample time =    22.28 ms /    32 runs   (    0.70 ms per run)
llama_print_timings: prompt eval time = 11554.26 ms /   106 tokens (  109.00 ms per token)
llama_print_timings:        eval time =  6173.09 ms /    31 runs   (  199.13 ms per run)
llama_print_timings:       total time = 17767.44 ms

Non-interactive, with --stop:

% ./main -m $LLAMA_30B_Q4_0 -c 1024 -n 32 -p "$(cat prompts/chat-with-bob.txt) Name a color"$'\n'"Bob: " --stop User:
main: build = 529 (365869d)
main: seed  = 1683689361
...
 Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User: Name a color
Bob:  Red, blue, green, black and white.
User:
llama_print_timings:        load time = 11683.41 ms
llama_print_timings:      sample time =     8.97 ms /    13 runs   (    0.69 ms per run)
llama_print_timings: prompt eval time = 11665.79 ms /   106 tokens (  110.05 ms per token)
llama_print_timings:        eval time =  2367.22 ms /    12 runs   (  197.27 ms per run)
llama_print_timings:       total time = 14061.44 ms

Interactive, with --stop:

% ./main -m $LLAMA_30B_Q4_0 -c 1024 -n 32 -f prompts/chat-with-bob.txt -r User: --stop blue                          
main: build = 529 (365869d)
main: seed  = 1683689426
...
== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User: name a color
Bob: Name one of these colors: Blue, Green, Red, Black, White, Orange, Pink, Yellow, Brown,
 Purple.
User: name a color in lower case
Bob: Name one of these colors: blue
llama_print_timings:        load time = 11954.31 ms
llama_print_timings:      sample time =    30.58 ms /    43 runs   (    0.71 ms per run)
llama_print_timings: prompt eval time = 13830.20 ms /   110 tokens (  125.73 ms per token)
llama_print_timings:        eval time =  7563.43 ms /    42 runs   (  180.08 ms per run)
llama_print_timings:       total time = 360323.90 ms

Multiple --stop:

% ./main -m $LLAMA_30B_Q4_0 -c 1024 -n 32 -p "$(cat prompts/chat-with-bob.txt) Name a" --stop User: --stop Bob:
main: build = 529 (365869d)
main: seed  = 1683689992
...
 Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User: Name a country in Asia.
Bob:
llama_print_timings:        load time = 11892.59 ms
llama_print_timings:      sample time =     4.84 ms /     7 runs   (    0.69 ms per run)
llama_print_timings: prompt eval time = 11874.44 ms /   101 tokens (  117.57 ms per token)
llama_print_timings:        eval time =  1299.45 ms /     6 runs   (  216.57 ms per run)
llama_print_timings:       total time = 13197.79 ms

@DannyDaemonic (Contributor) commented May 10, 2023

What's the argument for making a new option vs just having a --stop-on-reverse-prompt type option?

Edit: Or, I remember reading a PR somewhere that would change -r such that it just doesn't automatically trigger interactive mode.

@ejones (Collaborator, Author) commented May 11, 2023

change -r such that it just doesn't automatically trigger interactive mode.

#1032 it looks like? Yeah, that was my first instinct. Tbh I went with #769 because it seemed to have consensus (@ggerganov approved) and just got tied up in #863. I'm not so opinionated as to reject the duplicative option.

That said, from what I could discern, the reasons for a distinct option seemed to include:

  • backwards compat for -r triggering interactive
  • -r has some interaction with --instruct
  • using distinct -r and --stop, with the latter terminating the process (from what I can tell). This use case is less clear to me

@ejones (Collaborator, Author) commented May 11, 2023

Side note: the workaround for non-interactive stopping that @SlyEcho notes in #1032 (piping in /dev/null) doesn't appear to work any longer. As of #1040 it looks like EOF no longer terminates the process.

@ejones ejones requested a review from ggerganov May 11, 2023 03:25
@DannyDaemonic (Contributor) commented May 11, 2023

Ah yes, thank you. I was referring to #1032. I prefer that solution over adding a second set of antiprompts just for exiting. Let's see if we can't push that one through.

@ejones ejones removed the request for review from ggerganov May 11, 2023 11:14
@ejones (Collaborator, Author) commented May 11, 2023

Closing in favor of #1032

@ejones ejones closed this May 11, 2023