Description
There are certain cases where outlines users require unstructured text until a particular token is reached, such as `<|beginfunctioncall|>`, code blocks, JSON blocks, etc.
Currently, these are often handled with regular expressions that can be quite messy.
DeepSeek's thinking tokens are a great example. DeepSeek R1's response begins with `<think>`, includes some amount of unstructured text, and is followed by `</think>`. Thinking tokens are the primary reason for R1's performance.
The issue with the current solution (specifying a regular expression) is that the regular expression is quite complicated and difficult to debug:
```python
from typing import Literal

from pydantic import BaseModel
# Import paths may differ between outlines versions.
from outlines.fsm.json_schema import build_regex_from_schema, convert_json_schema_to_str

NUM_THINKING_CHARACTERS = 200  # example value; pick your own limit

# Set up the response format you want the LLM to respond with.
class YesNo(BaseModel):
    answer: Literal['yes', 'no']

yesno_regex = build_regex_from_schema(convert_json_schema_to_str(YesNo))

# Add the thinking prefix to the regex
thinking_regex = r'<think>((.|\n){0,' + str(NUM_THINKING_CHARACTERS) + r'}?)\[TRUNCATED\]</think>'

# Combine the thinking regex and the yesno regex
result_regex = thinking_regex + yesno_regex
```
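To see what the combined pattern accepts, here is a minimal sketch using Python's `re` module, with a hand-written stand-in for the schema-derived `YesNo` regex (the real output of `build_regex_from_schema` is more involved):

```python
import re

NUM_THINKING_CHARACTERS = 200  # hypothetical limit

# Hand-written approximation of the schema-derived YesNo regex.
yesno_regex = r'\{\s*"answer"\s*:\s*"(yes|no)"\s*\}'

thinking_regex = (r'<think>((.|\n){0,' + str(NUM_THINKING_CHARACTERS)
                  + r'}?)\[TRUNCATED\]</think>')
result_regex = thinking_regex + yesno_regex

# A response with a truncated thinking block followed by structured output:
response = '<think>Weighing the options...[TRUNCATED]</think>{"answer": "yes"}'
print(re.fullmatch(result_regex, response) is not None)
```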
This includes both a thinking block and the structured output at the end.
Regular expression compilation times can explode if you want to include things like thought controls that limit the thinking response, add allowed/disallowed words, or structured text to the thinking block. For example, if you force the thinking block to have a range of lengths between 10 and 50:
```python
thinking_regex = r'<think>((?:.|\n|\s){10,50}?)</think>(\s)*'
```
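The length-bounded variant can be checked directly with Python's `re` module before handing it to the compiler:

```python
import re

thinking_regex = r'<think>((?:.|\n|\s){10,50}?)</think>(\s)*'

# A thinking block whose contents fall inside the 10-50 character range matches:
ok = re.fullmatch(thinking_regex, '<think>twenty characters....</think>')
# One whose contents are too short does not:
too_short = re.fullmatch(thinking_regex, '<think>short</think>')
print(ok is not None, too_short is None)
```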
Compiling a single, smaller regular expression for only the thinking block could take some load off of our compiler, and simplify the user interface for complicated and mixed forms of structure.
Ideally, I would prefer to be able to structure the thinking block and the output block separately, i.e. have separate logit processors for different sections of text. You could also jump to any number of different logit processors based on tokens you hit.
Consider the following interface, where I define a default pattern `pattern_1`. When a `b` token (or substring?) is hit, I want to switch to `pattern_2`.
```python
pattern_1 = r'.{10,50}'
pattern_2 = r'(cat|dog|squirrel)'

pattern_dict = {'<|bos|>': pattern_1, 'b': pattern_2}
generator = outlines.generate.regex(model, pattern_dict)
```
or, in the case of DeepSeek R1 (roughly):
```python
pattern_1 = r'.*'
pattern_2 = r'(cat|dog|squirrel)'

# NOTE: may want to provide a default
# pattern that begins at the BOS token
pattern_dict = {'<think>': pattern_1, '</think>': pattern_2}
generator = outlines.generate.regex(model, pattern_dict)
```
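As a rough sketch of the semantics I have in mind, here is a plain-Python, post-hoc validator (`validate_with_switching` is a hypothetical name; the real feature would live in the logits processor, constraining tokens during generation rather than checking text afterwards). It splits the text at each trigger token and checks every segment against the pattern that was active at that point:

```python
import re

def validate_with_switching(text, pattern_dict):
    """Split `text` at each trigger token and check every segment
    against the pattern active when that segment was generated."""
    # Order triggers by where they first appear in the text.
    triggers = sorted((text.find(t), t) for t in pattern_dict if t in text)
    pos, active_pattern, ok = 0, None, True
    for idx, trig in triggers:
        if active_pattern is not None:
            ok &= re.fullmatch(active_pattern, text[pos:idx]) is not None
        pos = idx + len(trig)
        active_pattern = pattern_dict[trig]
    if active_pattern is not None:
        ok &= re.fullmatch(active_pattern, text[pos:]) is not None
    return ok

pattern_dict = {'<think>': r'.*', '</think>': r'(cat|dog|squirrel)'}
print(validate_with_switching('<think>pondering animals</think>cat', pattern_dict))
print(validate_with_switching('<think>pondering animals</think>hamster', pattern_dict))
```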
I think it'd be a nice convenience feature, and it might cut down on compilation times for very complicated grammars. It would introduce some challenges in communicating patterns to inference servers like vLLM, which currently has no way to receive multiple patterns for constrained decoding.
Thoughts?