Description
There are certain cases where outlines users require unstructured text until a particular token is reached, such as `<|beginfunctioncall|>`, code blocks, JSON blocks, etc.
Currently, these are often handled with regular expressions that can be quite messy.
DeepSeek's thinking tokens are a great example. DeepSeek R1's response begins with `<think>`, includes some amount of unstructured text, and is followed by `</think>`. Thinking tokens are the primary reason for R1's performance.
The issue with the current solution (specifying a regular expression) is that the regular expression is quite complicated and difficult to debug:
```python
from typing import Literal

from pydantic import BaseModel
# Import paths may differ between outlines versions.
from outlines.fsm.json_schema import build_regex_from_schema, convert_json_schema_to_str

NUM_THINKING_CHARACTERS = 200  # example value; pick your own limit

# Set up the response format you want the LLM to respond with.
class YesNo(BaseModel):
    answer: Literal['yes', 'no']

yesno_regex = build_regex_from_schema(convert_json_schema_to_str(YesNo))

# Add the thinking prefix to the regex
thinking_regex = r'<think>((.|\n){0,' + str(NUM_THINKING_CHARACTERS) + r'}?)\[TRUNCATED\]</think>'

# Combine the thinking regex and the yesno regex
result_regex = thinking_regex + yesno_regex
```
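To see what the combined pattern accepts, here is a minimal sketch using Python's `re` module, with a hand-written stand-in for the schema-derived `YesNo` regex (the real output of `build_regex_from_schema` is more involved):

```python
import re

NUM_THINKING_CHARACTERS = 200  # hypothetical limit

# Hand-written approximation of the schema-derived YesNo regex.
yesno_regex = r'\{\s*"answer"\s*:\s*"(yes|no)"\s*\}'

thinking_regex = (r'<think>((.|\n){0,' + str(NUM_THINKING_CHARACTERS)
                  + r'}?)\[TRUNCATED\]</think>')
result_regex = thinking_regex + yesno_regex

# A response with a truncated thinking block followed by structured output:
response = '<think>Weighing the options...[TRUNCATED]</think>{"answer": "yes"}'
print(re.fullmatch(result_regex, response) is not None)
```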
This includes both a thinking block and the structured output at the end.
Regular expression compilation times can explode if you want to include things like thought controls that limit the thinking response, add allowed/disallowed words, or structured text to the thinking block. For example, if you force the thinking block to have a range of lengths between 10 and 50:
```python
thinking_regex = r'<think>((?:.|\n|\s){10,50}?)</think>(\s)*'
```
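The length-bounded variant can be checked directly with Python's `re` module before handing it to the compiler:

```python
import re

thinking_regex = r'<think>((?:.|\n|\s){10,50}?)</think>(\s)*'

# A thinking block whose contents fall inside the 10-50 character range matches:
ok = re.fullmatch(thinking_regex, '<think>twenty characters....</think>')
# One whose contents are too short does not:
too_short = re.fullmatch(thinking_regex, '<think>short</think>')
print(ok is not None, too_short is None)
```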
Compiling a single, smaller regular expression for only the thinking block could take some load off of our compiler, and simplify the user interface for complicated and mixed forms of structure.
Ideally, I would prefer to be able to structure the thinking block and the output block separately, i.e. have separate logit processors for different sections of text. You could also jump to any number of different logit processors based on tokens you hit.
Consider the following interface, where I define a default pattern `pattern_1`. When a `b` token (or substring?) is hit, I want to switch to `pattern_2`.
```python
pattern_1 = r'.{10,50}'
pattern_2 = r'(cat|dog|squirrel)'

pattern_dict = {'<|bos|>': pattern_1, 'b': pattern_2}
generator = outlines.generate.regex(model, pattern_dict)
```
or, in the case of DeepSeek R1 (roughly):
```python
pattern_1 = r'.*'
pattern_2 = r'(cat|dog|squirrel)'

# NOTE: may want to provide a default
# pattern that begins at the BOS token
pattern_dict = {'<think>': pattern_1, '</think>': pattern_2}
generator = outlines.generate.regex(model, pattern_dict)
```
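As a rough sketch of the semantics I have in mind, here is a plain-Python, post-hoc validator (`validate_with_switching` is a hypothetical name; the real feature would live in the logits processor, constraining tokens during generation rather than checking text afterwards). It splits the text at each trigger token and checks every segment against the pattern that was active at that point:

```python
import re

def validate_with_switching(text, pattern_dict):
    """Split `text` at each trigger token and check every segment
    against the pattern active when that segment was generated."""
    # Order triggers by where they first appear in the text.
    triggers = sorted((text.find(t), t) for t in pattern_dict if t in text)
    pos, active_pattern, ok = 0, None, True
    for idx, trig in triggers:
        if active_pattern is not None:
            ok &= re.fullmatch(active_pattern, text[pos:idx]) is not None
        pos = idx + len(trig)
        active_pattern = pattern_dict[trig]
    if active_pattern is not None:
        ok &= re.fullmatch(active_pattern, text[pos:]) is not None
    return ok

pattern_dict = {'<think>': r'.*', '</think>': r'(cat|dog|squirrel)'}
print(validate_with_switching('<think>pondering animals</think>cat', pattern_dict))
print(validate_with_switching('<think>pondering animals</think>hamster', pattern_dict))
```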
I think it'd be a nice convenience feature, and it might cut down on compilation times for very complicated grammars. It would introduce some challenges in communicating patterns to inference servers like vLLM, which currently has no way to receive multiple patterns for constrained decoding.
Thoughts?