Add equivalent to hf apply_chat_template() #5527

Closed
ngxson opened this issue Feb 16, 2024 · 4 comments · Fixed by #5538
Labels
enhancement New feature or request

Comments

@ngxson
Collaborator

ngxson commented Feb 16, 2024

Motivation

As described in #5447, we can add an equivalent of huggingface's apply_chat_template() that uses simple heuristic checks to format the chat into a string. In other words, there is no jinja parser being used in our implementation.

Docs for hf's apply_chat_template: https://huggingface.co/docs/transformers/main/en/main_classes/tokenizer#transformers.PreTrainedTokenizer.apply_chat_template
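
For illustration only, here is a minimal sketch (not the actual implementation; all names are made up) of what such a heuristic-based formatter could look like, assuming we detect a ChatML-style template by its "<|im_start|>" marker:

    // Illustrative sketch only: detect the template family by looking for
    // characteristic markers in the template string, then format the chat
    // accordingly. No jinja parsing involved.
    #include <string>
    #include <vector>

    struct chat_message {
        std::string role;
        std::string content;
    };

    static std::string apply_template_heuristic(
            const std::string & tmpl,
            const std::vector<chat_message> & msgs,
            bool add_generation_prompt) {
        std::string out;
        if (tmpl.find("<|im_start|>") != std::string::npos) {
            // ChatML-style template
            for (const auto & m : msgs) {
                out += "<|im_start|>" + m.role + "\n" + m.content + "<|im_end|>\n";
            }
            if (add_generation_prompt) {
                out += "<|im_start|>assistant\n";
            }
        }
        // ... more else-if branches for other known template families
        return out;
    }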

Supported templates

This section is moved to wiki: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template

Initial proposal for llama_chat_apply_template (outdated)
    // used in chat template
    typedef struct llama_chat_message {
        char * role; // NOTE: chatml actually allows roles other than system, user and assistant; therefore, no enum here
        char * content;
    } llama_chat_message;

    /// @details Apply chat template and maybe tokenize it. Inspired by hf apply_chat_template() on python.
    /// @param conversation a list of multiple llama_chat_message
    /// @param template A Jinja template to use for this conversion. If this is nullptr, the model’s default chat template will be used instead.
    /// @param tokenize Whether to tokenize the output. If False, the output will be a string.
    /// @param add_generation_prompt Whether to end the prompt with the token(s) that indicate the start of an assistant message.
    /// @return If "tokenize" is set to false, the "buf" must be a string (returned value will be the string length).
    ///         Otherwise, "buf" must be a list of tokens (returned value will be the number of tokens).
    LLAMA_API int32_t llama_apply_chat_template(
              const struct llama_model * model,
                    llama_chat_message * conversation,
                                size_t   message_count,
                                  char * template,
                                  bool   tokenize,
                                  bool   add_generation_prompt,
                                  char * buf,
                               int32_t   length);
@ngxson ngxson added the enhancement New feature or request label Feb 16, 2024
@ngxson
Collaborator Author

ngxson commented Feb 16, 2024

@ggerganov May need your feedback on this subject. Thanks!

@ggerganov
Member

ggerganov commented Feb 16, 2024

Great - thanks for initiating!

Here are some suggestions:

  • No need to support tokenization in this function - the user can always tokenize if needed
  • model arg is not necessary
  • Rename the function to llama_chat_apply_template
  • Shorten add_generation_prompt to add_ass
    // llama_chat_message and llama_chat_apply_template fit better in our naming convention
    LLAMA_API int32_t llama_chat_apply_template(
                    llama_chat_message * msg,
                                size_t   n_msg,
                                  char * template,
                                  bool   add_ass,
                                  char * buf,
                               int32_t   length);

@ngxson
Collaborator Author

ngxson commented Feb 16, 2024

Thanks for the feedback.

Initially, I thought that the model arg would simplify the usage of this function:

// By setting template to nullptr, we use the template embedded inside the model
// Most developers will just do this:
llama_chat_apply_template(model, msg, n_msg, nullptr,...)

Without that model arg, the usage will become:

// Now the developer needs to read the template from the model themselves:
std::string current_template;
current_template.resize(1024);
std::string template_key = "tokenizer.chat_template";
int32_t res = llama_model_meta_val_str(model, template_key.c_str(), current_template.data(), current_template.size());
if (res < 0) {
    // Error: the model does not have a template, maybe we need to use a default one
}
// then finally use it
llama_chat_apply_template(current_template.c_str(), msg, n_msg,...)

But I understand that maybe you want each function of the llama.cpp library to be "elementary". So can you please confirm: do you still prefer not having this model arg, or do you want to keep it?


The tokenize option is actually a future-proofing thing. The original implementation of hf apply_chat_template() has that option because the message content may contain special tokens.

However, due to the way people write templates nowadays, the tokenize option is useless. HF apply_chat_template() also does not return the attention_mask, which is kind of a pain.

So yeah, you're right, we don't need the tokenize option for now. We should wait until the world fixes that, then decide later.

@ggerganov
Member

ggerganov commented Feb 16, 2024

Ok, makes sense. In that case, let's put the model and template args at the start:

    // both "model" and "template" are optional, but at least one is required
    // "template" has higher precedence than "model"
    LLAMA_API int32_t llama_chat_apply_template(
              const struct llama_model * model,
                            const char * template,
       const struct llama_chat_message * msg,
                                size_t   n_msg,
                                  bool   add_ass,
                                  char * buf,
                               int32_t   length);
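
For reference, a hypothetical caller-side sketch of this final shape (the buffer size, the grow-and-retry convention, and const char * message fields are assumptions on my side, not part of the proposal):

    // Hypothetical usage sketch: pass nullptr as the template to use the one
    // embedded in the model; assume the return value is the required length,
    // so the caller can grow the buffer and retry if the output did not fit.
    std::vector<llama_chat_message> chat = {
        { "system", "You are a helpful assistant." },
        { "user",   "Hello!"                       },
    };

    std::vector<char> buf(1024);
    int32_t res = llama_chat_apply_template(
        model, /* template */ nullptr,
        chat.data(), chat.size(),
        /* add_ass */ true,
        buf.data(), buf.size());
    if (res > (int32_t) buf.size()) {
        buf.resize(res); // buffer too small: grow and retry
        res = llama_chat_apply_template(model, nullptr, chat.data(), chat.size(),
                                        true, buf.data(), buf.size());
    }
    std::string prompt(buf.data(), res);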
