Add equivalent to hf apply_chat_template() #5527

Closed
ngxson opened this issue Feb 16, 2024 · 4 comments · Fixed by #5538
Labels
enhancement New feature or request

Comments

@ngxson
Collaborator

ngxson commented Feb 16, 2024

Motivation

As described in #5447, we can add an equivalent of huggingface's apply_chat_template() that uses simple heuristic checks to format the chat into a string. In other words, there is no jinja parser being used in our implementation.

Docs for hf's apply_chat_template: https://huggingface.co/docs/transformers/main/en/main_classes/tokenizer#transformers.PreTrainedTokenizer.apply_chat_template
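
For illustration only, here is a minimal sketch (not the actual implementation; all names are made up) of what such a heuristic-based formatter could look like, assuming we detect a ChatML-style template by its "<|im_start|>" marker:

    // Illustrative sketch only: detect the template family by looking for
    // characteristic markers in the template string, then format the chat
    // accordingly. No jinja parsing involved.
    #include <string>
    #include <vector>

    struct chat_message {
        std::string role;
        std::string content;
    };

    static std::string apply_template_heuristic(
            const std::string & tmpl,
            const std::vector<chat_message> & msgs,
            bool add_generation_prompt) {
        std::string out;
        if (tmpl.find("<|im_start|>") != std::string::npos) {
            // ChatML-style template
            for (const auto & m : msgs) {
                out += "<|im_start|>" + m.role + "\n" + m.content + "<|im_end|>\n";
            }
            if (add_generation_prompt) {
                out += "<|im_start|>assistant\n";
            }
        }
        // ... more else-if branches for other known template families
        return out;
    }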

Supported templates

This section is moved to wiki: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template

Initial proposal for llama_chat_apply_template (outdated)
    // used in chat template
    typedef struct llama_chat_message {
        char * role; // NOTE: chatml actually allows roles other than system, user and assistant; therefore, no enum here
        char * content;
    } llama_chat_message;

    /// @details Apply chat template and maybe tokenize it. Inspired by hf apply_chat_template() on python.
    /// @param conversation a list of multiple llama_chat_message
    /// @param template A Jinja template to use for this conversion. If this is nullptr, the model’s default chat template will be used instead.
    /// @param tokenize Whether to tokenize the output. If False, the output will be a string.
    /// @param add_generation_prompt Whether to end the prompt with the token(s) that indicate the start of an assistant message.
    /// @return If "tokenize" is set to false, the "buf" must be a string (returned value will be the string length).
    ///         Otherwise, "buf" must be a list of tokens (returned value will be the number of tokens).
    LLAMA_API int32_t llama_apply_chat_template(
              const struct llama_model * model,
                    llama_chat_message * conversation,
                                size_t   message_count,
                                  char * template,
                                  bool   tokenize,
                                  bool   add_generation_prompt,
                                  char * buf,
                               int32_t   length);
@ngxson ngxson added the enhancement New feature or request label Feb 16, 2024
@ngxson
Collaborator Author

ngxson commented Feb 16, 2024

@ggerganov May need your feedback on this subject. Thanks!

@ggerganov
Member

ggerganov commented Feb 16, 2024

Great - thanks for initiating!

Here are some suggestions:

  • No need to support tokenization in this function - the user can always tokenize if needed
  • model arg is not necessary
  • Rename the function to llama_chat_apply_template
  • Shorten add_generation_prompt to add_ass
    // llama_chat_message and llama_chat_apply_template fit better in our naming convention
    LLAMA_API int32_t llama_chat_apply_template(
                    llama_chat_message * msg,
                                size_t   n_msg,
                                  char * template,
                                  bool   add_ass,
                                  char * buf,
                               int32_t   length);

@ngxson
Collaborator Author

ngxson commented Feb 16, 2024

Thanks for the feedback.

Initially, I thought that the model arg would simplify the usage of this function:

// By setting template to nullptr, we use the template embedded inside the model
// Most developers will just do this:
llama_chat_apply_template(model, msg, n_msg, nullptr,...)

Without that model arg, the usage will become:

// Now the developer needs to read the template from the model themselves:
std::string current_template;
current_template.resize(1024);
std::string template_key = "tokenizer.chat_template";
int32_t res = llama_model_meta_val_str(model, template_key.c_str(), current_template.data(), current_template.size());
if (res < 0) {
    // Error: the model does not have a template, maybe we need to use a default one
}
// then finally use it
llama_chat_apply_template(current_template.c_str(), msg, n_msg,...)

But I understand that maybe you want each function of the llama.cpp library to be "elementary". So can you please confirm: do you still prefer not having this model arg, or do you want to keep it?


The tokenize option is actually a future-proofing thing. The original implementation of hf apply_chat_template() has that option because the message content may contain special tokens.

However, due to the way people write templates nowadays, the tokenize option is useless. HF apply_chat_template() also does not return the attention_mask, which is kind of a pain.

So yeah, you're right, we don't need the tokenize option for now. We should wait until the world fixes that, then decide later.

@ggerganov
Member

ggerganov commented Feb 16, 2024

Ok, makes sense. In that case, let's put the model and template args at the start:

    // both "model" and "template" are optional, but at least one is required
    // "template" has higher precedence than "model"
    LLAMA_API int32_t llama_chat_apply_template(
              const struct llama_model * model,
                            const char * template,
       const struct llama_chat_message * msg,
                                size_t   n_msg,
                                  bool   add_ass,
                                  char * buf,
                               int32_t   length);
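
For reference, a hypothetical caller-side sketch of this final shape (the buffer size, the grow-and-retry convention, and const char * message fields are assumptions on my side, not part of the proposal):

    // Hypothetical usage sketch: pass nullptr as the template to use the one
    // embedded in the model; assume the return value is the required length,
    // so the caller can grow the buffer and retry if the output did not fit.
    std::vector<llama_chat_message> chat = {
        { "system", "You are a helpful assistant." },
        { "user",   "Hello!"                       },
    };

    std::vector<char> buf(1024);
    int32_t res = llama_chat_apply_template(
        model, /* template */ nullptr,
        chat.data(), chat.size(),
        /* add_ass */ true,
        buf.data(), buf.size());
    if (res > (int32_t) buf.size()) {
        buf.resize(res); // buffer too small: grow and retry
        res = llama_chat_apply_template(model, nullptr, chat.data(), chat.size(),
                                        true, buf.data(), buf.size());
    }
    std::string prompt(buf.data(), res);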
