Add equivalent to hf apply_chat_template() #5527
Comments
@ggerganov I may need your feedback on this subject. Thanks!
Great - thanks for initiating! Here are some suggestions:
```c
// llama_chat_message
// llama_chat_apply_template fits better in our naming convention
LLAMA_API int32_t llama_chat_apply_template(
    llama_chat_message * msg,
    size_t n_msg,
    char * template,
    bool add_ass,
    char * buf,
    int32_t length);
```
Thanks for the feedback. Initially, I thought that the model param could be used to retrieve the template embedded inside the model:

```cpp
// By setting template to nullptr, we use the template embedded inside model
// Most developers will just do like this:
llama_chat_apply_template(model, msg, n_msg, nullptr, ...)
```

Without the model param, the developer needs to read the template from the model themselves:

```cpp
// Now the developer needs to read the template from the model themselves:
std::string current_template;
current_template.resize(1024);
std::string template_key = "tokenizer.chat_template";
int32_t res = llama_model_meta_val_str(model, template_key.c_str(), current_template.data(), current_template.size());
if (res < 0) {
    // Error: the model does not have a template, maybe we need to use a default one
}
// then finally use it
llama_chat_apply_template(current_template, msg, n_msg, ...)
```

But I understand that maybe you want each function of the llama.cpp library to be "elementary". So can you please confirm: do you still prefer not having the model param?

The tokenize option is actually a future-proofing thing. The original implementation of hf apply_chat_template has it, but due to the way people write templates nowadays, that option is rarely needed. So yeah, you're right, we don't need it.
Ok, makes sense. In that case, let's put the model param in:

```c
// both "model" and "template" are optional, but at least one is required
// "template" has higher precedence than "model"
LLAMA_API int32_t llama_chat_apply_template(
    const struct llama_model * model,
    const char * template,
    const struct llama_chat_message * msg,
    size_t n_msg,
    bool add_ass,
    char * buf,
    int32_t length);
```
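For illustration, a minimal usage sketch of this proposed signature might look like the following. It assumes llama_chat_message carries role/content C strings and that the function returns the total number of bytes required for the output (so the caller can grow the buffer and retry); exact names and semantics may differ from what eventually lands in llama.h.

```cpp
// Sketch: format a conversation with the proposed llama_chat_apply_template().
// Assumptions: llama_chat_message has "role"/"content" members, and the return
// value is the total number of bytes needed for the formatted output.
#include <string>
#include <vector>
#include "llama.h"

static std::string format_chat(const struct llama_model * model,
                               const std::vector<llama_chat_message> & msgs,
                               bool add_ass) {
    std::vector<char> buf(1024);
    // Pass nullptr as the template so the one embedded in the model is used.
    int32_t res = llama_chat_apply_template(model, nullptr, msgs.data(), msgs.size(),
                                            add_ass, buf.data(), buf.size());
    if (res > (int32_t) buf.size()) {
        // Output did not fit: grow the buffer and apply the template again.
        buf.resize(res);
        res = llama_chat_apply_template(model, nullptr, msgs.data(), msgs.size(),
                                        add_ass, buf.data(), buf.size());
    }
    return res < 0 ? std::string() : std::string(buf.data(), res);
}
```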
Motivation
As described in #5447, we can add an equivalent of huggingface's apply_chat_template() that uses simple heuristic checks to format the chat into a string. In other words, no Jinja parser is used in our implementation.

Docs for hf's apply_chat_template: https://huggingface.co/docs/transformers/main/en/main_classes/tokenizer#transformers.PreTrainedTokenizer.apply_chat_template
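To make "simple heuristic checks" concrete, a rough sketch (not the actual llama.cpp implementation) could detect the template family by searching the Jinja string for characteristic markers and then format the messages by hand, for example:

```cpp
// Illustrative sketch only: detect a ChatML-style template by substring and
// format messages accordingly, instead of evaluating the Jinja template itself.
#include <string>
#include <vector>

struct chat_message { std::string role; std::string content; };

std::string apply_template_heuristic(const std::string & tmpl,
                                     const std::vector<chat_message> & msgs,
                                     bool add_assistant_prompt) {
    std::string out;
    if (tmpl.find("<|im_start|>") != std::string::npos) {
        // Looks like a ChatML-style template.
        for (const auto & m : msgs) {
            out += "<|im_start|>" + m.role + "\n" + m.content + "<|im_end|>\n";
        }
        if (add_assistant_prompt) {
            out += "<|im_start|>assistant\n";
        }
    } else {
        // Unknown template: fall back to a plain "role: content" layout.
        for (const auto & m : msgs) {
            out += m.role + ": " + m.content + "\n";
        }
    }
    return out;
}
```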
Supported templates
This section is moved to wiki: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template
Initial proposal for llama_chat_apply_template (outdated)