Skip to content

tool-call: allow --chat-template chatml w/ --jinja, default to chatml upon parsing issue, avoid double bos #11616

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 3, 2025

Conversation

ochafik
Copy link
Collaborator

@ochafik ochafik commented Feb 3, 2025

Couple of fixes in common_chat_templates_from_model

  • allow --chat-template chatml when --jinja enabled:
    • Cheap way to force generic tool call format + crude template onto models (doesn't bode well w/ gemma, which thinks of itself as a model, not an assistant; testing other models w/ it in slow tool call server tests).
  • catch any exceptions in jinja parsing (e.g. Eval bug: Release b4524 breaks serving of granite-code models #11500) and default to chatml
  • Incidentally, avoid double BOS issue w/ jinja (just pass empty bos/eos tokens to the template)

@github-actions github-actions bot added examples python python script changes server labels Feb 3, 2025
@ochafik ochafik changed the title tool-call: allow --chat-template chatml when --jinja enabled tool-call: allow --chat-template chatml w/ --jinja, default to chatml upon parsing issue, avoid double bos Feb 3, 2025
@ochafik ochafik marked this pull request as ready for review February 3, 2025 14:01
@ochafik ochafik requested a review from ngxson as a code owner February 3, 2025 14:01
@ochafik
Copy link
Collaborator Author

ochafik commented Feb 3, 2025

Sorry somehow had forgotten half of the changes when I undrafted, should look better now.

LOG_ERR("%s: failed to parse chat template: %s\n", __func__, e.what());
return {
has_explicit_template,
std::make_unique<minja::chat_template>(CHATML_TEMPLATE_SRC, token_bos, token_eos),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at some point we should no longer fallback to chatml. The fallback to chatml was a temporary solution when chat templates was not a common thing.

For example, in such case, we can return an error message like: Chat template is not supported, you must specify a custom template using --chat-template ... when user uses /chat/completions endpoint.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either way, it's surprising all the things we can have chatml do with a few "polyfills" (in minja)

@ochafik ochafik merged commit cde3833 into ggml-org:master Feb 3, 2025
48 checks passed
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025
…chatml upon parsing issue, avoid double bos (ggml-org#11616)

* tool-call: allow `--jinja --chat-template chatml`

* fix double bos issue (drop bos/eos tokens from jinja template)

* add missing try catch around jinja parsing to default to chatml

* Simplify default chatml logic
orca-zhang pushed a commit to orca-zhang/llama.cpp that referenced this pull request Feb 26, 2025
…chatml upon parsing issue, avoid double bos (ggml-org#11616)

* tool-call: allow `--jinja --chat-template chatml`

* fix double bos issue (drop bos/eos tokens from jinja template)

* add missing try catch around jinja parsing to default to chatml

* Simplify default chatml logic
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
…chatml upon parsing issue, avoid double bos (ggml-org#11616)

* tool-call: allow `--jinja --chat-template chatml`

* fix double bos issue (drop bos/eos tokens from jinja template)

* add missing try catch around jinja parsing to default to chatml

* Simplify default chatml logic
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025
…chatml upon parsing issue, avoid double bos (ggml-org#11616)

* tool-call: allow `--jinja --chat-template chatml`

* fix double bos issue (drop bos/eos tokens from jinja template)

* add missing try catch around jinja parsing to default to chatml

* Simplify default chatml logic
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
examples python python script changes server
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants