
Conversation

@oobabooga (Contributor)

Currently, the `llama_eval_internal` function requires the first token in the `tokens` array to be a BOS token (id = 1).

I believe that this check is not necessary, for two reasons:

1. Intentionally removing the BOS token can make generations more creative. With the BOS token, the prompt is associated with text at the beginning of a new document in the training dataset; without it, the prompt can be associated with text at any location. In other words, the BOS token adds a "beginning of document" bias that can optionally be removed (a sketch follows this list).

2. Forcing a BOS token is not desirable when evaluating the model's perplexity, since in most cases the sequence of ids will be mid-document.
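To make the opt-in nature of this bias concrete, here is a minimal sketch using the `llama.h` C API of this era, where `llama_tokenize` takes an `add_bos` flag; the buffer size and the helper name `show_tokenizations` are illustrative assumptions, not code from this PR.

```c
// Sketch: tokenizing the same prompt with and without a BOS token.
// Assumes the llama.h API of mid-2023 (llama_tokenize with an add_bos flag).
#include "llama.h"
#include <stdio.h>

void show_tokenizations(struct llama_context * ctx, const char * prompt) {
    llama_token with_bos[512];
    llama_token without_bos[512];

    // add_bos = true: token id 1 (BOS) is prepended, biasing the model
    // toward "start of a new document".
    int n_with = llama_tokenize(ctx, prompt, with_bos, 512, true);

    // add_bos = false: the prompt is treated as mid-document text.
    int n_without = llama_tokenize(ctx, prompt, without_bos, 512, false);

    printf("with BOS:    %d tokens, first id = %d\n", n_with, with_bos[0]);
    printf("without BOS: %d tokens, first id = %d\n", n_without, without_bos[0]);
}
```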

I originally encountered the "first token must be BOS" error while trying to evaluate llama.cpp using a transformers wrapper that I am working on here. The evaluation fails because the first token in the sequence provided by my code, which is based on this tutorial, is not BOS.
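For context, the rejected-input path looks roughly like the snippet below. This is a paraphrase of the guard this PR removes, reconstructed from the error message; the exact variable names (`n_past`, `tokens`) and their placement inside `llama_eval_internal` are assumptions, not a verbatim quote.

```c
// Paraphrase (not a verbatim quote) of the guard removed by this PR:
// llama_eval_internal rejected any evaluation whose very first token
// (position 0 of a fresh context) was not BOS.
if (n_past == 0 && tokens[0] != llama_token_bos()) {
    fprintf(stderr, "%s: first token must be BOS\n", __func__);
    return false;
}
```

With the check gone, callers such as a perplexity evaluation can feed mid-document token sequences directly.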

@ggerganov (Member) left a comment:

I added this check because the initial batch of OpenLLaMA models suffered significantly if the first token was not BOS.
I haven't checked what happened later, but I have a vague memory that this issue was resolved in later releases, so the check is probably not needed any more.

@ggerganov merged commit 1d16309 into ggml-org:master on Jul 9, 2023.
