Description
What happened?
The llama.cpp tokenizer for Phi-3 has odd behavior: re-tokenizing the same text over and over keeps adding whitespace to the first non-BOS token. This causes two problems:
- It doesn't match the original tokenizer behavior from Hugging Face Transformers
- Re-processing the same text causes the kv-cache to be invalidated, forcing another prompt fill of all the input tokens.
I maintain the Guidance library (https://github.com/guidance-ai/guidance), where we often need to re-tokenize inputs after adding templated/deterministic text from the user. This is causing a significant performance regression when using Phi-3 via llama.cpp in Guidance whenever we go through this cycle :(. I believe pretty much all constrained generation libraries are likely affected by this too.
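To make the kv-cache point concrete, here is a minimal sketch (not Guidance's actual caching code; common_prefix_len is just an illustrative helper) of why the extra token forces a full prefill: the cache can only be reused for the longest common token prefix, and the token inserted right after BOS shrinks that prefix to a single token.

# Illustrative only -- not Guidance's real caching logic.
# Token ids are taken from the repro further down.
def common_prefix_len(a, b):
    """Number of leading tokens shared by two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

old_tokens = [1, 6324, 306, 626, 263, 7251, 9759]         # first tokenization
new_tokens = [1, 29871, 6324, 306, 626, 263, 7251, 9759]  # after one detokenize/tokenize round trip
print(common_prefix_len(old_tokens, new_tokens))  # 1 -> only BOS is reusable; everything else must be prefilled again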
Here's an example of the bug in action (using the llama-cpp-python bindings, which are very thin wrappers around the tokenizer):
The model I'm using: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/blob/main/Phi-3-mini-4k-instruct-q4.gguf
import llama_cpp
print(llama_cpp.__version__) # '0.2.78' -- filing here because it seems like it's a lower level bug in llama.cpp
model = llama_cpp.Llama(model_path="Phi-3-mini-4k-instruct-q4.gguf", logits_all=True)
tokenizer = llama_cpp.LlamaTokenizer(model)
test_str = "Hi I am a hippo"
test_tokens = tokenizer.tokenize(test_str.encode("utf-8")) # [1, 6324, 306, 626, 263, 7251, 9759]
retokenized = b''.join([tokenizer.detokenize([i]) for i in test_tokens]) # b' Hi I am a hippo'
retokenized_tokens = tokenizer.tokenize(retokenized) # [1, 29871, 6324, 306, 626, 263, 7251, 9759]
retokenized2 = b''.join([tokenizer.detokenize([i]) for i in retokenized_tokens]) # b'  Hi I am a hippo'
Note how the token at index 1 has a continually growing whitespace when going through the tokenize/detokenize cycle. Repeating this process keeps increasing the leading whitespace:
" Hi I am a hippo" -> "  Hi I am a hippo" -> "   Hi I am a hippo" -> "    Hi I am a hippo" -> ...
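The growth can also be seen in a short loop (a minimal sketch reusing tokenizer and test_str from the snippet above; if the bug is present, the decoded bytes keep growing on each round trip):

# Reuses `tokenizer` and `test_str` defined above.
data = test_str.encode("utf-8")
for i in range(4):
    tokens = tokenizer.tokenize(data)
    data = b''.join(tokenizer.detokenize([t]) for t in tokens)
    print(i, data)  # leading whitespace keeps growing with every tokenize/detokenize round trip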
This is the heart of the issue, and it doesn't happen with the original tokenizer implementation in Transformers.
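For comparison, this is roughly how the reference behavior can be checked (a sketch assuming the transformers package and the original microsoft/Phi-3-mini-4k-instruct repo rather than the GGUF file; exact ids may differ, but the round trip should not keep injecting whitespace):

from transformers import AutoTokenizer

hf_tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
ids = hf_tok("Hi I am a hippo").input_ids
text = hf_tok.decode(ids, skip_special_tokens=True)
ids_again = hf_tok(text).input_ids
print(ids)
print(ids_again)  # expected: stable ids, no whitespace token accumulating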
Name and Version
llama-cpp-python is using this commit for their latest release: fd5ea0f
What operating system are you seeing the problem on?
Linux, Mac, Windows
Relevant log output
No response