lookahead-prompt : add example #4226
Comments
@apoorvumang FYI
I just implemented this for tabby in https://github.com/TabbyML/tabby/pull/916/files - it's a slightly more complicated implementation (since tabby runs on continuous batching), but it should be something that can be used as a reference.
I'd love to give this a try, first time contributing.
Somewhat related: https://arxiv.org/abs/2312.11462. It seems someone looked at lookup (n-gram) decoding and speculative decoding and asked themselves: "Why not both?" I'm still reading through the paper.
Add an example implementing the "Prompt Lookup Decoding" technique:
https://github.com/apoorvumang/prompt-lookup-decoding
This should be a great exercise for people looking to become familiar with llama.cpp's KV cache management and batched decoding API. Looking for contributions. The following examples can be used as starting points (a rough sketch of the n-gram lookup step follows the list):
- speculative
- lookahead
- batched
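
For anyone picking this up, here is a minimal sketch of the core idea (not from the issue itself): prompt lookup decoding searches the tokens processed so far for an earlier occurrence of the last few tokens and, if it finds one, proposes the tokens that followed that occurrence as a draft continuation, which the target model then verifies in a single batched decode, as in the `speculative` example. The names below (`token_t`, `find_draft`, `ngram_size`, `num_draft`) are illustrative placeholders, not the actual llama.cpp API.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// stand-in for llama_token; all names here are illustrative only
using token_t = int32_t;

// Search the tokens seen so far for an earlier occurrence of the last
// `ngram_size` tokens. If one is found, return up to `num_draft` tokens that
// followed it as a draft continuation; otherwise return an empty vector and
// the caller falls back to ordinary one-token-at-a-time decoding.
static std::vector<token_t> find_draft(const std::vector<token_t> & tokens,
                                       size_t ngram_size, size_t num_draft) {
    if (tokens.size() <= ngram_size) {
        return {};
    }

    const size_t suffix = tokens.size() - ngram_size; // start of the n-gram to match

    // scan backwards so the most recent earlier occurrence wins
    for (size_t i = suffix; i-- > 0; ) {
        bool match = true;
        for (size_t j = 0; j < ngram_size; ++j) {
            if (tokens[i + j] != tokens[suffix + j]) {
                match = false;
                break;
            }
        }
        if (!match) {
            continue;
        }

        // collect the tokens that followed the matched n-gram as the draft
        std::vector<token_t> draft;
        for (size_t k = i + ngram_size; k < tokens.size() && draft.size() < num_draft; ++k) {
            draft.push_back(tokens[k]);
        }
        if (!draft.empty()) {
            return draft;
        }
    }

    return {}; // no usable match
}

int main() {
    // toy token stream with a repeated phrase: the last 3 tokens (7 8 9)
    // also appear at the start, so the tokens that followed them are drafted
    const std::vector<token_t> tokens = { 7, 8, 9, 10, 11, 3, 4, 7, 8, 9 };

    for (token_t t : find_draft(tokens, /*ngram_size=*/3, /*num_draft=*/4)) {
        printf("draft token: %d\n", t); // prints 10, 11, 3, 4
    }
    return 0;
}
```

In an actual example, the returned draft would be placed into a `llama_batch` after the current position, decoded in one call, and accepted token by token until the model's sampled token disagrees with the draft, mirroring the verification loop in the `speculative` example.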