llama : add llama_batch_ext #11875
Open: ngxson wants to merge 61 commits into ggml-org:master from ngxson:xsn/private_batch_api
Commits (61)
4ed4fe7 first proposal for private llama_batch (ngxson)
f2e59a8 rework, targeting llama-server (ngxson)
17d3658 move to llama_batch_ext (ngxson)
85ef80c server : use llama_batch_ext (ngxson)
aed4a8e fix server (ngxson)
4bf7ca3 llama_decode_ext (ngxson)
a1b1dea Merge branch 'master' into xsn/private_batch_api (ngxson)
f0ffd81 adapt common (ngxson)
9e75c49 Merge branch 'master' into xsn/private_batch_api (ngxson)
40989f4 correct llama_decode_ext (ngxson)
1170135 llama_batch_ext_add_text (ngxson)
1d6ba97 remove token_info API (ngxson)
46596ca apply various in places (ngxson)
17f954c Merge branch 'master' into xsn/private_batch_api (ngxson)
86973cb fix merge errors (ngxson)
4aabf4e return output ID from llama_batch_ext_add/set (ngxson)
47086fa apply to the rest (ngxson)
9fb2d81 fix common_batch missing seq_id (ngxson)
65f0184 compile ok (ngxson)
c3dd790 fix llama_batch_ext_init_from_text (ngxson)
04f8641 rm redundant llama_batch_ext_set_output_last (ngxson)
54566ad correct comment (ngxson)
bfdddbc bring back mistakenly deleted llama_batch_init/free (ngxson)
5e6a6d4 fix llama-run n_past (ngxson)
3294036 fix gemma3-cli (ngxson)
07d84fa fix missing n_past in various places (ngxson)
ba79369 fix llama_batch_ext_init_from_embd (ngxson)
a363251 qwen2vl: use llama_batch_ext_set_pos (ngxson)
8e7714f fix compile (ngxson)
eaffba0 llama_batch_ext_ptr::from_text/embd (ngxson)
116b9a1 rename to init_from_text (ngxson)
624a683 fix compile (ngxson)
de788e0 Update examples/tts/tts.cpp (ngxson)
eab5606 Apply suggestions from code review (ngxson)
dc4bb64 Merge branch 'master' into xsn/private_batch_api (ngxson)
7a3c178 speculative : adapt to new llama API (ggerganov)
23d7407 Merge pull request #15 from ggml-org/xsn/private_batch_api (ngxson)
b0db7fc android : adapt to new API (ggerganov)
96ca6e8 swift : adapt to new API (ggerganov)
32c2c41 android : fix permission (ngxson)
6f54ee6 retrieval : avoid common_batch (ggerganov)
8b80d68 embedding : avoid common_batch (ggerganov)
76fd7d6 perplexity : avoid common_batch (ggerganov)
8a23b4a server : avoid common_batch (ggerganov)
b8b1732 server : remove old commented code [no ci] (ggerganov)
bd51d63 Merge pull request #16 from ggml-org/xsn/private_batch_api_pooling_none (ngxson)
30f1db9 remove C API llama_batch_ext_init_from_text (ngxson)
c5a0176 Merge branch 'master' into xsn/private_batch_api (ngxson)
2134cab add cpp batch.add_text wrapper (ngxson)
2cec1cf move various places to batch.add_text (ngxson)
3802ff2 add batch.clear() and batch.n_tokens() (ngxson)
e8827a6 Merge branch 'master' into xsn/private_batch_api (ngxson)
a9efdbb qwen2vl: fix mrope position (ngxson)
1434c2c Merge branch 'master' into xsn/private_batch_api (ngxson)
d18a79e llama_batch_ext_init with ctx (ngxson)
c4fea7f fix qwzn2vl mrope position input (ngxson)
42062cc fix build (ngxson)
56e82d0 fix server (ngxson)
50fb396 server: fix batch_spec (ngxson)
8ec0ff9 fix embeddings and retrieval (ngxson)
c1f4a78 correct output_id for llama-cpp header (ngxson)
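Read in order, the commit messages above outline the shape of the new batch API. The following is a hypothetical usage sketch inferred only from those messages; the function signatures and parameters are assumptions for illustration, not the actual headers from this PR:

```c
// Hypothetical flow inferred from commit messages in this PR
// (exact signatures are assumptions, not real llama.cpp API):
//   d18a79e  "llama_batch_ext_init with ctx"
//   1170135  "llama_batch_ext_add_text"
//   4aabf4e  "return output ID from llama_batch_ext_add/set"
//   4bf7ca3  "llama_decode_ext"
struct llama_batch_ext * batch = llama_batch_ext_init(ctx);

// add one token to sequence seq_id at position pos, requesting logits;
// the add call is described as returning an output ID
int32_t output_id = llama_batch_ext_add_text(batch, token, pos,
                                             &seq_id, 1, /*output=*/true);

llama_decode_ext(ctx, batch);   // decode the opaque batch
llama_batch_ext_free(batch);    // release it
```

The commits also mention C++ conveniences (batch.add_text, batch.clear(), batch.n_tokens(), llama_batch_ext_ptr) wrapping this C interface; their exact shape is likewise not shown in this capture.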
I think the next step along this refactoring is to remove all usages of i_batch from the examples. The i_batch is the index we use to extract logits for the i-th token in the batch, but this pattern is quite cumbersome and not very intuitive. To avoid this pattern, we have to introduce a new API call for sampling a token from a sequence. This should be enough to replace most or all usages of i_batch. We can do this in a next PR.
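The code block that originally accompanied this comment did not survive the capture. A hypothetical sketch of what a sequence-level sampling call could look like; the name and signature below are assumptions for illustration, not the actual proposal:

```c
// Hypothetical: sample the next token directly from sequence `seq_id`,
// so callers no longer track i_batch to index into the logits buffer.
// Name and signature are illustrative assumptions, not real llama.cpp API.
LLAMA_API llama_token llama_sampler_sample_seq(
        struct llama_sampler * smpl,
        struct llama_context * ctx,
        llama_seq_id           seq_id);
```

With a call of this shape, examples could sample by sequence id immediately after decoding, instead of recording each token's position in the batch.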