
examples : switch retrieval to llama_encode #13685


Merged
merged 2 commits into master from cisc/retrieval-encode on May 21, 2025
Conversation

CISC
Collaborator

@CISC commented May 21, 2025

Also enable --no-warmup option for retrieval.

Warmup calls llama_decode; would it make sense to disable this for embedding models somehow?

@CISC requested a review from ggerganov May 21, 2025 13:17
Member

@ggerganov left a comment


> Warmup calls llama_decode; would it make sense to disable this for embedding models somehow?

There isn't a simple criterion for doing it. Is it causing issues?

@CISC
Collaborator Author

CISC commented May 21, 2025

> Warmup calls llama_decode; would it make sense to disable this for embedding models somehow?

> There isn't a simple criterion for doing it. Is it causing issues?

No, just the `decode: cannot decode batches with this context (use llama_encode() instead)` log message.
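
For context, a minimal sketch of what the decode-to-encode switch looks like on the API side. This is not the exact diff from this PR; the `embed_chunk` helper is made up for illustration, and it assumes a pooled-embedding context. `llama_batch_get_one`, `llama_encode`, and `llama_get_embeddings_seq` are existing llama.h calls.

```cpp
// Sketch only: embed one tokenized chunk with an embedding-only context.
// Going through llama_encode() avoids the
// "cannot decode batches with this context (use llama_encode() instead)"
// log that llama_decode() emits for such contexts.
#include "llama.h"

#include <vector>

// hypothetical helper, for illustration
static bool embed_chunk(llama_context * ctx, std::vector<llama_token> & tokens) {
    // single-sequence batch holding the chunk's tokens
    llama_batch batch = llama_batch_get_one(tokens.data(), (int32_t) tokens.size());

    // encode instead of decode for embedding models
    if (llama_encode(ctx, batch) != 0) {
        return false;
    }

    // pooled embedding for sequence 0 (assumes pooling is enabled on the context)
    const float * emb = llama_get_embeddings_seq(ctx, 0);
    return emb != nullptr;
}
```

The retrieval example batches several chunks per call rather than one at a time, but the decode-to-encode swap above is the gist of the first commit.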

@CISC merged commit 2aa777d into master May 21, 2025
46 checks passed
@CISC deleted the cisc/retrieval-encode branch May 21, 2025 14:57
infil00p pushed a commit to baseweight/llama.cpp that referenced this pull request May 22, 2025
* switch retrieval to llama_encode

* enable --no-warmup for retrieval