Server allow /completion and /embedding #3815
Comments
This is a regression, it still worked in b1359. |
It looks like the issue was introduced in this commit: 438c2ca, which references pull request #3677.

Before the change

Previously, the `/embedding` handler ended with:

```cpp
const json data = format_embedding_response(llama);
return res.set_content(data.dump(), "application/json"); });
```

Which went through to:

```cpp
static json format_embedding_response(llama_server_context &llama)
{
    return json{
        {"embedding", llama.getEmbedding()},
    };
}
```

Which went to:

```cpp
std::vector<float> getEmbedding()
{
    static const int n_embd = llama_n_embd(model);
    if (!params.embedding)
    {
        LOG_WARNING("embedding disabled", {
            {"params.embedding", params.embedding},
        });
        return std::vector<float>(n_embd, 0.0f);
    }
    const float *data = llama_get_embeddings(ctx);
    std::vector<float> embedding(data, data + n_embd);
    return embedding;
}
};
```

That's what I'd expect, since it matches the behaviour of the command line in calling llama_get_embeddings.

After the commit

What it appears to be doing is adding a task to a queue. The task now accidentally makes it so that the server always responds with embeddings if the server is started with the `--embedding` flag.

What I think should have happened is that, with `--embedding` enabled, only the /embedding endpoint returns embeddings, while /completion keeps returning normal completions.

Possible accidental removal of feature?

It looks to me that the feature was accidentally removed in the refactor.
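To make the suggestion concrete, here's a minimal sketch of the kind of per-request decision I mean (the type and field names are hypothetical, not the actual server code):

```cpp
#include <string>

// Hypothetical task type, for illustration only: whether a task returns
// embeddings is decided by the endpoint that queued it, not by the global
// --embedding flag.
struct server_task {
    int id = 0;
    bool embedding_mode = false;  // set to true only by the /embedding handler
    std::string prompt;
};

// The worker loop would then consult the task, not params.embedding, when
// formatting the response.
inline bool wants_embedding(const server_task & task) {
    return task.embedding_mode;
}
```
|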
Yes, very likely it was accidentally removed. |
Thanks, the Nix flake made it very easy to get a development environment and build setup together. |
Where can I see what embedding model is being used? Or how can I pass a different embedding model (different from the chat completion one)? |
The server only supports one model. It is used both for embeddings and completions. If you need two different models, you need two servers. Feel free to open an enhancement issue if you would like to have different models for embeddings and completions. |
Hi, I'm running 01245f5 right now and I get an error when starting. |
Looks like an accident, based on reading the commit where it was changed: c3ebcfa#diff-87355a1a297a9f0fdc86af5e2a59cae153290f58d68822cd10c30fee4f7f7076L2005. It seems like it's supposed to check that all the requests in a batch are either embedding or completion, not to prevent use of embeddings and completion within the same server process. I haven't looked into it more than that, but it doesn't look right at first glance.
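Roughly, the check I'd expect from that diff looks like this (a sketch with made-up type names, not the actual code): requests mixed into one batch must all be of the same kind, but the server process can still expose both endpoints.

```cpp
#include <vector>

// Hypothetical per-request descriptor, for illustration only.
struct slot_request {
    bool is_embedding = false;
};

// All requests in a single batch must be of the same kind (all embedding or
// all completion); mixing kinds within one batch is rejected, but the server
// as a whole can still serve both /embedding and /completion.
inline bool batch_is_homogeneous(const std::vector<slot_request> & batch) {
    if (batch.empty()) return true;
    const bool first = batch.front().is_embedding;
    for (const auto & req : batch) {
        if (req.is_embedding != first) return false;
    }
    return true;
}
```
|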
Although, reading #8420, it looks like the workflow has changed, and that to enable embedding and completion, you must now omit the `--embedding` flag. Reading through the rest of the comments, it seems like the docs are being updated by @okigan. |
Thanks @a-h! You're right, the embeddings endpoint does work without the `--embedding` flag. |
Could anyone tell me how to enable embedding AND completion at the same time in the current code base? |
@edwin0cheng After #10135, both work without the `--embedding` flag. |
I just tested it and it works now, thank you. |
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Feature Description
When I start the server with the `--embedding` flag and make a request to /embedding, I get back, as expected, the vector of embeddings. Now if I make a request to /completion, I'd expect the normal completion to still work, but all I get is the embedding of the prompt (I tested it, and the same vector is returned for both requests).
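To illustrate the behaviour I'd expect, here is a rough client sketch using C++ and libcurl (the host, port, and request fields are only examples): posting to /completion should return generated text, while posting to /embedding should return the vector.

```cpp
#include <curl/curl.h>
#include <iostream>
#include <string>

// Collect the response body into a std::string.
static size_t write_cb(char * ptr, size_t size, size_t nmemb, void * userdata) {
    static_cast<std::string *>(userdata)->append(ptr, size * nmemb);
    return size * nmemb;
}

// POST a JSON body to the given URL and return the response body.
static std::string post_json(const std::string & url, const std::string & body) {
    std::string response;
    CURL * curl = curl_easy_init();
    if (!curl) return response;
    curl_slist * headers = curl_slist_append(nullptr, "Content-Type: application/json");
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);
    curl_easy_perform(curl);
    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    return response;
}

int main() {
    // Assumes a llama.cpp server listening on localhost:8080.
    std::cout << post_json("http://localhost:8080/completion",
                           R"({"prompt": "Building a website can be done in", "n_predict": 16})")
              << "\n";
    std::cout << post_json("http://localhost:8080/embedding",
                           R"({"content": "Building a website can be done in"})")
              << "\n";
    return 0;
}
```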
Motivation
I guess having both normal completion and the possibility to just get embeddings makes sense in a lot of applications using the server.