feat: add e2e test to verify service is available #310
Conversation
/kind test

Something wrong with golangci-lint?

/kind feature
This is because resp.Body.Close() returns a value, but we don't validate the return. We can silence the lint, I think. Or we have to do something like:

```go
defer func() {
	_ = resp.Body.Close()
}()
```
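For context, a minimal self-contained sketch of that pattern (the function name and the /health path here are illustrative assumptions, not the PR's actual code):

```go
package e2e

import (
	"fmt"
	"net/http"
)

// checkServeAvailable probes a locally forwarded port; hypothetical helper
// for illustration only.
func checkServeAvailable(localPort int) error {
	resp, err := http.Get(fmt.Sprintf("http://localhost:%d/health", localPort))
	if err != nil {
		return err
	}
	// Explicitly discard the Close error so errcheck-style linters are satisfied.
	defer func() {
		_ = resp.Body.Close()
	}()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status code: %d", resp.StatusCode)
	}
	return nil
}
```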
```go
	return nil
}

func CheckLlamacppServeAvaliable(localPort int) error {
```
Why should we have two different Checks here? I think they're both OpenAI compatible, so only one client makes more sense. Maybe we can use https://github.com/openai/openai-go? The benefit is that it supports richer features like constructing the system prompt and chat conversation; otherwise we have to build the structures ourselves.
> I think they're both OpenAI compatible, only one client makes more sense I think.

IIUC, their APIs are different:

- ollama: https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-completion
- llamacpp: https://github.com/ggml-org/llama.cpp/blob/master/examples/server/README.md#post-completion-given-a-prompt-it-returns-the-predicted-completion
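To make the difference concrete, rough sketches of the two native request shapes based on the linked docs (the model name, port, and exact fields are assumptions for illustration, not taken from this PR):

```go
package e2e

import (
	"bytes"
	"fmt"
	"net/http"
)

// pingOllama hits ollama's native completion endpoint, which wants a model
// name plus a prompt (per the linked ollama API doc).
func pingOllama(localPort int) error {
	payload := []byte(`{"model": "qwen2:0.5b", "prompt": "hi", "stream": false}`)
	resp, err := http.Post(
		fmt.Sprintf("http://localhost:%d/api/generate", localPort),
		"application/json", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	defer func() { _ = resp.Body.Close() }()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("ollama returned status %d", resp.StatusCode)
	}
	return nil
}

// pingLlamacpp hits llama.cpp's native /completion endpoint, which takes a
// prompt and n_predict instead (per the linked llama.cpp server doc).
func pingLlamacpp(localPort int) error {
	payload := []byte(`{"prompt": "hi", "n_predict": 8}`)
	resp, err := http.Post(
		fmt.Sprintf("http://localhost:%d/completion", localPort),
		"application/json", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	defer func() { _ = resp.Body.Close() }()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("llamacpp returned status %d", resp.StatusCode)
	}
	return nil
}
```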
Let's support OpenAI API compatible chat completions first. It's a standard protocol across most of the inference engines.
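Both engines also expose this standard endpoint, so a single client can cover them both; a minimal sketch of one shared check (the model name and port are placeholders, not values from this PR):

```go
package e2e

import (
	"bytes"
	"fmt"
	"net/http"
)

// checkChatCompletions probes the OpenAI-compatible chat completions endpoint
// that most inference engines (ollama, llama.cpp, vLLM, ...) expose.
func checkChatCompletions(localPort int, model string) error {
	payload := []byte(fmt.Sprintf(
		`{"model": %q, "messages": [{"role": "user", "content": "hi"}]}`, model))
	resp, err := http.Post(
		fmt.Sprintf("http://localhost:%d/v1/chat/completions", localPort),
		"application/json", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	defer func() { _ = resp.Body.Close() }()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status code: %d", resp.StatusCode)
	}
	return nil
}
```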
I see the same code in other projects like kubernetes/lws, but no lint error in their CI.
kerthcet
left a comment
Generally LGTM. Thanks @nayihz
Force-pushed from 7c31097 to c972cba
All comments addressed. @kerthcet Will we be rate-limited by Hugging Face if we run CI many times? I suspect that most of the time is spent on pulling the model.
kerthcet
left a comment
Only one nit.
I think it's unrelated to this PR, right? We just check that the service is ready.
Em.. it seems we used to run the e2e tests within minutes; see the last record.
/retest all
Once the model is downloaded, loading it into memory still takes time.
I reran the tests; they finished in 6 minutes. I think that's acceptable.
Rerunning again to see the final result.
Seems stuck here; the model is already downloaded.
No idea whether it's because of the port-forward or resource contention. May take a look later.
```go
	}()
	<-readyChan
	return check()
}).Should(gomega.Succeed())
```
Let's not use Eventually here; I don't think we want to forward the port several times. Eventually is used for status checks, so we can wrap the check() function with Eventually. Generally it looks like:

```go
func ValidateServiceAvaliable() error {
	// port-forward logic ...
	select {
	case <-readyChan:
	case <-time.After(TIMEOUT):
		return fmt.Errorf("port forwarding timeout")
	}
	gomega.Eventually(check, TIMEOUT, INTERVAL).Should(gomega.Succeed())
	return nil
}
```
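For reference, a self-contained version of that sketch (the helper names readyChan/check and the timeout values are assumptions, not from the PR; gomega.Eventually also assumes a registered fail handler, as in a ginkgo suite):

```go
package e2e

import (
	"fmt"
	"time"

	"github.com/onsi/gomega"
)

const (
	portForwardTimeout = 30 * time.Second
	checkTimeout       = 2 * time.Minute
	checkInterval      = 5 * time.Second
)

// validateServiceAvailable waits once for the port-forward to become ready,
// then retries only the HTTP check; the port-forward itself is never redone.
func validateServiceAvailable(readyChan <-chan struct{}, check func() error) error {
	select {
	case <-readyChan:
	case <-time.After(portForwardTimeout):
		return fmt.Errorf("port forwarding timeout")
	}
	gomega.Eventually(check, checkTimeout, checkInterval).Should(gomega.Succeed())
	return nil
}
```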
Done. PTAL
/lgtm

Let's make it happen now; we can focus on the performance later! Thanks @nayihz
What this PR does / why we need it
Which issue(s) this PR fixes
Fixes #274
Special notes for your reviewer
Does this PR introduce a user-facing change?