USHIFT-5473: Doc for AI Model Serving #4673
Conversation
@pmtk: This pull request references USHIFT-5473 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
docs/user/ai_model_serving.md
Outdated
MicroShift is intended for the edge (small footprint, low resources),
therefore kserve is configured to use a "Raw Deployment" mode which means that:
- Kubernetes Deployments and Services will be created, and
- neither Service Mesh (Istio) nor Serverless (Knative) are required.
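To make the first bullet concrete, a hedged sketch of how a user could see what kserve created in this mode; the namespace is an assumption, not the documented example:

```
# In Raw Deployment mode kserve reconciles an InferenceService into plain
# Kubernetes objects; "ai-demo" is an assumed namespace, substitute your own.
oc get deployments,services -n ai-demo
```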
If this is a user doc, I'm not sure how to interpret this info, mostly the first bullet.
Yeah. I wrote this doc from the perspective of someone who already knows kserve, so I wanted to write out the differences, but maybe we need to show in a simpler way how we intend it to be used on MicroShift.
docs/user/ai_model_serving.md
Outdated
- neither Service Mesh (Istio) nor Serverless (Knative) are required.

Additionally, automatic creation of the Ingress objects is disabled. If you want to expose your model outside the cluster, create a Route object instead.
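As an illustration of that last sentence, a hedged sketch of exposing the predictor Service with a Route; the Service name and namespace are assumptions borrowed from the resnet50 example later in this conversation:

```
# Create a Route for the predictor Service that kserve created.
# Name and namespace are assumptions; adjust to your InferenceService.
oc expose service ovms-resnet50-predictor -n ai-demo

# Check the resulting Route and its host name.
oc get route ovms-resnet50-predictor -n ai-demo
```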
This should probably go with the above info in a more user-friendly way.
docs/user/ai_model_serving.md
Outdated
## Known issues
- Because of a [bug in kserve](https://github.com/kserve/kserve/pull/4274)
  ([to be ported to RHOAI](https://issues.redhat.com/browse/RHOAIENG-21106)),
  rebooting a MicroShift host can result in the model server not coming back up if it was using ModelCar (a model in an OCI image).
Let's rephrase it in a more user-friendly way, please.
Lots of comments! Everything is your judgement, of course.
docs/user/ai_model_serving.md
Outdated
# AI Model Serving on MicroShift

AI Model Serving on MicroShift is a platform for serving AI models.
It includes limited subset of Red Hat OpenShift AI (RHOAI): [kserve](https://github.com/opendatahub-io/kserve) and ServingRuntimes.
Suggested change:
Old: It includes limited subset of Red Hat OpenShift AI (RHOAI): [kserve](https://github.com/opendatahub-io/kserve) and ServingRuntimes.
New: It includes a limited subset of Red Hat OpenShift AI (RHOAI) components: [kserve](https://github.com/opendatahub-io/kserve) and `ServingRuntimes`.
Looks like they are calling these "components", https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2.17/html/installing_and_uninstalling_openshift_ai_self-managed_in_a_disconnected_environment/installing-the-single-model-serving-platform_component-install#About-the-single-model-serving-platform_component-install
Are we supporting full kserve, or just kserve raw deployment? The initial feature epic talks about these as separate things. If it's just the RD mode, I wonder if you can just say that here and use the link:
Learn more about Raw deployment mode from RHOAI documentation.
Kserve has several modes of operation: serverless, model mesh, and raw deployment.
For MicroShift we opted for the Raw Deployment mode because we don't need to install extra dependencies (Istio and Knative).
There's nothing preventing users from changing kserve's settings and installing Istio+Knative, but from MicroShift's perspective I believe we only want to support RawDeployment? @DanielFroehlich wdyt?
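For illustration only, a hedged sketch of an InferenceService in Raw Deployment mode; all of the names, the runtime reference, and the storage URI below are assumptions rather than the documented MicroShift example, and the annotation is shown explicitly even though MicroShift's kserve configuration already defaults to it:

```
# Hedged sketch: names, runtime, and image are placeholders.
oc apply -f - <<'EOF'
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: ovms-resnet50
  namespace: ai-demo
  annotations:
    # kserve on MicroShift defaults to RawDeployment; included here only to
    # make the mode explicit.
    serving.kserve.io/deploymentMode: RawDeployment
spec:
  predictor:
    model:
      modelFormat:
        name: onnx
      runtime: kserve-ovms                    # assumed ServingRuntime name
      storageUri: oci://quay.io/example/resnet50-modelcar:latest   # hypothetical ModelCar image
EOF
```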
Partial example output:
```
# HELP ovms_requests_success Number of successful requests to a model or a DAG.
```
Should we mention these metrics are in a Prometheus format? Does Prometheus metrics format concept even exist?
There's a document about that: https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md
I guess they call it text-based format?
Changed the line above from
"To obtain metrics of the model server simply make a request on `/metrics` endpoint:"
to
"To obtain Prometheus metrics of the model server simply make a request on `/metrics` endpoint:"
AFAIK OTEL can work with Prometheus metrics.
Yes, OTEL collector can expose metrics in a Prometheus format. Manually tried this yesterday.
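To make the metrics happy path concrete, a hedged sketch of scraping the Prometheus text-format metrics through the Route; the host is the example value that appears in the output below:

```
# Scrape the model server's Prometheus-format metrics over the Route.
# The host is the documentation example; substitute your own Route host.
curl http://ovms-resnet50-predictor-ai-demo.apps.example.com/metrics
```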
```
ovms-resnet50-predictor ovms-resnet50-predictor-ai-demo.apps.example.com True ovms-resnet50-predictor
```

### Querying the model server
Because this is a user example guide, it may be useful to mention the official Python scripts from the OpenVINO Model Server repository to check if the server and model are ready, get metrics, and make inference requests: https://github.com/openvinotoolkit/model_server/blob/main/client/python/kserve-api/samples/README.md
Yeah, you might be right.
This is much simpler but also raw. It might be better to reuse the real OpenVINO procedure. Or should we just provide a link to it?
I think it looks more favorable, but I'm not sure if we want to include parts of that document or just link to it...
Although, what we have here and in the CI isn't a work of my imagination. I based it on OpenVINO's kserve blog post: https://blog.openvino.ai/blog-posts/kserve-api (see "Run Inference via REST Interface with a JPEG File as Input Data using cURL").
From my point of view, as a tech person, I prefer to see the curl commands, to understand better how it's working under the hood.
Because this is just an example guide, I'd say we should only add the curl commands for the metrics and infer endpoints (the happy path) and remove the rest of the curl commands (ready, live, etc.) in favor of the Python scripts from the official OVMS repo.
Either way, I don't have a strong opinion. It's up to you.
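Along those lines, a hedged happy-path sketch using the KServe v2 (Open Inference Protocol) REST endpoints; the host and model name are assumptions based on the resnet50 example in this conversation, and the payload in request.json depends on the model's actual input tensors:

```
HOST=ovms-resnet50-predictor-ai-demo.apps.example.com   # assumed Route host

# Check that the model is ready to serve.
curl http://${HOST}/v2/models/resnet50/ready

# Send an inference request; request.json is expected to hold a v2 payload, e.g.
# {"inputs":[{"name":"<input-tensor>","shape":[1,224,224,3],"datatype":"FP32","data":[...]}]}
curl -X POST http://${HOST}/v2/models/resnet50/infer \
  -H 'Content-Type: application/json' \
  -d @request.json
```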
Added a new subsection "Other Inference Protocol endpoints"
Great, thanks.
docs/user/ai_model_serving.md
Outdated
- OpenVINO Model Server
- vLLM ServingRuntime for KServe

The following runtimes are included but not supported at this time:
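Separate from how the lists end up worded, a hedged way for a reader to check which runtime definitions are actually present on their host; the namespace is an assumption (use wherever the ServingRuntime manifests were applied):

```
# List the ServingRuntime objects that InferenceServices can reference.
oc get servingruntimes -n ai-demo
```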
What does it mean? We cannot separate those out from the image?
The question that needs answering is: why do we include those if we do not support them?
That's a good question.
So my idea for this is that we're actively testing OVMS and vLLM in CI. The rest of them don't have a test yet, or maybe never will (I don't think we need to retest all of the runtimes if RHOAI tests them, right?).
Should we just join the two lists? Should we emphasize somehow that we're testing these two runtimes?
I'm not sure we need to emphasize what we're testing or not in the docs - it may change quickly.
If the models are there, let's just mention them? If we specifically support (not test) selected models, that's an important distinction.
These are tested and supported by RHOAI, so just like we don't retest every possible feature of other optional components, I think we can say these are supported even without tests (especially for "upstream" doc)
@pmtk: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ggiguash, pmtk. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
No description provided.