USHIFT-5473: Doc for AI Model Serving #4673
Conversation
@pmtk: This pull request references USHIFT-5473 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
docs/user/ai_model_serving.md
Outdated
MicroShift is intended for the edge (small footprint, low resources),
therefore kserve is configured to use a "Raw Deployment" mode which means that:
- Kubernetes Deployments and Services will be created, and
- neither Service Mesh (Istio) nor Serverless (Knative) are required.
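To make the first bullet concrete, a hedged sketch of how a user could see what kserve created in this mode; the namespace is an assumption, not the documented example:

```
# In Raw Deployment mode kserve reconciles an InferenceService into plain
# Kubernetes objects; "ai-demo" is an assumed namespace, substitute your own.
oc get deployments,services -n ai-demo
```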
If this is a user doc, I'm not sure how to interpret this info, mostly the first bullet.
Yeah. I wrote this doc from the perspective of someone who already knows kserve, so I wanted to write out the differences, but maybe we need to show in a simpler way how we intend it to be used on MicroShift.
docs/user/ai_model_serving.md
Outdated
- neither Service Mesh (Istio) nor Serverless (Knative) are required.

Additionally, automatic creation of the Ingress objects is disabled. If you want to expose your model outside the cluster, create a Route object instead.
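As an illustration of that last sentence, a hedged sketch of exposing the predictor Service with a Route; the Service name and namespace are assumptions borrowed from the resnet50 example later in this conversation:

```
# Create a Route for the predictor Service that kserve created.
# Name and namespace are assumptions; adjust to your InferenceService.
oc expose service ovms-resnet50-predictor -n ai-demo

# Check the resulting Route and its host name.
oc get route ovms-resnet50-predictor -n ai-demo
```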
This should probably go with the above info in a more user-friendly way.
docs/user/ai_model_serving.md
Outdated
## Known issues
- Because of a [bug in kserve](https://github.com/kserve/kserve/pull/4274)
  ([to be ported to RHOAI](https://issues.redhat.com/browse/RHOAIENG-21106)),
  rebooting a MicroShift host can result in the model server not coming back up if it was using ModelCar (a model in an OCI image).
Let's rephrase it in a more user-friendly way, please.
Lots of comments! Everything is your judgement, of course.
docs/user/ai_model_serving.md
Outdated
# AI Model Serving on MicroShift

AI Model Serving on MicroShift is a platform for serving AI models.
It includes limited subset of Red Hat OpenShift AI (RHOAI): [kserve](https://github.com/opendatahub-io/kserve) and ServingRuntimes.
Suggested change:
Old: It includes limited subset of Red Hat OpenShift AI (RHOAI): [kserve](https://github.com/opendatahub-io/kserve) and ServingRuntimes.
New: It includes a limited subset of Red Hat OpenShift AI (RHOAI) components: [kserve](https://github.com/opendatahub-io/kserve) and `ServingRuntimes`.
Looks like they are calling these "components", https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2.17/html/installing_and_uninstalling_openshift_ai_self-managed_in_a_disconnected_environment/installing-the-single-model-serving-platform_component-install#About-the-single-model-serving-platform_component-install
Are we supporting full kserve, or just kserve raw deployment? The initial feature epic talks about these as separate things. If it's just the RD mode, I wonder if you can just say that here and use the link:
Learn more about Raw deployment mode from RHOAI documentation.
Kserve has several modes of operation: serverless, model mesh, and raw deployment.
For MicroShift we opted for the Raw Deployment mode because we don't need to install extra dependencies (Istio and Knative).
There's nothing preventing users from changing kserve's settings and installing Istio+Knative, but from MicroShift's perspective I believe we only want to support RawDeployment? @DanielFroehlich wdyt?
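For illustration only, a hedged sketch of an InferenceService in Raw Deployment mode; all of the names, the runtime reference, and the storage URI below are assumptions rather than the documented MicroShift example, and the annotation is shown explicitly even though MicroShift's kserve configuration already defaults to it:

```
# Hedged sketch: names, runtime, and image are placeholders.
oc apply -f - <<'EOF'
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: ovms-resnet50
  namespace: ai-demo
  annotations:
    # kserve on MicroShift defaults to RawDeployment; included here only to
    # make the mode explicit.
    serving.kserve.io/deploymentMode: RawDeployment
spec:
  predictor:
    model:
      modelFormat:
        name: onnx
      runtime: kserve-ovms                    # assumed ServingRuntime name
      storageUri: oci://quay.io/example/resnet50-modelcar:latest   # hypothetical ModelCar image
EOF
```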
Partial example output:
```
# HELP ovms_requests_success Number of successful requests to a model or a DAG.
```
Should we mention these metrics are in a Prometheus format? Does Prometheus metrics format concept even exist?
There's a document about that: https://github.com/prometheus/docs/blob/main/content/docs/instrumenting/exposition_formats.md
I guess they call it text-based format?
Changed the line above from
"To obtain metrics of the model server simply make a request on `/metrics` endpoint:"
to
"To obtain Prometheus metrics of the model server simply make a request on `/metrics` endpoint:"
AFAIK OTEL can work with Prometheus metrics.
Yes, OTEL collector can expose metrics in a Prometheus format. Manually tried this yesterday.
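To make the metrics happy path concrete, a hedged sketch of scraping the Prometheus text-format metrics through the Route; the host is the example value that appears in the output below:

```
# Scrape the model server's Prometheus-format metrics over the Route.
# The host is the documentation example; substitute your own Route host.
curl http://ovms-resnet50-predictor-ai-demo.apps.example.com/metrics
```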
```
ovms-resnet50-predictor ovms-resnet50-predictor-ai-demo.apps.example.com True ovms-resnet50-predictor
```

### Querying the model server
Because this is a user example guide, it may be useful to mention the official Python scripts from the OpenVINO Model Server repository to check if the server and model are ready, get metrics, and make inference requests: https://github.com/openvinotoolkit/model_server/blob/main/client/python/kserve-api/samples/README.md
Yeah, you might be right.
This is much simpler but also raw. It might be better to reuse the real OpenVINO procedure. Or should we just provide a link to it?
I think it looks more favorable, but I'm not sure if we want to include parts of that document or just link to it...
Although, what we have here and in the CI isn't a work of my imagination. I based it on OpenVINO's kserve blog post: https://blog.openvino.ai/blog-posts/kserve-api (see "Run Inference via REST Interface with a JPEG File as Input Data using cURL").
From my point of view, as a tech person, I prefer to see the curl commands, to understand better how it's working under the hood.
Because this is just an example guide, I'd say we should only add the curl commands for the metrics and infer endpoints (the happy path) and remove the rest of the curl commands (ready, live, etc.) in favor of the Python scripts from the official OVMS repo.
Either way, I don't have a strong opinion. It's up to you.
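Along those lines, a hedged happy-path sketch using the KServe v2 (Open Inference Protocol) REST endpoints; the host and model name are assumptions based on the resnet50 example in this conversation, and the payload in request.json depends on the model's actual input tensors:

```
HOST=ovms-resnet50-predictor-ai-demo.apps.example.com   # assumed Route host

# Check that the model is ready to serve.
curl http://${HOST}/v2/models/resnet50/ready

# Send an inference request; request.json is expected to hold a v2 payload, e.g.
# {"inputs":[{"name":"<input-tensor>","shape":[1,224,224,3],"datatype":"FP32","data":[...]}]}
curl -X POST http://${HOST}/v2/models/resnet50/infer \
  -H 'Content-Type: application/json' \
  -d @request.json
```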
Added a new subsection "Other Inference Protocol endpoints"
Great, thanks.
docs/user/ai_model_serving.md
Outdated
- OpenVINO Model Server
- vLLM ServingRuntime for KServe

The following runtimes are included but not supported at this time:
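Separate from how the lists end up worded, a hedged way for a reader to check which runtime definitions are actually present on their host; the namespace is an assumption (use wherever the ServingRuntime manifests were applied):

```
# List the ServingRuntime objects that InferenceServices can reference.
oc get servingruntimes -n ai-demo
```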
What does it mean? We cannot separate those out from the image?
The question that needs answering is: why do we include those if we do not support them?
That's a good question.
So my idea for this is that we're actively testing OVMS and vLLM in CI. The rest of them don't have a test yet, or maybe never will (I don't think we need to retest all of the runtimes if RHOAI tests them, right?).
Should we just join the two lists? Should we emphasize somehow that we're testing these two runtimes?
I'm not sure we need to emphasize what we're testing or not in the docs - it may change quickly.
If the models are there, let's just mention them? If we specifically support (not test) selected models, that's an important distinction.
These are tested and supported by RHOAI, so just like we don't retest every possible feature of other optional components, I think we can say these are supported even without tests (especially for "upstream" doc)
@pmtk: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ggiguash, pmtk. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
No description provided.