
Conversation


@pmtk pmtk commented Mar 13, 2025

No description provided.


openshift-ci-robot commented Mar 13, 2025

@pmtk: This pull request references USHIFT-5473 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 13, 2025

pmtk commented Mar 13, 2025

/cc @agullon @ggiguash @ShaunaDiaz

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 13, 2025
MicroShift is intended for the edge (small footprint, low resources),
therefore kserve is configured to use a "Raw Deployment" mode which means that:
- Kubernetes Deployments and Services will be created, and
- neither Service Mesh (Istio) nor Serverless (Knative) are required.
Contributor

If this is a user doc, I'm not sure how to interpret this info, mostly the first bullet.

Member Author

Yeah. I wrote this doc from the perspective of someone who already knows kserve, so I wanted to write out the differences, but maybe we need to show in a simpler way how we intend it to be used on MicroShift.

- neither Service Mesh (Istio) nor Serverless (Knative) are required.

Additionally, automatic creation of Ingress objects is disabled. If you want to expose your model outside the cluster,
create a Route object instead.
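
For illustration, exposing a model this way could look roughly like the sketch below; the `ovms-resnet50-predictor` service name and `ai-demo` namespace are assumptions taken from the example output later in this doc, so check `oc get svc` for the predictor Service kserve actually created:

```
# Create a Route pointing at the predictor Service created by kserve
# (names are placeholders; adjust to your namespace and InferenceService)
oc expose service ovms-resnet50-predictor -n ai-demo

# Note the Route hostname to use in requests
oc get route -n ai-demo
```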
Contributor

This should probably go with the above info in a more user-friendly way.

## Known issues
- Because of a [bug in kserve](https://github.com/kserve/kserve/pull/4274)
([fix to be ported to RHOAI](https://issues.redhat.com/browse/RHOAIENG-21106)),
rebooting a MicroShift host can result in the model server not coming back up if it was using ModelCar (a model packaged in an OCI image).
Contributor

Let's rephrase it in a more user-friendly way, please.

@ShaunaDiaz ShaunaDiaz left a comment

Lots of comments! Everything is your judgement, of course.

# AI Model Serving on MicroShift

AI Model Serving on MicroShift is a platform for serving AI models.
It includes limited subset of Red Hat OpenShift AI (RHOAI): [kserve](https://github.com/opendatahub-io/kserve) and ServingRuntimes.


Suggested change
It includes limited subset of Red Hat OpenShift AI (RHOAI): [kserve](https://github.com/opendatahub-io/kserve) and ServingRuntimes.
It includes a limited subset of Red Hat OpenShift AI (RHOAI) components: [kserve](https://github.com/opendatahub-io/kserve) and `ServingRuntimes`.

Looks like they are calling these "components", https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2.17/html/installing_and_uninstalling_openshift_ai_self-managed_in_a_disconnected_environment/installing-the-single-model-serving-platform_component-install#About-the-single-model-serving-platform_component-install


Are we supporting full kserve, or just kserve raw deployment? The initial feature epic talks about these as separate things. If it's just the RD mode, I wonder if you can just say that here and use the link:
Learn more about Raw Deployment mode from the RHOAI documentation.

Member Author

Kserve has several modes of operation: serverless, model mesh, and raw deployment.

For MicroShift we opted for Raw Deployment mode because we don't need to install extra dependencies (Istio and Knative).

There's nothing preventing users from changing kserve's settings and installing Istio+Knative, but from MicroShift's perspective I believe we only want to support RawDeployment? @DanielFroehlich wdyt?
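
For illustration, a minimal InferenceService in raw deployment mode might look like the sketch below; the name, namespace, runtime, and storageUri are placeholders, and the `serving.kserve.io/deploymentMode` annotation is what selects Raw Deployment in kserve:

```
# A minimal sketch, not the exact manifest from this doc; names and the
# storageUri are placeholders, and the annotation selects Raw Deployment.
oc apply -f - <<'EOF'
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: ovms-resnet50
  namespace: ai-demo
  annotations:
    serving.kserve.io/deploymentMode: RawDeployment
spec:
  predictor:
    model:
      modelFormat:
        name: openvino_ir
      runtime: kserve-ovms
      storageUri: oci://quay.io/example/resnet50-modelcar:latest
EOF
```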


Partial example output:
```
# HELP ovms_requests_success Number of successful requests to a model or a DAG.
```
Contributor
@agullon agullon Mar 14, 2025

Should we mention that these metrics are in Prometheus format? Does the Prometheus metrics format even exist as a concept?

Member Author

Changed above

To obtain metrics of the model server simply make a request on `/metrics` endpoint:

to

To obtain Prometheus metrics of the model server simply make a request on `/metrics` endpoint:

AFAIK OTEL can work with Prometheus metrics.
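
For example, fetching those metrics through the Route could look like this (hostname taken from the example Route output below; adjust to your deployment):

```
# Request the model server's Prometheus-format metrics through the Route
curl http://ovms-resnet50-predictor-ai-demo.apps.example.com/metrics
```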

Contributor

Yes, the OTEL collector can expose metrics in Prometheus format. I manually tried this yesterday.

```
ovms-resnet50-predictor ovms-resnet50-predictor-ai-demo.apps.example.com True ovms-resnet50-predictor
```

### Querying the model server
Contributor

Because this is a user example guide, it may be useful to mention the official Python scripts from the OpenVINO Model Server repository for checking whether the server and model are ready, getting metrics, and making inference requests: https://github.com/openvinotoolkit/model_server/blob/main/client/python/kserve-api/samples/README.md

Member Author
@pmtk pmtk Mar 14, 2025

Yeah, you might be right.
This is much simpler but also raw. It might be better to reuse the real OpenVINO procedure. Or should we just provide a link to it?

I think it looks more favorable, but I'm not sure if we want to include parts of that document or just link to it...

Although, what we have here and in the CI isn't a work of my imagination. I based it on OpenVINO's kserve blog post: https://blog.openvino.ai/blog-posts/kserve-api (see "Run Inference via REST Interface with a JPEG File as Input Data using cURL").
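
For reference, the kind of curl-based inference request being discussed follows the v2 (Open Inference Protocol) REST API; the sketch below uses a placeholder hostname, model name, and input tensor, whereas the blog post builds the payload from a JPEG:

```
# Sketch of a v2 REST inference request; hostname, model name, and the
# input tensor are placeholders, not the real resnet50 payload.
curl -s -X POST \
  http://<route-hostname>/v2/models/<model-name>/infer \
  -H "Content-Type: application/json" \
  -d '{"inputs":[{"name":"input","shape":[1,4],"datatype":"FP32","data":[0.1,0.2,0.3,0.4]}]}'
```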

Contributor

From my point of view, as a tech person, I prefer to see the curl commands, to understand better how it's working under the hood.
Because this is just an example guide, I'd say we should only add the curl commands for the metrics and infer endpoints (happy path), and remove the rest of the curl commands (ready, live, etc.) in favor of the Python scripts from the official OVMS repo.
Either way, I don't have a strong opinion. It's up to you.

Member Author

I'd rather not defer to the OVMS example for the metadata, ready, or metrics endpoints, because those endpoints actually "come" from kserve.

See the v1 and v2 protocols.
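
For example, the v2 (Open Inference Protocol) endpoints exposed by kserve-based runtimes include the following (hostname and model name are placeholders):

```
# Server liveness/readiness, model readiness, and model metadata (v2 protocol)
curl http://<route-hostname>/v2/health/live
curl http://<route-hostname>/v2/health/ready
curl http://<route-hostname>/v2/models/<model-name>/ready
curl http://<route-hostname>/v2/models/<model-name>
```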

Member Author

Added a new subsection "Other Inference Protocol endpoints"

Contributor

Great, thanks.

@pmtk pmtk changed the title USHIFT-5473: Dev doc for AI Model Serving USHIFT-5473: Doc for AI Model Serving Mar 17, 2025
- OpenVINO Model Server
- vLLM ServingRuntime for KServe

The following runtimes are included but not supported at this time:
Contributor

What does this mean? Can we not separate those out of the image?
The question that requires explaining is: why do we include them if we do not support them?

Member Author

That's a good question.
So my idea for this is that we're actively testing OVMS and vLLM in CI. The rest of them don't have tests yet, and maybe never will (I don't think we need to retest all of the runtimes if RHOAI tests them, right?).

Should we just join the two lists? Should we emphasize somehow that we're testing these two runtimes?

Contributor

I'm not sure we need to emphasize in the docs what we're testing or not - it may change quickly.
If the models are there, let's just mention them? If we specifically support (not test) selected models, that's an important distinction.

Member Author

These are tested and supported by RHOAI, so just like we don't retest every possible feature of other optional components, I think we can say these are supported even without tests (especially for an "upstream" doc).


openshift-ci bot commented Mar 21, 2025

@pmtk: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ggiguash
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 21, 2025

openshift-ci bot commented Mar 21, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ggiguash, pmtk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 35a675c into openshift:main Mar 21, 2025
5 checks passed