USHIFT-5339: Test AI Model Serving with NVIDIA GPU #4659
Conversation
@pmtk: This pull request references USHIFT-5339 which is a valid jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/retest
(force-pushed: fddec8e → 238e19e)
/retest
```bash
set -xeuo pipefail
# ...
sudo reboot now
```
I'm not sure `reboot` accepts arguments. I think just `sudo reboot` would be enough.
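If that lands, it could be expressed as a one-line GitHub suggestion (a sketch of the reviewer's proposal, not the final commit):

```suggestion
sudo reboot
```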
```bash
    --enable=codeready-builder-for-rhel-9-x86_64-rpms

sudo dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
```
Suggested change:

```suggestion
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/$(uname -m)/cuda-rhel9.repo
```
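The point of `$(uname -m)` is to make the repo URL architecture-agnostic. A minimal sketch of how that could be used defensively; the `curl` probe is my addition, not part of the PR:

```bash
# Resolve the repo for whatever architecture the host reports (x86_64, aarch64, ...).
arch=$(uname -m)
url="https://developer.download.nvidia.com/compute/cuda/repos/rhel9/${arch}/cuda-rhel9.repo"

# Optional sanity check: fail fast if NVIDIA does not publish a repo
# file for this architecture before adding it to dnf.
curl -fsSI "${url}" >/dev/null
sudo dnf config-manager --add-repo "${url}"
```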
```bash
capacity=$(oc get node -o json | jq -r '.items[0].status.capacity')
gpus=$(echo "${capacity}" | jq -r '."nvidia.com/gpu"')

if [[ "${gpus}" != "1" ]]; then
```
Should we check that it's not empty? I mean, there may be more than one GPU on the node.
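A sketch of that looser check, assuming the same `capacity` JSON as above (the error-message wording is mine):

```bash
# Accept any node that reports at least one GPU instead of exactly one.
# `// empty` maps a missing "nvidia.com/gpu" key to an empty string.
gpus=$(echo "${capacity}" | jq -r '."nvidia.com/gpu" // empty')
if [[ -z "${gpus}" || "${gpus}" -lt 1 ]]; then
    echo "ERROR: node reports no nvidia.com/gpu capacity (got '${gpus:-none}')" >&2
    exit 1
fi
```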
```bash
# Get the logs
pod=$(oc get pods -n cuda-test --selector=batch.kubernetes.io/job-name=test-cuda-vector-add --output=jsonpath='{.items[*].metadata.name}')
logs=$(oc logs -n cuda-test "${pod}")
```
I'm a bit wary of storing entire log outputs in a variable. Should we use a tmp file instead?
It's only like 6 lines, but sure, we can do it:

```
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
```
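A sketch of the tmp-file variant being agreed on here, reusing the pod lookup from the hunk above; the `trap` cleanup is my addition:

```bash
# Write the job logs to a temporary file instead of holding them in a variable.
logfile=$(mktemp)
trap 'rm -f "${logfile}"' EXIT

pod=$(oc get pods -n cuda-test --selector=batch.kubernetes.io/job-name=test-cuda-vector-add --output=jsonpath='{.items[*].metadata.name}')
oc logs -n cuda-test "${pod}" > "${logfile}"
```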
```bash
oc delete job -n cuda-test test-cuda-vector-add
oc delete ns cuda-test

if ! echo "${logs}" | grep -q PASSED; then
```
Could we make it more precise, i.e. `^PASSED` or the like?
This is the output:

```
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
```

Would `^Test PASSED$` be better?
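A sketch of the anchored check, assuming the tmp-file approach from the earlier thread (the `${logfile}` name is carried over from that sketch):

```bash
# Match the exact status line so a stray "PASSED" elsewhere in the log
# cannot make the check succeed.
if ! grep -q '^Test PASSED$' "${logfile}"; then
    echo "ERROR: CUDA vector-add test did not pass" >&2
    exit 1
fi
```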
"model": "granite-3b-code-base-2k", | ||
"prompt": "Once upon a time,", | ||
"max_tokens": 256, | ||
"temperature": 0.5}' | jq |
Should we check the answer is not empty?
Yeah, that's a good idea for double-checking it.
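A sketch of such a check under stated assumptions: `${ENDPOINT}` is a hypothetical placeholder for the serving URL used in the PR, and `.choices[0].text` follows the OpenAI-style completions response that vLLM returns:

```bash
# ENDPOINT is a placeholder, not the actual URL from the PR.
response=$(curl -s "${ENDPOINT}/v1/completions" \
    -H 'Content-Type: application/json' \
    -d '{"model": "granite-3b-code-base-2k",
         "prompt": "Once upon a time,",
         "max_tokens": 256,
         "temperature": 0.5}')

# Fail if the completion text is missing or empty.
text=$(echo "${response}" | jq -r '.choices[0].text // empty')
if [[ -z "${text}" ]]; then
    echo "ERROR: model returned an empty completion" >&2
    exit 1
fi
```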
```dockerfile
RUN pip install huggingface-hub

# Download the model file from hugging face
WORKDIR /tmp
```
Is the `WORKDIR` change necessary? The model seems to be downloaded to `/model`.
hadolint complaint:

```
./scripts/ci-ai-model-serving/tests/vllm-image/Containerfile:13 DL3045 warning: `COPY` to a relative destination without `WORKDIR` set.
```

So it was either `WORKDIR /tmp` or

```dockerfile
COPY download_model.py /tmp
RUN python /tmp/download_model.py
```

but I figured that `WORKDIR` would work without retesting (laziness). Do you prefer adding `/tmp` to `COPY` and `RUN`?
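For comparison, a sketch of the `WORKDIR` variant that was kept; the exact `COPY`/`RUN` lines are my reconstruction from the hadolint warning at Containerfile:13:

```dockerfile
# With WORKDIR set, the relative COPY destination no longer trips DL3045.
WORKDIR /tmp
COPY download_model.py .
RUN python download_model.py
```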
(force-pushed: 238e19e → ff70ff8)
(force-pushed: ff70ff8 → f4ab69f)
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ggiguash, pmtk

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
@pmtk: all tests passed!

Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.