Labels
kind/bug 🐞 Something isn't working
Description
Bug description
Hi.
I started to use Podman as a replacement for Docker. I really love it.
I also discovered AI Lab for hosting models, so I started using it. Everything works except GPU support. When I create a service from a model, it pulls the image and starts a container, but when I then use the container to chat with the model, it only uses the CPU.

I have followed the steps at https://podman-desktop.io/docs/podman/gpu and get this output:

But when I create a service from a model, it doesn't use the GPU.
The same happens when I start the Llama Stack and use it: it doesn't use the GPU.
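As a sanity check (my own sketch, not taken verbatim from the docs; the CDI device name `nvidia.com/gpu=all` assumes the NVIDIA Container Toolkit has generated a CDI spec inside the podman machine), this is how I verify whether a container can see the GPU at all:

```shell
# Guarded sketch: confirm GPU visibility inside a container.
# Assumes the CDI device nvidia.com/gpu=all exists in the podman machine.
if command -v podman >/dev/null 2>&1; then
    podman run --rm --device nvidia.com/gpu=all \
        docker.io/library/ubuntu nvidia-smi
else
    echo "podman not found on PATH; run inside a Podman Desktop setup"
fi
```

If `nvidia-smi` fails here with the same "no CUDA-capable device" error, the problem would be in the machine's GPU passthrough rather than in AI Lab itself.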
Operating system
Windows 11
Installation Method
from Podman-Desktop extension page
Version
1.2.x
Steps to reproduce
Create a service from a model in the AI Lab and try to use it.
Relevant log output
ggml_cuda_init: failed to initialize CUDA: no CUDA-capable device is detected
build: 5985 (3f4fc97f) with cc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-7) for x86_64-redhat-linux
system info: n_threads = 8, n_threads_batch = 8, total_threads = 16
system_info: n_threads = 8 (n_threads_batch = 8) / 16 | CUDA : ARCHS = 500,610,700,750,800,860,890 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
main: binding port with default address family
main: HTTP server is listening, hostname: 0.0.0.0, port: 8000, http threads: 15
main: loading m
Additional context
No response