[Bug] Strix Halo APU: llama.cpp ROCm runtime does not recognize shared/unified memory when BIOS GPU allocation is set to Auto #589

## Environment

| | |
|---|---|
| **OS** | Windows 11 Pro (26200) |
| **Hardware** | AMD Strix Halo APU |
| **LM Studio** | 0.4.17 beta Build 3 |
| **llama.cpp runtime** | 2.22.1 (ROCm) — bundled with 0.4.17 Build 2 |
| **BIOS GPU allocation** | **Auto** (dynamic — no fixed dedicated VRAM slice) |
| **Total system RAM** | 32 GB |

---

## Description

On AMD Strix Halo APUs, the BIOS offers two GPU memory allocation modes:

- **Fixed** — reserves a static VRAM slice (e.g. 8 GB or 16 GB) visible as dedicated VRAM to the OS
- **Auto** — allocates GPU memory dynamically from the unified system RAM pool at runtime

When BIOS is set to **Auto**, the GPU has little-to-no fixed dedicated VRAM reported to the OS. Instead, memory is allocated from the shared/GTT pool on demand. This is the expected and recommended setting for maximising available memory on a unified-memory APU.

**The llama.cpp ROCm runtime does not correctly enumerate total available VRAM in this configuration.** It either:

1. Reports a near-zero dedicated VRAM figure from its survey and skips GPU offloading entirely, or
2. Triggers the "no-GPUs" stuck state. Build 3 partially addresses this state for old ROCm versions, but the root cause here is the BIOS allocation mode, not the ROCm version.

The GPU is enumerated correctly (LM Studio lists the iGPU as of Build 2), but the reported VRAM budget reflects only the tiny dedicated slice rather than the shared pool (up to ~32 GB on this system) that is actually accessible through ROCm's GTT/unified-memory path.
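A minimal sketch of the difference between the observed and expected budget, assuming the VRAM survey reduces to summing per-device memory figures. The `MemoryReport` type, its field names, the `min_offload_gb` threshold and the sample numbers are hypothetical illustrations, not llama.cpp's or LM Studio's actual code or measured values.

```python
from dataclasses import dataclass


@dataclass
class MemoryReport:
    """Hypothetical per-GPU memory figures as seen by a VRAM survey (GiB)."""
    dedicated_gb: float  # VRAM the OS reports as dedicated to the iGPU
    shared_gb: float     # GTT / shared system RAM the iGPU can also map


def budget_dedicated_only(r: MemoryReport) -> float:
    # Observed behaviour: only the dedicated slice counts toward the budget.
    return r.dedicated_gb


def budget_unified(r: MemoryReport) -> float:
    # Expected behaviour on a unified-memory APU: dedicated + shared pool.
    return r.dedicated_gb + r.shared_gb


def should_offload(budget_gb: float, min_offload_gb: float = 1.0) -> bool:
    # Hypothetical threshold: below it, the runtime skips GPU offloading.
    return budget_gb >= min_offload_gb


# Illustrative Auto-mode figures (not measured): tiny dedicated slice, large shared pool.
auto_mode = MemoryReport(dedicated_gb=0.5, shared_gb=24.0)
print(budget_dedicated_only(auto_mode), should_offload(budget_dedicated_only(auto_mode)))
print(budget_unified(auto_mode), should_offload(budget_unified(auto_mode)))
```

With the dedicated-only budget the sketch declines to offload; counting the shared pool flips the decision, which is the behaviour this issue asks for.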

---

## Steps to Reproduce

1. Use an AMD Strix Halo machine with the BIOS set to **Auto** GPU allocation (not a fixed reserved size)
2. Install LM Studio 0.4.17 beta Build 2 or 3
3. Load any model ≥ 8 GB
4. Observe: LM Studio either reports very little available VRAM, fails to offload layers, or shows no GPU at all, even though the iGPU appears in the device list

---

## Expected Behavior

The llama.cpp ROCm runtime should query the full available memory pool (dedicated VRAM + GTT/shared system RAM) and use it for layer offloading, matching the hardware's actual capability (~32 GB on a 32 GB system in Auto mode).
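To illustrate why the budget figure matters for offloading, the sketch below estimates how many layers fit under a given budget. It assumes, hypothetically, that weights are split evenly across layers and that a fixed overhead is reserved for KV cache and compute buffers; llama.cpp's real fitting logic is more detailed. The function name, the 1 GB overhead and the 8 GB / 32-layer model are illustrative assumptions.

```python
def layers_that_fit(budget_gb: float, model_gb: float, n_layers: int,
                    overhead_gb: float = 1.0) -> int:
    """Estimate offloadable layers, assuming equal-sized layers (hypothetical model)."""
    if n_layers <= 0 or model_gb <= 0:
        raise ValueError("model_gb and n_layers must be positive")
    usable = budget_gb - overhead_gb  # room left after KV cache / buffers
    if usable <= 0:
        return 0
    per_layer_gb = model_gb / n_layers
    return min(n_layers, int(usable // per_layer_gb))


# An 8 GB model with 32 layers (0.25 GB per layer) under three illustrative budgets.
for budget in (0.5, 4.0, 24.5):
    print(budget, layers_that_fit(budget, model_gb=8.0, n_layers=32))
```

Under these assumptions a sub-1 GB budget offloads nothing, a partial budget offloads part of the model, and a budget that includes the shared pool fits every layer.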

---

## Actual Behavior

The runtime reports only the small dedicated VRAM slice (or near zero), preventing meaningful GPU offloading even though the unified pool has tens of GB available.

---

## Workaround

Omitting the `--gpu` flag from `lms` CLI calls lets llama.cpp's internal auto-fit bypass the explicit VRAM budget and spill layers into the unified pool. This works, but it is not surfaced in the LM Studio UI: there is no way to trigger this behaviour through the app's model load settings.
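A minimal sketch of scripting the workaround, assuming `lms` is on `PATH`. `build_load_argv` and `load` are hypothetical helpers, and `your-model-key` is a placeholder for a real model key; the only point shown is that the `--gpu` argument is left out entirely when no value is given.

```python
import shlex
import shutil
import subprocess
from typing import Optional


def build_load_argv(model_key: str, gpu: Optional[str] = None) -> list[str]:
    """Build an `lms load` command line (hypothetical helper).

    gpu=None omits `--gpu` entirely, which is the workaround: placement is
    left to llama.cpp's auto-fit instead of an explicit VRAM budget.
    """
    argv = ["lms", "load", model_key]
    if gpu is not None:
        argv += ["--gpu", gpu]  # explicit offload setting, passed through verbatim
    return argv


def load(model_key: str, gpu: Optional[str] = None) -> int:
    # Runs the command only if the lms CLI is found; returns its exit code.
    if shutil.which("lms") is None:
        raise FileNotFoundError("lms CLI not found on PATH")
    return subprocess.run(build_load_argv(model_key, gpu)).returncode


workaround = build_load_argv("your-model-key")
print(shlex.join(workaround))
```

Keeping command construction separate from execution makes it easy to log or diff the exact invocation when comparing runs with and without `--gpu`.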

---

## Additional Context

- The Build 3 release notes say: *"Fix bug where survey failures due to old ROCm versions led to a stuck state of no-GPUs"*. The symptom (survey failure → stuck no-GPUs state) is identical, but the trigger here is BIOS Auto allocation, not the ROCm version, so it may be the same survey code path.
- Related open issue for Strix Halo on Linux: #494 (different root cause, an exit code of null on model load, but the same hardware class)
- On Linux, this is typically worked around with the `ttm.pages_limit` kernel parameter (set via GRUB) to raise the GTT pool size; a Windows-equivalent fix would likely belong in the ROCm VRAM survey of the llama.cpp runtime.
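`ttm.pages_limit` is expressed in pages rather than bytes, so the value for a desired GTT size is that size divided by the page size. A minimal sketch, assuming 4 KiB pages (the x86-64 default); the helper name and the 28 GiB target are illustrative only, not a recommended setting.

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB pages, the x86-64 default


def ttm_pages_limit(gtt_gib: float) -> int:
    """Pages needed for a GTT pool of gtt_gib GiB (hypothetical helper)."""
    return int(gtt_gib * 1024**3) // PAGE_SIZE_BYTES


target_gib = 28  # illustrative target, leaves headroom on a 32 GB system
print(f"ttm.pages_limit={ttm_pages_limit(target_gib)}")
```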
