On AMD Strix Halo APUs, the BIOS offers two GPU memory allocation modes:
Fixed — reserves a static VRAM slice (e.g. 8 GB or 16 GB) visible as dedicated VRAM to the OS
Auto — allocates GPU memory dynamically from the unified system RAM pool at runtime
When the BIOS is set to Auto, the OS sees little to no fixed dedicated VRAM. Instead, memory is allocated from the shared/GTT pool on demand. This is the expected and recommended setting for maximising available memory on a unified-memory APU.
The llama.cpp ROCm runtime does not correctly report the total available VRAM in this configuration. It either:
Surveys a near-zero dedicated VRAM figure and skips GPU offloading entirely, or
Triggers the "no-GPUs" stuck state (partially addressed in 0.4.17 Build 3 for old ROCm versions, but the root cause here is the BIOS allocation mode, not the ROCm version)
The GPU is enumerated correctly (LM Studio shows the iGPU after Build 2), but the reported VRAM budget reflects only the tiny dedicated slice rather than the ~32 GB of shared pool actually accessible via ROCm's GTT/unified memory path.
Steps to Reproduce
AMD Strix Halo machine, BIOS set to Auto GPU allocation (not a fixed reserved size)
Install LM Studio 0.4.17 beta Build 2 or 3
Load any model ≥ 8 GB
Observe: LM Studio either shows very low VRAM available, fails to offload layers, or shows no GPU at all despite the iGPU appearing in the device list
Expected Behavior
llama.cpp ROCm runtime should query the full available memory pool (dedicated + GTT/shared system RAM) and use it for layer offloading, matching actual hardware capability (~32 GB on a 32 GB system in Auto mode).
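As a rough illustration of the difference between the two budgets, the sketch below compares a dedicated-only survey with one that also counts the shared pool, then estimates how many layers fit. The names (`MemorySurvey`, `offload_budget`, `layers_that_fit`) and all numbers are hypothetical. This is not llama.cpp or LM Studio code, and a real runtime would also need headroom for the KV cache and the OS.

```python
from dataclasses import dataclass

GIB = 1024 ** 3
MIB = 1024 ** 2


@dataclass(frozen=True)
class MemorySurvey:
    """Hypothetical result of a GPU memory survey, in bytes."""
    dedicated_bytes: int  # fixed VRAM carve-out reported by the driver
    shared_bytes: int     # GTT / shared system RAM the GPU can map


def offload_budget(survey: MemorySurvey, include_shared: bool) -> int:
    """Return the bytes treated as usable for layer offloading."""
    if include_shared:
        return survey.dedicated_bytes + survey.shared_bytes
    return survey.dedicated_bytes


def layers_that_fit(budget_bytes: int, layer_bytes: int, n_layers: int) -> int:
    """Estimate how many equally sized layers fit into the budget."""
    if layer_bytes <= 0:
        raise ValueError("layer_bytes must be positive")
    return min(n_layers, budget_bytes // layer_bytes)


# Assumed example: a 512 MiB dedicated slice, a 31 GiB shared pool,
# and a model with 32 layers of 256 MiB each (8 GiB in total).
survey = MemorySurvey(dedicated_bytes=512 * MIB, shared_bytes=31 * GIB)
layer = 256 * MIB

dedicated_only = layers_that_fit(offload_budget(survey, False), layer, 32)
full_pool = layers_that_fit(offload_budget(survey, True), layer, 32)
print(dedicated_only, full_pool)  # 2 32
```

With these assumed figures, a dedicated-only budget fits 2 of 32 layers, while the combined budget fits all 32.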
Actual Behavior
Runtime reports only the small dedicated VRAM slice (or near zero), preventing meaningful GPU offloading even though the unified pool has tens of GBs available.
Workaround
Omitting the --gpu flag from lms CLI calls lets llama.cpp's internal auto-fit bypass the explicit VRAM budget and spill layers into the unified pool. This works, but it is not surfaced in the LM Studio UI: there is no way to trigger this behaviour through the app's model load settings.
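For scripted use, the workaround could be wrapped roughly as follows. This is a minimal sketch: `build_lms_load_command` is a hypothetical helper, `my-model` is a placeholder model key, and `max` as a `--gpu` value is shown only for contrast (check `lms load --help` for the values your version accepts).

```python
import shlex
from typing import List, Optional


def build_lms_load_command(model_key: str, gpu: Optional[str] = None) -> List[str]:
    """Build an `lms load` argument list, omitting --gpu when `gpu` is None."""
    cmd = ["lms", "load", model_key]
    if gpu is not None:
        # An explicit setting is the case that hits the low VRAM budget.
        cmd += ["--gpu", gpu]
    return cmd


# Workaround form: no --gpu flag, so auto-fit decides the offload.
workaround = build_lms_load_command("my-model")
# Explicit form, for comparison only.
explicit = build_lms_load_command("my-model", gpu="max")

print(shlex.join(workaround))  # lms load my-model
print(shlex.join(explicit))    # lms load my-model --gpu max
```

The list can be passed to `subprocess.run(workaround, check=True)` on a machine where `lms` is on the PATH. The sketch only builds the command, so it runs without LM Studio installed.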
Additional Context
Build 3 release note says: "Fix bug where survey failures due to old ROCm versions led to a stuck state of no-GPUs" — the symptom (survey failure → stuck no-GPUs) is identical, but the trigger here is BIOS Auto allocation, not ROCm version. It may be the same survey code path.
On Linux this is typically worked around by setting the ttm.pages_limit kernel parameter (via GRUB) to enlarge the GTT pool the driver exposes; a Windows-equivalent fix would likely be at the ROCm VRAM survey level in the llama.cpp runtime.
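For the Linux side, the parameter value can be derived from the desired pool size. The sketch below assumes ttm.pages_limit is counted in pages and that the page size is 4 KiB. The 24 GiB target is an arbitrary example, not a recommendation, and the helper names are hypothetical.

```python
def ttm_pages_limit(pool_gib: float, page_size: int = 4096) -> int:
    """Convert a desired GTT pool size in GiB to a page count."""
    if pool_gib <= 0 or page_size <= 0:
        raise ValueError("pool_gib and page_size must be positive")
    return int(pool_gib * 1024 ** 3) // page_size


def kernel_param(pool_gib: float, page_size: int = 4096) -> str:
    """Format the value as a kernel command-line fragment."""
    return f"ttm.pages_limit={ttm_pages_limit(pool_gib, page_size)}"


# Assumed example: allow roughly 24 GiB of GTT on a 32 GB machine.
print(kernel_param(24))  # ttm.pages_limit=6291456
```

On GRUB-based distributions the resulting fragment would typically be appended to the kernel command line in /etc/default/grub, followed by regenerating the GRUB configuration and rebooting.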
Environment