Name and Version
230d116 (HEAD -> master, tag: b6962, origin/master, origin/HEAD) improve CUDA cpy memory bandwidth when copying transposed tensor (#16841)
Select-String -Path CMakeCache.txt -Pattern "GGML_VULKAN"
CMakeCache.txt:804:GGML_VULKAN:BOOL=OFF
CMakeCache.txt:807:GGML_VULKAN_CHECK_RESULTS:BOOL=OFF
CMakeCache.txt:810:GGML_VULKAN_DEBUG:BOOL=OFF
CMakeCache.txt:813:GGML_VULKAN_MEMORY_DEBUG:BOOL=OFF
CMakeCache.txt:816:GGML_VULKAN_RUN_TESTS:BOOL=OFF
CMakeCache.txt:819:GGML_VULKAN_SHADERS_GEN_TOOLCHAIN:FILEPATH=
CMakeCache.txt:822:GGML_VULKAN_SHADER_DEBUG_INFO:BOOL=OFF
CMakeCache.txt:825:GGML_VULKAN_VALIDATE:BOOL=OFF
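Note that the cache above reports GGML_VULKAN:BOOL=OFF. For reference, a minimal reconfigure sketch for a Vulkan-enabled build, using the standard llama.cpp CMake option (the build directory name is just an assumption), would be:

cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release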
Operating systems
Windows
GGML backends
Vulkan
Hardware
System Model HP ZBook Ultra G1a 14 inch Mobile Workstation PC
Processor AMD RYZEN AI MAX+ PRO 395 w/ Radeon 8060S, 3000 MHz, 16 Core(s), 32 Logical Processor(s)
BIOS Version/Date HP X89 Ver. 01.03.11, 28/8/2025
Models
lmstudio-community/gpt-oss-20b-GGUF/gpt-oss-20b-MXFP4.gguf
Problem description & steps to reproduce
When running
..\llama.cpp\build\bin\Release\llama-server.exe -m C:/Users/franc/.lmstudio/models/lmstudio-community/gpt-oss-20b-GGUF/gpt-oss-20b-MXFP4.gguf -c 16384 --temp 0.2 --port 8033 --host 127.0.0.1 -ngl -1
the model loads and then simply exits without any warning or error message.
See the attached file for the full output.
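To capture the last lines before the silent exit, a sketch of the same invocation with all PowerShell output streams redirected to a file (the log file name is just an example) is:

..\llama.cpp\build\bin\Release\llama-server.exe -m C:/Users/franc/.lmstudio/models/lmstudio-community/gpt-oss-20b-GGUF/gpt-oss-20b-MXFP4.gguf -c 16384 --temp 0.2 --port 8033 --host 127.0.0.1 -ngl -1 *> server-exit-log.txt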
First Bad Commit
No response
Relevant log output
llama_context: constructing llama_context
llama_context: n_seq_max = 4
llama_context: n_ctx = 16384
llama_context: n_ctx_seq = 16384
llama_context: n_batch = 2048
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = auto
llama_context: kv_unified = true
llama_context: freq_base = 150000.0
llama_context: freq_scale = 0.03125
llama_context: n_ctx_seq (16384) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_context: Vulkan_Host output buffer size = 3.07 MiB
llama_kv_cache_iswa: creating non-SWA KV cache, size = 16384 cells
llama_kv_cache: Vulkan0 KV buffer size = 384.00 MiB
llama_kv_cache: size = 384.00 MiB ( 16384 cells, 12 layers, 4/1 seqs), K (f16): 192.00 MiB, V (f16): 192.00 MiB
llama_kv_cache_iswa: creating SWA KV cache, size = 1024 cells
llama_kv_cache: Vulkan0 KV buffer size = 24.00 MiB
llama_kv_cache: size = 24.00 MiB ( 1024 cells, 12 layers, 4/1 seqs), K (f16): 12.00 MiB, V (f16): 12.00 MiB
llama_context: Flash Attention was auto, set to enabled
llama_context: Vulkan0 compute buffer size = 398.38 MiB
llama_context: Vulkan_Host compute buffer size = 39.65 MiB
llama_context: graph nodes = 1352
llama_context: graph splits = 2
common_init_from_params: added <|endoftext|> logit bias = -inf
common_init_from_params: added <|return|> logit bias = -inf
common_init_from_params: added <|call|> logit bias = -inf
common_init_from_params: setting dry_penalty_last_n to ctx_size = 16384
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)