Commit c9c64de

Set GLM4 blk.*.attn_output.weight, kqv_out-* matmul to GGML_PREC_F32 to fix infinity values in output (#13639)
1 parent c00a263 commit c9c64de

1 file changed: 4 additions, 0 deletions

src/llama-graph.cpp

@@ -1368,6 +1368,10 @@ ggml_tensor * llm_graph_context::build_attn(

     if (wo) {
         cur = build_lora_mm(wo, cur);
+        if (arch == LLM_ARCH_GLM4) {
+            // GLM4 seems to have numerical issues with half-precision accumulators
+            ggml_mul_mat_set_prec(cur, GGML_PREC_F32);
+        }
     }

     if (wo_b) {
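
For context on the code comment in this change, here is a minimal standalone sketch of how a half-precision accumulator can produce infinity where an F32 accumulator does not. It is not ggml code and does not reflect how any ggml backend implements its matmul kernels. The helper names (`clamp_to_half_range`, `dot_half_acc`, `dot_f32_acc`) are hypothetical, and the model is deliberately simplified: it only reproduces the limited range of IEEE binary16 (largest finite value 65504) and ignores its reduced mantissa precision.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: models only the *range* of IEEE binary16.
// Any magnitude above 65504 (the largest finite half value) becomes +/-inf.
// Mantissa rounding is intentionally not modeled.
static float clamp_to_half_range(float x) {
    const float kHalfMax = 65504.0f;
    if (std::fabs(x) > kHalfMax) {
        return std::copysign(std::numeric_limits<float>::infinity(), x);
    }
    return x;
}

// Dot product whose running sum is squeezed into half range after every add,
// as a stand-in for a half-precision accumulator.
static float dot_half_acc(const std::vector<float> & a, const std::vector<float> & b) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        acc = clamp_to_half_range(acc + a[i] * b[i]);
    }
    return acc;
}

// Same dot product with a plain F32 accumulator.
static float dot_f32_acc(const std::vector<float> & a, const std::vector<float> & b) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}
```

With two 4096-element vectors filled with 8.0 and 4.0, every product is 32 and the exact sum is 131072. The half-range accumulator passes 65504 partway through and stays at infinity, while the F32 accumulator returns the finite sum. Requesting `GGML_PREC_F32` on the output projection matmul, as the diff does for GLM4, is meant to avoid this kind of overflow.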
