Releases: ggml-org/llama.cpp

b8286

@github-actions github-actions released this 12 Mar 07:47
5f91b1d

b8285

@github-actions github-actions released this 12 Mar 07:45
9ef7523

b8281

@github-actions github-actions released this 12 Mar 03:40
0cec84f

b8280

@github-actions github-actions released this 12 Mar 01:39
b2e1427

b8279

@github-actions github-actions released this 12 Mar 00:54
4d99d45

b8278

@github-actions github-actions released this 11 Mar 22:20
10e5b14

llama-quant : correct n_attention_wv usage (#20357)

  • llama-quant : correct n_attention_wv usage

In #19770, I introduced a regression in the way the
quantize_state_impl counter values were initialized. I was
incrementing and using n_attention_wv in the same loop, when it should
have been fixed by the time we're deciding tensor types in
llama_tensor_get_type_impl (for use_more_bits).

I never observed a difference in any of my tests; it was only after
@bartowski kindly pointed this out that I realized it was incorrect.
(Thanks!)

  • simplify
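
The regression described above can be illustrated with a minimal sketch. Only the counter name n_attention_wv comes from the commit message; quant_state, i_attention_wv, is_attn_v, wants_more_bits, and both decide_* functions are hypothetical stand-ins, and the selection rule is an assumed simplification, not the actual logic in llama_tensor_get_type_impl. The single-pass version consults the total while it is still being incremented; the two-pass version finalizes the total before any decision is made.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the quantization state.
struct quant_state {
    int n_attention_wv = 0; // total number of attention V tensors
    int i_attention_wv = 0; // index of the V tensor being processed
};

// Hypothetical name check for attention V weight tensors.
bool is_attn_v(const std::string & name) {
    return name.find("attn_v.weight") != std::string::npos;
}

// Assumed rule: spend more bits on the first and last eighth of layers.
// It depends on the total, so the total must be final before it is called.
bool wants_more_bits(int i_layer, int n_layers) {
    return i_layer < n_layers / 8 || i_layer >= 7 * n_layers / 8;
}

// Regression pattern: the total is incremented and used in the same loop,
// so every decision sees a partial count.
std::vector<bool> decide_single_pass(const std::vector<std::string> & names) {
    quant_state qs;
    std::vector<bool> more_bits;
    for (const auto & name : names) {
        if (!is_attn_v(name)) continue;
        ++qs.n_attention_wv; // still growing at this point
        more_bits.push_back(wants_more_bits(qs.i_attention_wv, qs.n_attention_wv));
        ++qs.i_attention_wv;
    }
    return more_bits;
}

// Corrected pattern: count everything first, then decide per tensor.
std::vector<bool> decide_two_pass(const std::vector<std::string> & names) {
    quant_state qs;
    for (const auto & name : names) {
        if (is_attn_v(name)) ++qs.n_attention_wv; // pass 1: finalize total
    }
    std::vector<bool> more_bits;
    for (const auto & name : names) {
        if (!is_attn_v(name)) continue;
        more_bits.push_back(wants_more_bits(qs.i_attention_wv, qs.n_attention_wv));
        ++qs.i_attention_wv; // pass 2: only the index advances
    }
    return more_bits;
}
```

Under this assumed rule, the two functions can return different results for the same tensor list, because the single-pass version treats each tensor as if it were in the final layer seen so far. Whether the real regression changed any chosen tensor type depends on the actual selection logic in llama.cpp.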

b8277

@github-actions github-actions released this 11 Mar 19:28
90b2731

b8276

@github-actions github-actions released this 11 Mar 19:08
aa2d278

ggml webgpu: faster normal quant and some k-quant matrix operations, better shader parameter handling (#20173)

  • K quant speedup (#20)

  • Basic JIT compilation for mul_mat, get_rows, and scale (#17)

  • scale jit working

  • preliminary working jit for getrows and mulmat, needs refining

  • simplified mul_mat preprocessing switch statement

  • get_rows fixes, mul_mat refinement

  • formatted + last edits

  • removed some extraneous prints

  • fixed get_rows, fixed workgroup dispatch in mul_mat. no gibberish

  • small fix

  • some changes, working

  • get_rows and mul_mat jit fixed and working

  • Update formatting

  • formatting

  • Add header


Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>

  • Start work on all-encompassing shader library

  • refactor argmax, set_rows

  • Refactor all but flashattention, mat mul

  • no gibberish, all k quants added, merged

  • vec memory fix

  • q6_k matching metal on my machine, tests passing

  • Set tile size for q6_k separately

  • Separate out fast shaders


Co-authored-by: neha-ha <137219201+neha-ha@users.noreply.github.com>

  • Move towards writeBuffer for params

  • Move away from multiple buffers for set_rows errors, remove host buffer for parameter buffers, minor cleanups

  • Remove extra file

  • Formatting


Co-authored-by: neha-ha <137219201+neha-ha@users.noreply.github.com>

b8272

@github-actions github-actions released this 11 Mar 14:00
1274fbe

b8271

@github-actions github-actions released this 11 Mar 13:54
a7b3dee