Background Description
Hey IK,
It's becoming harder and harder to merge your work into my fork of KoboldCPP. I'm making good progress, but I've now hit the ggml library barrier.
For example, to merge:
https://github.com/ikawrakow/ik_llama.cpp/pull/9/files#diff-f028a352a33ee20b42faca7dcc389e8f0f9c9a55e016cccffed45fe90bcc13f8R5907
into a current version of KoboldCPP, I need:
ggml-org/ggml#988
because `grad` is no longer a member of `ggml_tensor`:
```c
static struct ggml_tensor * ggml_softcap_impl(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        float                 s_before,
        float                 s_after,
        bool                  inplace) {
    GGML_ASSERT(ggml_is_padded_1d(a));

    bool is_node = false;

    if (a->grad) { // <--- no longer compiles: `grad` was removed from ggml_tensor
        is_node = true;
    }
```
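For context, after ggml-org/ggml#988 gradient bookkeeping moved out of `ggml_tensor` and into the compute graph, so forward ops no longer check `a->grad` or track an `is_node` flag. A port of the snippet above might look roughly like this — a sketch only, following the op-construction pattern used in current ggml; `GGML_OP_SOFTCAP` and the two-float parameter layout come from the ik_llama.cpp PR, not from upstream ggml:

```c
static struct ggml_tensor * ggml_softcap_impl(
        struct ggml_context * ctx,
        struct ggml_tensor  * a,
        float                 s_before,
        float                 s_after,
        bool                  inplace) {
    GGML_ASSERT(ggml_is_padded_1d(a));

    // post-#988: no a->grad check and no is_node flag; gradients now live
    // in the ggml_cgraph, not in the forward op
    struct ggml_tensor * result = inplace ? ggml_view_tensor(ctx, a) : ggml_dup_tensor(ctx, a);

    float params[2] = { s_before, s_after };
    ggml_set_op_params(result, params, sizeof(params));

    result->op     = GGML_OP_SOFTCAP; // op defined by the ik_llama.cpp PR
    result->src[0] = a;

    return result;
}
```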
I merged your first batch of IK quants (2, 3, 4, 5, 6) into my KCPP fork and got them working on CUDA, but I'm also having trouble properly refactoring the CUDA side for your more recent quants (specifically the dot-product template modification; I might be able to handle that one myself with more digging into the refactoring, but I'm not sure).
Anyway, do you have plans to update ik_llama.cpp's ggml library, or even the whole llama.cpp (I'm not asking for that last one, though), in the future? I'd love to keep using your work, and integrating it into my KCPP fork is a very good exercise for me to learn from, but integrating it into KCPP without the current ggml library is just too much for me to handle. So is rebasing everything on ik_llama.cpp, since KCPP mainline follows the development of llama.cpp, and thus of the ggml library.
Possible Refactor Approaches
For you to decide!