Fix ggml_tensor_extra_gpu memory leak #2146
Conversation
Allocate the `extra` member of `ggml_tensor` statically
Would be fine with me but ggml changes will need to be approved by @ggerganov.
#2099 uses
Not a good change - the CUDA implementation should clean up the allocated memory
Hi @ggerganov! Is there any reason why you allocate the memory dynamically? Cleaning up the memory seems very difficult. Looks like the same memory region is used by multiple instances of
The goal is to keep
Is it really so difficult? I guess we could pre-allocate a single large enough buffer in
If it is indeed too difficult we can think of some alternative approaches, but the main requirement is to not couple
AFAIK, there are two types of
Since the lifecycle of them is tied to
All temp allocations are made in:

```c
typedef void (*offload_func_t)(void * offload_ctx, struct ggml_tensor * tensor);

int llama_eval_internal(...) {
    ctx0 = ggml_init(...);

#ifdef GGML_USE_CUBLAS
    void * offload_ctx = ggml_cuda_init_offload_ctx(...);
#endif

    // Now the offload_func will have to be called like this
    offload_func(offload_ctx, tensor);

#ifdef GGML_USE_CUBLAS
    ggml_cuda_free_offload_ctx(offload_ctx);
#endif

    ggml_free(ctx0);
}
```

It's a bit more verbose but we correctly track the life cycle of those temp allocations.
Memory management seems a bit too complicated for me in this case, so I just wanted to let you know that I will not open another PR for this issue. Thank you guys for your input!
Fixes #2145. Allocate the `extra` member of `ggml_tensor` statically.