Summary
An integer overflow in the gguf_init_from_file_impl function in ggml/src/gguf.cpp can lead to a heap out-of-bounds read/write.
Details
The vulnerability originates in the gguf_init_from_file_impl function in ggml/src/gguf.cpp. This function parses the GGUF model file format, including its metadata and tensor data.
Step 1: Cumulative Tensor Size Calculation and Overflow
The function iterates over the tensor information read from the GGUF file to compute the total size required for all tensor data. This total is accumulated in ctx->size (a size_t variable):
// In gguf_init_from_file_impl, after reading tensor metadata into ctx->info:
// ...
// compute the total size of the data section, taking into account the alignment
{
    ctx->size = 0;
    for (size_t i = 0; i < ctx->info.size(); ++i) {
        const gguf_tensor_info & ti = ctx->info[i];
        if (ti.offset != ctx->size) { // [Check 1]
            GGML_LOG_ERROR("%s: tensor '%s' has offset %" PRIu64 ", expected %zu\n",
                __func__, ti.t.name, ti.offset, ctx->size);
            GGML_LOG_ERROR("%s: failed to read tensor data\n", __func__);
            gguf_free(ctx);
            return nullptr;
        }
        ctx->size += GGML_PAD(ggml_nbytes(&ti.t), ctx->alignment); // [Vulnerable Summation]
    }
}
// ...
ctx->size += GGML_PAD(ggml_nbytes(&ti.t), ctx->alignment);
This line adds the padded size of the current tensor (ti.t) to ctx->size. Crucially, nothing here prevents ctx->size from overflowing when the sum exceeds SIZE_MAX. If a GGUF file is crafted with a sequence of tensors whose combined (padded) sizes exceed SIZE_MAX, ctx->size wraps around to a value much smaller than the true required size. Call this wrapped value S_final_wrapped. [Check 1] does not stop this. It only requires each tensor's offset to equal the running total computed so far, so a crafted file can satisfy it simply by declaring those expected offsets.
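The wraparound can be checked against the values in the PoC log. With alignment 32, tensor_A has size 14987979559889010688 at offset 0. tensor_B has size 3458764513820541952 at offset 14987979559889010688.

The sketch below is a standalone model of the Step 1 loop, not llama.cpp code. Note the following assumptions:
- model_total_size and pad_to are hypothetical names.
- uint64_t stands in for a 64-bit size_t.
- pad_to performs the same power-of-two rounding as GGML_PAD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: round x up to a multiple of n (n must be a power of two),
// the same rounding GGML_PAD performs. Like the original, it does not check for overflow.
static uint64_t pad_to(uint64_t x, uint64_t n) {
    assert(n != 0 && (n & (n - 1)) == 0);
    return (x + n - 1) & ~(n - 1);
}

// Hypothetical model of the Step 1 loop: each offset must equal the running total
// ([Check 1]), but the running total itself is never checked for overflow.
// Returns false if Check 1 fails; otherwise stores the (possibly wrapped) total.
static bool model_total_size(const uint64_t *sizes, const uint64_t *offsets, size_t n,
                             uint64_t alignment, uint64_t *total_out) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        if (offsets[i] != total) {
            return false; // mirrors [Check 1]
        }
        total += pad_to(sizes[i], alignment); // unsigned arithmetic wraps modulo 2^64
    }
    *total_out = total;
    return true;
}
```

Called with those two sizes and offsets, model_total_size passes both Check 1 comparisons and yields 1024. Both sizes are already multiples of 32, and their true sum is 2^64 + 1024. In this PoC, S_final_wrapped is therefore 1024.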
Step 2: Allocation of Insufficient Memory
After calculating ctx->size (which might now be a small, wrapped-around value S_final_wrapped), the code proceeds to allocate memory for the tensor data if params.no_alloc is false.
// ...
struct ggml_context * ctx_data = *params.ctx;
struct ggml_tensor * data = nullptr; // This will be the "blob" tensor
if (!params.no_alloc) {
    data = ggml_new_tensor_1d(ctx_data, GGML_TYPE_I8, ctx->size); // [Allocation]
    // ...
}
data = ggml_new_tensor_1d(ctx_data, GGML_TYPE_I8, ctx->size);
A 1D tensor (referred to as data or blob_tensor in this report) is created, and its data payload is allocated with size ctx->size. If an overflow occurred in Step 1, ctx->size is S_final_wrapped, so only a buffer of S_final_wrapped bytes is allocated on the heap. The pointer to this buffer is data->data.
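Below is a minimal sketch of the missing guard. It assumes the loader should simply reject such a file: each addition is tested before it is performed, so a wrapped total can never reach ggml_new_tensor_1d. checked_total_size is a hypothetical name, and this is not the upstream patch. It uses uint64_t and UINT64_MAX where the real code would use size_t and SIZE_MAX.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical overflow-checked version of the Step 1 summation.
// Returns false (and leaves *total_out untouched) if padding a tensor size or
// adding it to the running total would exceed UINT64_MAX.
static bool checked_total_size(const uint64_t *sizes, size_t n, uint64_t alignment,
                               uint64_t *total_out) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0); // power of two
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        if (sizes[i] > UINT64_MAX - (alignment - 1)) {
            return false; // rounding up would itself wrap
        }
        const uint64_t padded = (sizes[i] + alignment - 1) & ~(alignment - 1);
        if (padded > UINT64_MAX - total) {
            return false; // total + padded would wrap
        }
        total += padded;
    }
    *total_out = total;
    return true;
}
```

Both comparisons are done by subtraction from the maximum value, so the check itself cannot overflow. With the PoC sizes, the second addition is rejected instead of wrapping to 1024.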
Step 3: Miscalculation of Individual Tensor Data Pointers
The code then iterates through ctx->info again to initialize each ggml_tensor structure within the ctx_data context. The data pointer of each individual tensor (cur->data) is set to a location within the (potentially too small) buffer allocated in Step 2.
// ...
ggml_set_no_alloc(ctx_data, true); // Temporarily set no_alloc for creating tensor views

// create the tensors
for (size_t i = 0; i < ctx->info.size(); ++i) {
    const struct gguf_tensor_info & info = ctx->info[i]; // `info` is `ti` from Step 1
    struct ggml_tensor * cur = ggml_new_tensor(ctx_data, info.t.type, GGML_MAX_DIMS, info.t.ne);
    // ...
    ggml_set_name(cur, info.t.name);

    // point the data member to the appropriate location in the binary blob using the tensor info
    if (!params.no_alloc) {
        cur->data = (char *) data->data + info.offset; // [Vulnerable Assignment]
    }
}
// ...
cur->data = (char *) data->data + info.offset;
Here, data->data is the base pointer of the allocated heap buffer (of size S_final_wrapped), and info.offset is the offset read directly from the GGUF file for this tensor. If info.offset for a particular tensor is greater than or equal to S_final_wrapped (the actual size of the allocated buffer), cur->data points outside the bounds of this heap buffer.
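The faulting address in the log follows directly from this assignment. tensor_B's offset 14987979559889010688 is 0xd000000000000000, so adding it to the blob base 0x562a0c3acdf0 gives 0xd000562a0c3acdf0. That is exactly the value UBSan reports and gguf_ex_read_1 prints.

The sketch below models that computation with integers. It also shows a per-tensor bounds check that would reject the offset against the 1024-byte blob. Both functions are hypothetical illustrations, not llama.cpp APIs.

```cpp
#include <cstdint>

// Hypothetical integer model of `(char *) data->data + info.offset`.
// Unsigned addition wraps modulo 2^64, so a huge offset silently yields an
// address far outside the blob. (On real pointers this arithmetic is undefined
// behavior, which is what the UBSan "pointer index expression" error reports.)
static uint64_t model_tensor_address(uint64_t base, uint64_t offset) {
    return base + offset;
}

// Hypothetical bounds check: true only if the byte range [offset, offset + nbytes)
// lies inside a blob of blob_size bytes. The comparisons are arranged so that no
// intermediate value can overflow.
static bool tensor_in_bounds(uint64_t offset, uint64_t nbytes, uint64_t blob_size) {
    return offset <= blob_size && nbytes <= blob_size - offset;
}
```

The check compares against blob_size - offset rather than computing offset + nbytes, because that sum could itself wrap for attacker-controlled values.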
Consequence: Out-of-Bounds Access
Any subsequent operation that uses such a miscalculated cur->data pointer (e.g., reading tensor elements for inference, writing during quantization, or even printing values) results in a heap-based out-of-bounds read or write. For example, the following snippet from examples/gguf/gguf.cpp (in the gguf_ex_read_1 function) reads and prints tensor data:
// ... (inside the loop iterating through tensors)
struct ggml_tensor * cur = ggml_get_tensor(ctx_data, name); // 'cur' is the tensor struct
// ... (printf for tensor metadata, including cur->data which is the OOB pointer)

// print first 10 elements
const float * data_ptr = (const float *) cur->data; // [Leak Point 1] data_ptr now holds the OOB pointer
printf("%s data[:10] : ", name);
for (int j = 0; j < MIN(10, ggml_nelements(cur)); ++j) {
    printf("%f ", data_ptr[j]); // [Leak Point 2] OOB read and print
}
printf("\n\n");
// ...
const float * data_ptr = (const float *) cur->data;
The data_ptr variable is assigned the value of cur->data. If cur->data is an OOB pointer due to the vulnerability, data_ptr is also an OOB pointer.
printf("%f ", data_ptr[j]);
This line is the core of the information leak: it dereferences the OOB pointer data_ptr and reads sizeof(float) bytes from an out-of-bounds memory location.
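The element access can be stated precisely: data_ptr[j] reads the sizeof(float) bytes starting at byte info.offset + j * sizeof(float) of the blob. The hypothetical checked counterpart below is not part of examples/gguf/gguf.cpp. It makes that bound explicit and refuses the read instead of dereferencing an out-of-range address. It copies with std::memcpy to sidestep alignment and aliasing concerns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical checked counterpart of `data_ptr[j]` in gguf_ex_read_1.
// Element j of a float tensor starting at byte `offset` occupies bytes
// [offset + j*sizeof(float), offset + (j+1)*sizeof(float)) of the blob.
// Returns false instead of reading when that range is not inside the blob.
static bool read_float_checked(const unsigned char *blob, size_t blob_size,
                               uint64_t offset, size_t j, float *out) {
    assert(out != nullptr);
    if (offset > blob_size) {
        return false;                          // tensor starts past the blob
    }
    const uint64_t avail = blob_size - offset; // bytes available after offset
    if (j >= avail / sizeof(float)) {
        return false;                          // element j would end past the blob
    }
    std::memcpy(out, blob + offset + j * sizeof(float), sizeof(float));
    return true;
}
```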
PoC
https://huggingface.co/yuuoniy/overflow/blob/main/overflow_poc.gguf
A malicious GGUF model file (e.g., poc3.gguf) can be crafted to trigger this vulnerability. The key is to manipulate the tensor metadata (names, dimensions, types, and crucially, offsets) so that ctx->size overflows while the relevant checks still pass. The generator is available at https://huggingface.co/yuuoniy/overflow/blob/main/generate_poc.c
Reproduce
Command for Running
gcc generate_poc.c -o generate_poc
./generate_poc overflow_poc.gguf
# build the llama.cpp program
cmake -B build -DCMAKE_BUILD_TYPE=Debug -DGGML_SANITIZE_UNDEFINED=ON -DLLAMA_CURL=OFF
cmake --build build -j
./build/bin/llama-gguf ggml/src/overflow_poc.gguf r n
Log:
gguf_ex_read_0: version: 3
gguf_ex_read_0: alignment: 32
gguf_ex_read_0: data offset: 128
gguf_ex_read_0: n_kv: 0
gguf_ex_read_0: find key: some.parameter.string not found.
gguf_ex_read_0: n_tensors: 2
gguf_ex_read_0: tensor[0]: name = tensor_A, size = 14987979559889010688, offset = 0
gguf_ex_read_0: tensor[1]: name = tensor_B, size = 3458764513820541952, offset = 14987979559889010688
**/home/xx/source/llama.cpp/ggml/src/gguf.cpp:704:49: runtime error: pointer index expression with base 0x562a0c3acdf0 overflowed to 0xd000562a0c3acdf0**
gguf_ex_read_1: version: 3
gguf_ex_read_1: alignment: 32
gguf_ex_read_1: data offset: 128
gguf_ex_read_1: n_kv: 0
gguf_ex_read_1: n_tensors: 2
gguf_ex_read_1: tensor[0]: name = tensor_A, size = 14987979559889010688, offset = 0
gguf_ex_read_1: tensor[1]: name = tensor_B, size = 3458764513820541952, offset = 14987979559889010688
gguf_ex_read_1: reading tensor 0 data
gguf_ex_read_1: tensor[0]: n_dims = 1, ne = (0, 1, 1, 1), name = tensor_A, data = 0x562a0c3acdf0
tensor_A data[:10] : 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000
gguf_ex_read_1: reading tensor 1 data
gguf_ex_read_1: tensor[1]: n_dims = 1, ne = (512, 1, 1, 1), name = tensor_B, data = 0xd000562a0c3acdf0
Segmentation fault (core dumped)
This causes reads from invalid addresses and can lead to information leaks.
Impact
Any program that loads a malicious GGUF model is exposed to heap out-of-bounds reads/writes.