Integer Overflow in GGUF Parser can lead to Heap Out-of-Bounds Read/Write in gguf

Summary

An integer overflow in the gguf_init_from_file_impl function in ggml/src/gguf.cpp can lead to a heap out-of-bounds read/write.

Details

The vulnerability originates in the gguf_init_from_file_impl function within ggml/src/gguf.cpp. This function is responsible for parsing the GGUF model file format, including its metadata and tensor data.

Step 1: Cumulative Tensor Size Calculation and Overflow
The function iterates through the tensor information read from the GGUF file to calculate the total size required for all tensor data. This total size is accumulated in ctx->size (a size_t variable):

// In gguf_init_from_file_impl, after reading tensor metadata into ctx->info:
// ...
    // compute the total size of the data section, taking into account the alignment
    {
        ctx->size = 0;
        for (size_t i = 0; i < ctx->info.size(); ++i) {
            const gguf_tensor_info & ti = ctx->info[i];
            if (ti.offset != ctx->size) { // [Check 1]
                GGML_LOG_ERROR("%s: tensor '%s' has offset %" PRIu64 ", expected %zu\n",
                    __func__, ti.t.name, ti.offset, ctx->size);
                GGML_LOG_ERROR("%s: failed to read tensor data\n", __func__);
                gguf_free(ctx);
                return nullptr;
            }
            ctx->size += GGML_PAD(ggml_nbytes(&ti.t), ctx->alignment); // [Vulnerable Summation]
        }
    }
// ...

ctx->size += GGML_PAD(ggml_nbytes(&ti.t), ctx->alignment);: This line adds the padded size of the current tensor (ti.t) to ctx->size. Crucially, nothing prevents ctx->size from overflowing when the sum exceeds SIZE_MAX. Check 1 does not help: it only requires each tensor's offset to equal the running sum at that point, so a crafted file can simply supply offsets that track the sum. If a GGUF file contains a sequence of tensors whose combined padded sizes exceed SIZE_MAX, ctx->size wraps around (unsigned arithmetic is performed modulo SIZE_MAX + 1) to a value much smaller than the true required size. Let this small, wrapped-around value be S_final_wrapped.

Step 2: Allocation of Insufficient Memory
After calculating ctx->size (which might now be a small, wrapped-around value S_final_wrapped), the code proceeds to allocate memory for the tensor data if params.no_alloc is false.

// ...
        struct ggml_context * ctx_data = *params.ctx;
        struct ggml_tensor * data = nullptr; // This will be the "blob" tensor

        if (!params.no_alloc) {
            data = ggml_new_tensor_1d(ctx_data, GGML_TYPE_I8, ctx->size); // [Allocation]
            // ...

data = ggml_new_tensor_1d(ctx_data, GGML_TYPE_I8, ctx->size);: This creates a 1D tensor (data, the "blob" tensor) whose data payload is ctx->size bytes. If an overflow occurred in Step 1, ctx->size is S_final_wrapped, so only a small buffer of S_final_wrapped bytes is allocated on the heap. The pointer to this buffer is data->data.

Step 3: Miscalculation of Individual Tensor Data Pointers
The code then iterates again through ctx->info to initialize each ggml_tensor structure within the ctx_data context. The data pointer for each individual tensor (cur->data) is set to point to a location within the (potentially too small) buffer allocated in Step 2.

// ...
        ggml_set_no_alloc(ctx_data, true); // Temporarily set no_alloc for creating tensor views

        // create the tensors
        for (size_t i = 0; i < ctx->info.size(); ++i) {
            const struct gguf_tensor_info & info = ctx->info[i]; // `info` is `ti` from Step 1

            struct ggml_tensor * cur = ggml_new_tensor(ctx_data, info.t.type, GGML_MAX_DIMS, info.t.ne);
            // ...
            ggml_set_name(cur, info.t.name);

            // point the data member to the appropriate location in the binary blob using the tensor info
            if (!params.no_alloc) {
                cur->data = (char *) data->data + info.offset; // [Vulnerable Assignment]
            }
        }
// ...

cur->data = (char *) data->data + info.offset;: Here, data->data is the base pointer to the allocated heap buffer (of size S_final_wrapped), and info.offset is the offset read from the GGUF file for this tensor (Check 1 only required it to equal the running sum at that point). If info.offset is greater than or equal to S_final_wrapped, cur->data points outside the bounds of the allocated heap buffer. More generally, any tensor whose byte range [info.offset, info.offset + ggml_nbytes) extends past S_final_wrapped is accessed out of bounds, even if its starting offset lies inside the buffer.

Consequence: Out-of-Bounds Access
Any subsequent operation that uses such a miscalculated cur->data pointer (e.g., reading tensor elements for inference, writing them during quantization, or even printing values) results in a heap-based out-of-bounds read or write. For example, the gguf_ex_read_1 function in examples/gguf/gguf.cpp reads and prints tensor data:

    // ... (inside the loop iterating through tensors)
    struct ggml_tensor * cur = ggml_get_tensor(ctx_data, name); // 'cur' is the tensor struct

    // ... (printf for tensor metadata, including cur->data which is the OOB pointer)

    // print first 10 elements
    const float * data_ptr = (const float *) cur->data; // [Leak Point 1] data_ptr now holds the OOB pointer

    printf("%s data[:10] : ", name);
    for (int j = 0; j < MIN(10, ggml_nelements(cur)); ++j) {
        printf("%f ", data_ptr[j]); // [Leak Point 2] OOB Read and Print
    }
    printf("\n\n");
    // ...

const float * data_ptr = (const float *) cur->data;: The data_ptr variable is assigned the value of cur->data. If cur->data is an OOB pointer due to the vulnerability, then data_ptr also becomes an OOB pointer.
printf("%f ", data_ptr[j]);: This line is the core of the information leak. It dereferences the OOB pointer data_ptr to read sizeof(float) bytes from an out-of-bounds heap location.

PoC

https://huggingface.co/yuuoniy/overflow/blob/main/overflow_poc.gguf
A malicious GGUF model file (e.g., overflow_poc.gguf) can be crafted to trigger this vulnerability. The key is to manipulate the tensor metadata (names, dimensions, types, and, crucially, offsets) so that ctx->size overflows while the relevant checks still pass. The generator is available at https://huggingface.co/yuuoniy/overflow/blob/main/generate_poc.c

Reproduce

Command for Running

gcc generate_poc.c -o generate_poc
./generate_poc overflow_poc.gguf
# build the llama.cpp program
cmake -B build -DCMAKE_BUILD_TYPE=Debug -DGGML_SANITIZE_UNDEFINED=ON -DLLAMA_CURL=OFF
cmake --build build -j
./build/bin/llama-gguf ggml/src/overflow_poc.gguf r n

Log:
gguf_ex_read_0: version:      3
gguf_ex_read_0: alignment:   32
gguf_ex_read_0: data offset: 128
gguf_ex_read_0: n_kv: 0
gguf_ex_read_0: find key: some.parameter.string not found.
gguf_ex_read_0: n_tensors: 2
gguf_ex_read_0: tensor[0]: name = tensor_A, size = 14987979559889010688, offset = 0
gguf_ex_read_0: tensor[1]: name = tensor_B, size = 3458764513820541952, offset = 14987979559889010688
/home/xx/source/llama.cpp/ggml/src/gguf.cpp:704:49: runtime error: pointer index expression with base 0x562a0c3acdf0 overflowed to 0xd000562a0c3acdf0
gguf_ex_read_1: version:      3
gguf_ex_read_1: alignment:   32
gguf_ex_read_1: data offset: 128
gguf_ex_read_1: n_kv: 0
gguf_ex_read_1: n_tensors: 2
gguf_ex_read_1: tensor[0]: name = tensor_A, size = 14987979559889010688, offset = 0
gguf_ex_read_1: tensor[1]: name = tensor_B, size = 3458764513820541952, offset = 14987979559889010688
gguf_ex_read_1: reading tensor 0 data
gguf_ex_read_1: tensor[0]: n_dims = 1, ne = (0, 1, 1, 1), name = tensor_A, data = 0x562a0c3acdf0
tensor_A data[:10] : 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 100.000000 

gguf_ex_read_1: reading tensor 1 data
gguf_ex_read_1: tensor[1]: n_dims = 1, ne = (512, 1, 1, 1), name = tensor_B, data = 0xd000562a0c3acdf0
Segmentation fault (core dumped)

This causes reads from an invalid address and can lead to information leaks.

Impact

Any program that loads a malicious GGUF model can be made to perform heap out-of-bounds reads or writes.


Weaknesses

Heap-based Buffer Overflow

Integer Overflow to Buffer Overflow
