http client error: Failed to read connection in current master branch #24388

Description

@InputOutputZ

Name and Version

llama-server
version: 9584 (e25a32e)
built with GNU 14.3.1 for Linux x86_64

Operating systems

Linux

GGML backends

BLAS, CPU

Hardware

EPYC Milan CPU.
24GB RAM
12 vCPUs

Models

All models, but tested with Gemma E2B, Llama-3.2 1B, 2B quantised.

Problem description & steps to reproduce

I'm not sure whether this is a bug or a misconfiguration on my side. I spent hours trying to figure it out and decided to report it before giving up, so I hope I'm not wasting your time. When I run llama-server in router mode with SSL, using the following environment variables:

LLAMA_ARGS="--port 2222 --host 127.0.0.1"
LLAMA_ARG_API_KEY_FILE=/server/keys
LLAMA_ARG_MODELS_DIR=/server/models
LLAMA_ARG_MODELS_AUTOLOAD=enabled
LLAMA_ARG_SSL_KEY_FILE=/server/privkey1.pem
LLAMA_ARG_SSL_CERT_FILE=/server/cert1.pem
LLAMA_ARG_MODELS_MAX=1
LLAMA_ARG_MODELS_PRESET=/server/preset.ini

and make a request (e.g. a chat completions POST) to a model through llama-server running in router mode, the request fails. llama-server spawns a new process and loads the model successfully, and the child functions as it should, but the failure appears as soon as I start a chat and a POST request like the following is sent:
curl -k -X POST https://127.0.0.1:48135/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer API-KEY" -d '{
"model": "Llama-3.2-1B-Instruct-Q4_K_M",
"messages": [
{
"role": "user",
"content": "Hello"
}
]
}'

I get the following error and no response in the frontend. I suspect that router mode expects the request in plain HTTP while a client such as Open WebUI sends it over HTTPS, so the connection is aborted.
Jun 9 17:12:18 server llama-server[3313482]: 24.23.288.878 E srv operator(): http client error: Failed to read connection
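
One way to narrow this down is to check which scheme each port actually accepts, since the log reports the child instance listening on https://127.0.0.1:38565 while the router proxies to that same port. The following is a minimal sketch rather than a confirmed diagnosis: it assumes curl is installed, the helper names `classify_probe` and `probe_port` are hypothetical, and the `/health` path is only an illustrative choice (any HTTP response, even a 404, counts as an answer).

```shell
#!/bin/sh
# Sketch: report whether a port answers plain HTTP, HTTPS (TLS), both, or neither.
# Helper names and the /health path are assumptions made for illustration.

# classify_probe HTTP_STATUS HTTPS_STATUS
# curl prints 000 for %{http_code} when it received no HTTP response at all.
classify_probe() {
    if [ "$1" = "000" ] && [ "$2" = "000" ]; then
        echo "unreachable"
    elif [ "$1" = "000" ]; then
        echo "tls"
    elif [ "$2" = "000" ]; then
        echo "plain"
    else
        echo "both"
    fi
}

# probe_port HOST PORT
probe_port() {
    # -s silent, -k skip certificate verification, -m 5 five-second timeout.
    http_status=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "http://$1:$2/health") || true
    https_status=$(curl -s -k -o /dev/null -m 5 -w '%{http_code}' "https://$1:$2/health") || true
    # Treat an empty result (e.g. curl missing) as "no response".
    classify_probe "${http_status:-000}" "${https_status:-000}"
}
```

For example, `probe_port 127.0.0.1 38565` would test the child port from the log. If the child only answers TLS while the router forwards the request to it in plain HTTP, that would be consistent with the `Failed to read connection` error, although it would not prove it.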

Also, the GET models endpoint is not covered by API key authorisation. I'm not sure whether this is intended.
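
The observation about the models endpoint can be reproduced by sending a request without an `Authorization` header and looking at the status code. This is a small sketch under stated assumptions: the helper names are hypothetical, the `/v1/models` path is assumed to be the endpoint meant above, and the mapping of status codes to labels is a simplification.

```shell
#!/bin/sh
# Sketch: check whether an endpoint answers without an API key.
# Helper names and the status-to-label mapping are illustrative assumptions.

# interpret_auth_status STATUS
interpret_auth_status() {
    case "$1" in
        401|403) echo "protected" ;;    # server rejected the unauthenticated request
        2??)     echo "open" ;;         # server answered without any credentials
        000)     echo "unreachable" ;;  # curl received no HTTP response
        *)       echo "unknown" ;;
    esac
}

# check_endpoint_auth URL
# Sends a GET request with no Authorization header.
check_endpoint_auth() {
    status=$(curl -s -k -o /dev/null -m 5 -w '%{http_code}' "$1") || true
    interpret_auth_status "${status:-000}"
}
```

With the router address from the request above, this would be called as `check_endpoint_auth https://127.0.0.1:48135/v1/models`. Running the same check against `/v1/chat/completions` gives a point of comparison for an endpoint that is expected to require the key.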

With thanks.

First Bad Commit

The bug is not present when SSL is disabled.

Relevant log output

Logs
```
Jun 9 17:12:11 server llama-server[3313482]: [38565] 0.00.012.033 I device_info:
Jun 9 17:12:11 server llama-server[3313482]: [38565] 0.00.012.062 I - BLAS : OpenBLAS (0 MiB, 0 MiB free)
Jun 9 17:12:11 server llama-server[3313482]: [38565] 0.00.012.091 I - CPU : AMD EPYC-Milan Processor (23744 MiB, 23744 MiB free)
Jun 9 17:12:11 server llama-server[3313482]: [38565] 0.00.012.185 I system_info: n_threads = 6 (n_threads_batch = 6) / 12 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | OPENMP = 1 | REPACK = 1 |
Jun 9 17:12:11 server llama-server[3313482]: [38565] 0.00.012.214 I srv llama_server: n_parallel is set to auto, using n_parallel = 4 and kv_unified = true
Jun 9 17:12:11 server llama-server[3313482]: [38565] 0.00.012.295 I srv init: running with SSL: key = /server/privkey1.pem, cert = /server/cert1.pem
Jun 9 17:12:11 server llama-server[3313482]: [38565] 0.00.016.673 I srv init: api_keys: ****1add
Jun 9 17:12:11 server llama-server[3313482]: [38565] 0.00.016.732 I srv init: using 11 threads for HTTP server
Jun 9 17:12:11 server llama-server[3313482]: [38565] 0.00.016.978 I srv start: binding port with default address family
Jun 9 17:12:11 server llama-server[3313482]: [38565] 0.00.020.924 I srv llama_server: loading model
Jun 9 17:12:11 server llama-server[3313482]: [38565] 0.00.020.937 I srv load_model: loading model '/server/Llama-3.2-1B-Instruct-Q4_K_M.gguf'
Jun 9 17:12:11 server llama-server[3313482]: [38565] 0.00.021.026 I common_init_result: fitting params to device memory ...
Jun 9 17:12:11 server llama-server[3313482]: [38565] 0.00.021.026 I common_init_result: (for bugs during this step try to reproduce them with -fit off, or provide --verbose logs if the bug only occurs with -fit on)
Jun 9 17:12:12 server llama-server[3313482]: [38565] 0.00.697.311 I common_params_fit_impl: projected to use 4930 MiB of host memory vs. 23744 MiB of total host memory
Jun 9 17:12:17 server llama-server[3313482]: [38565] 0.06.150.424 I common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
Jun 9 17:12:18 server llama-server[3313482]: [38565] 0.06.527.345 I srv load_model: initializing slots, n_slots = 4
Jun 9 17:12:18 server llama-server[3313482]: [38565] 0.07.167.277 W common_speculative_init: no implementations specified for speculative decoding
Jun 9 17:12:18 server llama-server[3313482]: [38565] 0.07.167.292 I slot load_model: id 0 | task -1 | new slot, n_ctx = 131072
Jun 9 17:12:18 server llama-server[3313482]: [38565] 0.07.167.308 I slot load_model: id 1 | task -1 | new slot, n_ctx = 131072
Jun 9 17:12:18 server llama-server[3313482]: [38565] 0.07.167.309 I slot load_model: id 2 | task -1 | new slot, n_ctx = 131072
Jun 9 17:12:18 server llama-server[3313482]: [38565] 0.07.167.309 I slot load_model: id 3 | task -1 | new slot, n_ctx = 131072
Jun 9 17:12:18 server llama-server[3313482]: [38565] 0.07.167.389 I srv load_model: prompt cache is enabled, size limit: 8192 MiB
Jun 9 17:12:18 server llama-server[3313482]: [38565] 0.07.167.397 I srv load_model: use `--cache-ram 0` to disable the prompt cache
Jun 9 17:12:18 server llama-server[3313482]: [38565] 0.07.167.398 I srv load_model: for more info see https://github.com//pull/16391
Jun 9 17:12:18 server llama-server[3313482]: [38565] 0.07.167.398 I srv load_model: context checkpoints enabled, max = 32, min spacing = 256
Jun 9 17:12:18 server llama-server[3313482]: [38565] 0.07.167.420 I srv init: idle slots will be saved to prompt cache and cleared upon starting a new task
Jun 9 17:12:18 server llama-server[3313482]: [38565] 0.07.176.037 I init: chat template, example_format: '<|start_header_id|>system<|end_header_id|>
Jun 9 17:12:18 server llama-server[3313482]: [38565]
Jun 9 17:12:18 server llama-server[3313482]: [38565] Cutting Knowledge Date: December 2023
Jun 9 17:12:18 server llama-server[3313482]: [38565] Today Date: 09 Jun 2026
Jun 9 17:12:18 server llama-server[3313482]: [38565]
Jun 9 17:12:18 server llama-server[3313482]: [38565] You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>
Jun 9 17:12:18 server llama-server[3313482]: [38565]
Jun 9 17:12:18 server llama-server[3313482]: [38565] Hello<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Jun 9 17:12:18 server llama-server[3313482]: [38565]
Jun 9 17:12:18 server llama-server[3313482]: [38565] Hi there<|eot_id|><|start_header_id|>user<|end_header_id|>
Jun 9 17:12:18 server llama-server[3313482]: [38565]
Jun 9 17:12:18 server llama-server[3313482]: [38565] How are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Jun 9 17:12:18 server llama-server[3313482]: [38565]
Jun 9 17:12:18 server llama-server[3313482]: [38565] '
Jun 9 17:12:18 server llama-server[3313482]: [38565] 0.07.180.073 I srv init: init: chat template, thinking = 0
Jun 9 17:12:18 server llama-server[3313482]: [38565] 0.07.180.130 I srv llama_server: model loaded
Jun 9 17:12:18 server llama-server[3313482]: [38565] 0.07.180.142 I srv llama_server: server is listening on https://127.0.0.1:38565
Jun 9 17:12:18 server llama-server[3313482]: [38565] cmd_child_to_router:ready
Jun 9 17:12:18 server llama-server[3313482]: [38565] cmd_child_to_router:info:{"id":"Llama-3.2-1B-Instruct-Q4_K_M","aliases":["Llama-3.2-1B-Instruct-Q4_K_M"],"tags":[],"object":"model","created":1781021538,"owned_by":"llamacpp","meta":{"vocab_type":2,"n_vocab":128256,"n_ctx":131072,"n_ctx_train":131072,"n_embd":2048,"n_params":1235814432,"size":799862912}}
Jun 9 17:12:18 server llama-server[3313482]: 24.23.287.612 I srv proxy_reques: proxying request to model Llama-3.2-1B-Instruct-Q4_K_M on port 38565
Jun 9 17:12:18 server llama-server[3313482]: [38565] 0.07.180.531 I srv update_slots: all slots are idle
Jun 9 17:12:18 server llama-server[3313482]: [38565] 0.07.180.543 I srv operator(): child server monitoring thread started, waiting for EOF on stdin...
Jun 9 17:12:18 server llama-server[3313482]: 24.23.288.878 E srv operator(): http client error: Failed to read connection
Jun 9 17:14:28 server llama-server[3313482]: 26.33.099.913 I srv proxy_reques: proxying request to model Llama-3.2-1B-Instruct-Q4_K_M on port 38565
Jun 9 17:14:28 server llama-server[3313482]: 26.33.100.614 E srv operator(): http client error: Failed to read connection
Jun 9 17:15:44 server llama-server[3313482]: [38565] 3.33.175.728 I srv operator(): operator(): cleaning up before exit...
Jun 9 17:15:44 server llama-server[3313482]: 27.49.284.743 I srv operator(): operator(): cleaning up before exit...
Jun 9 17:15:44 server llama-server[3313482]: 27.49.284.757 I srv unload_all: stopping model instance name=Llama-3.2-1B-Instruct-Q4_K_M
Jun 9 17:15:44 server llama-server[3313482]: 27.49.284.888 I srv operator(): stopping model instance name=Llama-3.2-1B-Instruct-Q4_K_M
Jun 9 17:15:44 server llama-server[3313482]: [38565] 3.33.177.723 I srv operator(): exit command received, exiting...
Jun 9 17:15:45 server llama-server[3313482]: 27.50.181.949 I srv operator(): instance name=Llama-3.2-1B-Instruct-Q4_K_M exited with status 0
```
