
With many clients and long response bodies, server-side linkerd proxy uses much more memory over HTTP/1.1 than HTTP/1.0 #14397


Description

@zigmund

What is the issue?

UPDATE: see #14397 (comment)

Back in the day we used Linkerd up to version 2.8, but had to stop using it because of #7610.

Short description:
With the same overall HTTP server load (RPS) and average-size responses (~128 KB body tested), the more clients sending requests, the higher the memory usage of the server-side Linkerd proxy.

You can find the full story in the original issue.

Now I'm testing the current version (edge-25.7.6), and the issue is still there, but I found that it is reproducible only with Connection: close requests. There is no problem with Connection: keep-alive requests.
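
For a quick manual illustration of the difference (a minimal sketch only, assuming curl is available in a meshed client pod and the test-server service from the reproduction steps below):

# Connection: close - the server tears down the TCP connection after each response,
# so every request opens a fresh connection through the server-side proxy.
curl -sv -o /dev/null -H 'Connection: close' http://test-server/slow

# HTTP/1.1 default (keep-alive) - curl reuses the same connection for both requests.
curl -sv -o /dev/null -o /dev/null http://test-server/slow http://test-server/slow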

How can it be reproduced?

The issue is fully reproducible.
The prepared images from #7610 can be used:

  1. Deploy the server:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-server
  namespace: debug
  labels: &labels
    app: server
spec:
  selector:
    matchLabels: *labels
  template:
    metadata:
      labels: *labels
      annotations:
        linkerd.io/inject: enabled
        config.linkerd.io/proxy-memory-limit: "128Mi"
    spec:
      containers:
      - name: server
        image: zigmund/linkerd-2.9-memory-issue:v1
        ports:
        - name: http
          containerPort: 8080
        resources:
          limits:
            memory: "128Mi"
            cpu: "1"
          requests:
            memory: "64Mi"
            cpu: "0.1"

---
kind: Service
apiVersion: v1
metadata:
  name: test-server
  namespace: debug
  labels: &labels
    app: server
spec:
  selector: *labels
  ports:
  - port: 80
    targetPort: 8080
  2. Deploy the clients:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-client
  namespace: debug
  labels: &labels
    app: client
spec:
  replicas: 6
  selector:
    matchLabels: *labels
  template:
    metadata:
      labels: *labels
      annotations:
        linkerd.io/inject: enabled
    spec:
      containers:
      - name: ubuntu
        image: zigmund/linkerd-2.9-memory-issue-client:v1
        command:
        - "sleep"
        - "infinity"
        resources:
          limits:
            memory: "128Mi"
            cpu: "1"
  3. Load the server with HTTP requests using different combinations of client count * thread count (-c), with HTTP keep-alive off (the default) and on.
    For example: 1 client * 6 threads, 2 clients * 3 threads, 3 clients * 2 threads, 6 clients * 1 thread.
    Via kubectl exec -it <client-pod> -- bash:
    siege -c 6 -t 5m http://test-server/slow
    siege -R <(echo connection = keep-alive) -c 6 -t 5m http://test-server/slow
  4. Observe that the load is the same via the Linkerd response_total metrics for each client*thread combination (see the sketch after this list).
    Observe the higher memory usage of the server-side linkerd-proxy with more clients when requests use Connection: close.
    Observe the low memory usage with Connection: keep-alive requests.
  5. Rollout-restart the server between runs for cleaner tests.
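
For reference, a minimal sketch of how load and memory can be compared; it assumes metrics-server is installed for kubectl top and uses the proxy's admin port 4191 (see the startup logs below), with the pod/deployment names from the manifests above:

# In one terminal: forward the server-side proxy's admin port.
kubectl -n debug port-forward deploy/test-server 4191:4191

# In another terminal: request/response totals as reported by the proxy.
curl -s http://localhost:4191/metrics | grep response_total

# Per-container memory of the server pod (requires metrics-server).
kubectl -n debug top pod --containers -l app=server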

Results (memory usage of the server-side linkerd-proxy; screenshots omitted):

1 client * 6 threads, Connection: close: ~17 MB
2 clients * 3 threads, Connection: close: ~27 MB
3 clients * 2 threads, Connection: close: ~36 MB
6 clients * 1 thread, Connection: close: ~63 MB
6 clients * 1 thread, Connection: keep-alive: ~9 MB

RPS was the same for all runs.
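
Rough arithmetic on the Connection: close runs: (63 - 17) MB / 5 extra clients ≈ 9 MB of additional server-side proxy memory per extra client at constant RPS, while the keep-alive run stays below even the single-client figure.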

Logs, error output, etc

Nothing special:

{"timestamp":"2025-08-20T06:45:25.998698Z","level":"INFO","fields":{"message":"release 2.311.0 (1d94082) by linkerd on 2025-07-30T04:31:59Z"},"target":"linkerd2_proxy","threadId":"ThreadId(1)"}
{"timestamp":"2025-08-20T06:45:26.002556Z","level":"INFO","fields":{"message":"Using single-threaded proxy runtime"},"target":"linkerd2_proxy::rt","threadId":"ThreadId(1)"}
{"timestamp":"2025-08-20T06:45:26.004019Z","level":"INFO","fields":{"message":"Admin interface on 0.0.0.0:4191"},"target":"linkerd2_proxy","threadId":"ThreadId(1)"}
{"timestamp":"2025-08-20T06:45:26.004040Z","level":"INFO","fields":{"message":"Inbound interface on 0.0.0.0:4143"},"target":"linkerd2_proxy","threadId":"ThreadId(1)"}
{"timestamp":"2025-08-20T06:45:26.004048Z","level":"INFO","fields":{"message":"Outbound interface on 127.0.0.1:4140"},"target":"linkerd2_proxy","threadId":"ThreadId(1)"}
{"timestamp":"2025-08-20T06:45:26.004055Z","level":"INFO","fields":{"message":"Tap DISABLED"},"target":"linkerd2_proxy","threadId":"ThreadId(1)"}
{"timestamp":"2025-08-20T06:45:26.004061Z","level":"INFO","fields":{"message":"SNI is default.debug.serviceaccount.identity.linkerd.[redacted]"},"target":"linkerd2_proxy","threadId":"ThreadId(1)"}
{"timestamp":"2025-08-20T06:45:26.004067Z","level":"INFO","fields":{"message":"Local identity is default.debug.serviceaccount.identity.linkerd.[redacted]"},"target":"linkerd2_proxy","threadId":"ThreadId(1)"}
{"timestamp":"2025-08-20T06:45:26.004073Z","level":"INFO","fields":{"message":"Destinations resolved via linkerd-dst-headless.linkerd.svc.[redacted]:8086 (linkerd-destination.linkerd.serviceaccount.identity.linkerd.[redacted])"},"target":"linkerd2_proxy","threadId":"ThreadId(1)"}
{"timestamp":"2025-08-20T06:45:26.005300Z","level":"INFO","fields":{"message":"Adding endpoint","addr":"10.252.3.73:8090"},"target":"linkerd_pool_p2c","spans":[{"name":"policy"},{"addr":"linkerd-policy.linkerd.svc.[redacted]:8090","name":"controller"}],"threadId":"ThreadId(1)"}
{"timestamp":"2025-08-20T06:45:26.005340Z","level":"INFO","fields":{"message":"Adding endpoint","addr":"10.252.2.202:8090"},"target":"linkerd_pool_p2c","spans":[{"name":"policy"},{"addr":"linkerd-policy.linkerd.svc.[redacted]:8090","name":"controller"}],"threadId":"ThreadId(1)"}
{"timestamp":"2025-08-20T06:45:26.005542Z","level":"INFO","fields":{"message":"Adding endpoint","addr":"10.252.3.74:8080"},"target":"linkerd_pool_p2c","spans":[{"name":"identity"},{"server.addr":"linkerd-identity-headless.linkerd.svc.[redacted]:8080","name":"identity"},{"addr":"linkerd-identity-headless.linkerd.svc.[redacted]:8080","name":"controller"}],"threadId":"ThreadId(2)"}
{"timestamp":"2025-08-20T06:45:26.005581Z","level":"INFO","fields":{"message":"Adding endpoint","addr":"10.252.2.201:8080"},"target":"linkerd_pool_p2c","spans":[{"name":"identity"},{"server.addr":"linkerd-identity-headless.linkerd.svc.[redacted]:8080","name":"identity"},{"addr":"linkerd-identity-headless.linkerd.svc.[redacted]:8080","name":"controller"}],"threadId":"ThreadId(2)"}
{"timestamp":"2025-08-20T06:45:26.005623Z","level":"INFO","fields":{"message":"Adding endpoint","addr":"10.252.3.73:8086"},"target":"linkerd_pool_p2c","spans":[{"name":"dst"},{"addr":"linkerd-dst-headless.linkerd.svc.[redacted]:8086","name":"controller"}],"threadId":"ThreadId(1)"}
{"timestamp":"2025-08-20T06:45:26.005656Z","level":"INFO","fields":{"message":"Adding endpoint","addr":"10.252.2.202:8086"},"target":"linkerd_pool_p2c","spans":[{"name":"dst"},{"addr":"linkerd-dst-headless.linkerd.svc.[redacted]:8086","name":"controller"}],"threadId":"ThreadId(1)"}
{"timestamp":"2025-08-20T06:45:26.013330Z","level":"INFO","fields":{"message":"Certified identity","id":"default.debug.serviceaccount.identity.linkerd.[redacted]"},"target":"linkerd_app","spans":[{"name":"daemon"},{"name":"identity"}],"threadId":"ThreadId(2)"}

output of linkerd check -o short

linkerd-version
---------------
‼ cli is up-to-date
    is running version 25.7.6 but the latest edge version is 25.8.3
    see https://linkerd.io/2/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 25.7.6 but the latest edge version is 25.8.3
    see https://linkerd.io/2/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
	* linkerd-destination-c88dc745f-gnwdd (edge-25.7.6)
	* linkerd-destination-c88dc745f-smvsz (edge-25.7.6)
	* linkerd-identity-797d8bbc84-cpqvb (edge-25.7.6)
	* linkerd-identity-797d8bbc84-zg9t7 (edge-25.7.6)
	* linkerd-proxy-injector-7454bc57fd-fhg88 (edge-25.7.6)
	* linkerd-proxy-injector-7454bc57fd-vrfj4 (edge-25.7.6)
    see https://linkerd.io/2/checks/#l5d-cp-proxy-version for hints

Status check results are √

Environment

  • Kubernetes version: v1.32.5
  • Cluster environment: bare-metal and QEMU VM nodes tested
  • Host OS: Ubuntu 24.04.2 LTS, 6.8.0-41-generic
  • Linkerd version: edge-25.7.6

Possible solution

Workaround: enable HTTP keep-alive on the client side.
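
For the siege-based reproduction above, this can be done per run with -R (as in step 3) or persistently in the siege configuration (the ~/.siege/siege.conf path below is an assumption; adjust to wherever siege.config created it):

# Persistent equivalent of passing -R <(echo connection = keep-alive) on each run.
echo 'connection = keep-alive' >> ~/.siege/siege.conf
siege -c 6 -t 5m http://test-server/slow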

Additional context

No response

Would you like to work on fixing this bug?

None
