[BUG] Instance-manager excessive memory usage #12040

@webcamleif

Describe the Bug

I have a cluster with 5 worker nodes, each scheduled for storage with the Longhorn v2 engine.
I do not host many volumes: 18 at the moment, each with 3 replicas, so the most replicas on any one node is currently 16.

However, the memory usage of my instance-managers keeps growing. Here is the top output (CPU, memory, and replica count per instance-manager):
instance-manager-40238a3c2558e39710b9f1f176788379 1456m 8188Mi (16 replicas)
instance-manager-6e7e1992caf997f84cfaf5403d668856 1036m 346Mi (15 replicas)
instance-manager-85cf4a44083931fa8a9abbb7731246da 1105m 6538Mi (4 replicas)
instance-manager-ba5278a8361c0c36a9bdefa3a2b6d2af 1023m 280Mi (5 replicas)
instance-manager-ed00ac89d55bbb7aa45d2853c88bcee1 1097m 8944Mi (14 replicas)
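
As a rough sanity check on the figures above, memory per replica can be computed with a short script. This is only a sketch: the regular expression and the helper names `parse_top` and `mib_per_replica` are illustrative and not part of Longhorn or kubectl, and the replica counts are the ones I noted by hand next to each line.

```python
import re

# Lines copied from the report: pod name, CPU (millicores), memory (MiB),
# and the hand-annotated replica count in parentheses.
TOP_OUTPUT = """\
instance-manager-40238a3c2558e39710b9f1f176788379 1456m 8188Mi (16 replicas)
instance-manager-6e7e1992caf997f84cfaf5403d668856 1036m 346Mi (15 replicas)
instance-manager-85cf4a44083931fa8a9abbb7731246da 1105m 6538Mi (4 replicas)
instance-manager-ba5278a8361c0c36a9bdefa3a2b6d2af 1023m 280Mi (5 replicas)
instance-manager-ed00ac89d55bbb7aa45d2853c88bcee1 1097m 8944Mi (14 replicas)
"""

# Hypothetical pattern matching exactly the layout used above.
LINE_RE = re.compile(r"^(\S+)\s+(\d+)m\s+(\d+)Mi\s+\((\d+) replicas\)$")


def parse_top(text):
    """Return one dict per instance-manager line that matches LINE_RE."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            name, cpu, mem, replicas = match.groups()
            rows.append({
                "name": name,
                "cpu_m": int(cpu),
                "mem_mib": int(mem),
                "replicas": int(replicas),
            })
    return rows


def mib_per_replica(row):
    """Memory in MiB divided by the number of replicas on that node."""
    return row["mem_mib"] / row["replicas"]


if __name__ == "__main__":
    # Print the heaviest per-replica consumers first.
    for row in sorted(parse_top(TOP_OUTPUT), key=mib_per_replica, reverse=True):
        print(f'{row["name"]}  {row["replicas"]:>2} replicas  '
              f'{row["mem_mib"]:>5} MiB  {mib_per_replica(row):8.1f} MiB/replica')
```

The per-replica figure ranges from about 23 MiB (346Mi across 15 replicas) to about 1634 MiB (6538Mi across 4 replicas), so the memory usage does not appear to track the replica count.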

To Reproduce

Deploy Longhorn with the v2 engine and 2048 hugepages

Expected Behavior

Roughly 4 GB of memory usage on the nodes with 16 replicas.
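
For reference, the 4 GB expectation can be compared with the observed figures as follows. This is a sketch under a stated assumption: the huge page size is taken to be 2 MiB (the common default on x86_64), which this report does not confirm, and whether the reported memory figures include hugepage memory is not established here. The names `hugepage_pool_mib` and `over_expectation` are illustrative only.

```python
HUGEPAGE_COUNT = 2048      # from "To Reproduce"
HUGEPAGE_SIZE_MIB = 2      # ASSUMPTION: 2 MiB huge pages, not confirmed in this report

# Observed memory in MiB, keyed by the first 8 characters of each
# instance-manager ID, copied from the top output in the report.
OBSERVED_MIB = {
    "40238a3c": 8188,
    "6e7e1992": 346,
    "85cf4a44": 6538,
    "ba5278a8": 280,
    "ed00ac89": 8944,
}


def hugepage_pool_mib(count, size_mib):
    """Total size of the hugepage pool in MiB."""
    return count * size_mib


def over_expectation(observed, limit_mib):
    """Return {id: excess MiB} for every entry above limit_mib."""
    return {name: mem - limit_mib
            for name, mem in observed.items()
            if mem > limit_mib}


if __name__ == "__main__":
    pool = hugepage_pool_mib(HUGEPAGE_COUNT, HUGEPAGE_SIZE_MIB)
    print(f"hugepage pool under the 2 MiB assumption: {pool} MiB")
    for name, excess in over_expectation(OBSERVED_MIB, pool).items():
        print(f"instance-manager-{name}...: {excess} MiB above {pool} MiB")
```

Under that assumption the pool is 4096 MiB, and three of the five instance-managers are well above it while the other two stay below 350 MiB.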

Support Bundle for Troubleshooting

supportbundle_c3bd2373-0516-493e-96af-e02d2e2dafed_2025-10-25T19-29-18Z.zip

Environment

  • Longhorn version: 1.10.0
  • Impacted volume (PV):
    instance-manager-40238a3c2558e39710b9f1f176788379
    instance-manager-6e7e1992caf997f84cfaf5403d668856
    instance-manager-85cf4a44083931fa8a9abbb7731246da
    instance-manager-ba5278a8361c0c36a9bdefa3a2b6d2af
    instance-manager-ed00ac89d55bbb7aa45d2853c88bcee1
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): kubectl kustomize (argocd)
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: Talos v1.11.2
    • Number of control plane nodes in the cluster: 3
    • Number of worker nodes in the cluster: 5
  • Node config
    • OS type and version: Talos v1.11.2
    • Kernel version: 6.12.48-talos
    • CPU per node: 4 per controller, 6 per worker, except one worker with 12
    • Memory per node: 8 per controller, 16 per worker, except one worker with 64
    • Disk type (e.g. SSD/NVMe/HDD): NVMe
    • Network bandwidth between the nodes (Gbps): 2.5Gb to 10Gb
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Baremetal
  • Number of Longhorn volumes in the cluster: 18

Additional context

[Image] [Image]

Memory usage drops when the instance-managers are restarted:
[Image]

Workaround and Mitigation

No response

Metadata

  • Assignees: No one assigned
  • Status: In Progress
  • Relationships: None yet
  • Development: No branches or pull requests