[BUG] Unable to add nvme disk (block) in talos using dataengine v2 #12038

@lucagervasi

Describe the Bug

I'm not sure whether this is a bug or a documentation improvement request (I have no solution).
I'm trying to add an NVMe block device in Longhorn. I read through the documentation, specifically https://longhorn.io/docs/archives/1.7.2/advanced-resources/os-distro-specific/talos-linux-support/
I used a custom image with the officialExtensions (plus some others). Hugepages are active, and all the needed drivers seem to be loaded. Still, when I add a disk:

get discoveredvolumes
x.x.x.x   runtime     DiscoveredVolume   nvme1n1     1         disk        960 GB           

get disks
x.x.x.x   runtime     Disk   nvme1n1   2         960 GB   false       nvme                     eui.343337304d40xxxxxxxxxxxxxxxxxxxx   SAMSUNG MZQLBxxxxxxxxxx   S437NX0xxxxxxxxx

ls /sys/block
x.x.x.x   Lrwxrwxrwx   0     0     0         Oct 12 14:02:21   system_u:object_r:sysfs_t:s0   nvme1n1 -> ../devices/pci0000:00/0000:00:01.0/0000:01:00.0/nvme/nvme1/nvme1n1
---

Then I edit the node (or use the UI, it doesn't matter):

  disks:
    nvme:
      allowScheduling: true
      diskDriver: auto
      diskType: block
      evictionRequested: false
      path: "0000:01:00.0"
      storageReserved: 0
      tags: []
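
The `path` value above matches the last PCI address in the `/sys/block` symlink target shown earlier. As a minimal sketch (not part of Longhorn or Talos; the function name `pci_address_from_sysfs_target` is hypothetical), that address can be pulled out of the symlink target like this, assuming the target string has been copied from the `ls /sys/block` output:

```python
import re

# PCI address in domain:bus:device.function form, e.g. 0000:01:00.0
PCI_ADDR = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def pci_address_from_sysfs_target(target):
    """Return the last PCI address in a /sys/block symlink target, or None.

    Hypothetical helper for illustration only.
    """
    matches = [part for part in target.split("/") if PCI_ADDR.match(part)]
    return matches[-1] if matches else None


# Symlink target copied from the `ls /sys/block` output in this report.
target = "../devices/pci0000:00/0000:00:01.0/0000:01:00.0/nvme/nvme1/nvme1n1"
print(pci_address_from_sysfs_target(target))  # prints 0000:01:00.0
```

The last match is taken because earlier segments in the path are upstream bridges or root ports, not the NVMe controller itself.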

The disk remains not ready:

nvme:
  conditions:
    - lastProbeTime: ""
      lastTransitionTime: "2025-10-25T08:50:43Z"
      message: 'Disk nvme(0000:01:00.0) on node melee is not ready: current state: creating'
      reason: NoDiskInfo
      status: "False"
      type: Ready
    - lastProbeTime: ""
      lastTransitionTime: "2025-10-25T08:50:43Z"
      message: Disk nvme (0000:01:00.0) on the node melee is not ready
      reason: DiskNotReady
      status: "False"
      type: Schedulable
  diskDriver: ""
  diskName: ""
  diskPath: ""
  diskType: block
  diskUUID: ""
  filesystemType: ""
  instanceManagerName: ""
  scheduledBackingImage: {}
  scheduledReplica: {}
  storageAvailable: 0
  storageMaximum: 0
  storageScheduled: 0
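
For readers scanning a status block like the one above, here is a small sketch that lists the conditions that are not satisfied. It assumes the status has already been loaded into a plain dictionary (transcribed by hand here); the helper name `failing_conditions` is hypothetical and is not a Longhorn API.

```python
def failing_conditions(disk_status):
    """Return (type, reason, message) for each condition whose status is not "True".

    Hypothetical helper for illustration only.
    """
    return [
        (c.get("type"), c.get("reason"), c.get("message"))
        for c in disk_status.get("conditions", [])
        if c.get("status") != "True"
    ]


# Values transcribed from the disk status in this report.
disk_status = {
    "conditions": [
        {
            "type": "Ready",
            "status": "False",
            "reason": "NoDiskInfo",
            "message": "Disk nvme(0000:01:00.0) on node melee is not ready: "
                       "current state: creating",
        },
        {
            "type": "Schedulable",
            "status": "False",
            "reason": "DiskNotReady",
            "message": "Disk nvme (0000:01:00.0) on the node melee is not ready",
        },
    ],
    "diskUUID": "",
}

for ctype, reason, message in failing_conditions(disk_status):
    print(f"{ctype}: {reason}: {message}")
```

In this report both conditions fail, and the `Ready` condition carries the more specific detail: the disk is stuck in the `creating` state.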

It never comes online.
I suppose I'm missing something?

To Reproduce

No response

Expected Behavior

The disk comes online

Support Bundle for Troubleshooting

supportbundle_dc507ce8-d9b8-43b3-82b1-a03e83010a5c_2025-10-25T09-03-29Z.zip

Environment

  • Longhorn version: v1.10.0
  • Impacted volume (PV): no PV, as I am unable to initialize disks
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Helm
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: Talos v1.11.3 - Kubernetes v1.34.0
    • Number of control plane nodes in the cluster: 2
    • Number of worker nodes in the cluster: 0 (control plane accepts workloads)
  • Node config
    • OS type and version: Talos v1.11.3
    • Kernel version: 6.12.52-talos
    • CPU per node: 1 (12 Intel cores on one node, 16 AMD cores on the other node)
    • Memory per node: 32 GB
    • Disk type (e.g. SSD/NVMe/HDD): nvme
    • Network bandwidth between the nodes (Gbps): 1
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): Baremetal on Hetzner
  • Number of Longhorn volumes in the cluster: 0 (unable to initialize disks)

Additional context

No response

Workaround and Mitigation

None, so far.

Metadata

Labels

  • area/spdk: SPDK upstream/downstream
  • area/v2-data-engine: v2 data engine (SPDK)
  • backport/1.10.1: Require to backport to 1.10.1 release branch
  • duplicated
  • kind/bug
  • require/backport: Require backport. Only used when the specific versions to backport have not been defined.
  • require/qa-review-coverage: Require QA to review coverage

Projects

Status

Resolved

Status

Closed

Relationships

None yet

Development

No branches or pull requests
