Problem
cAdvisor emits duplicate Prometheus samples for container_fs_* disk I/O metrics when multiple block devices cannot be resolved from major/minor IDs to device names.
In that case, several per-disk stats are exported with the same labelset, usually:
device="", id="/", image="", name=""
The Prometheus client rejects the scrape/gather with:
collected metric "... " was collected before with the same name and label values
This causes the entire gather to fail, so unrelated metrics such as memory and CPU may not be exported by consumers using cAdvisor as a library.
Environment
- cAdvisor version: v0.56.2
- Runtime: Docker
- Host: Linux VM
- cgroup mode: cgroup v2
- Deployment: cAdvisor embedded in an agent process, running inside Docker with host PID/network and Docker socket mounted
The issue was observed when the agent was run in Docker on a VM.
Docker run flags:
--pid=host \
--net=host \
--cgroupns=host \
--privileged \
-v /:/rootfs:ro \
-v /proc:/host/proc:ro \
-v /sys:/sys:ro \
-v /var/run:/host/var/run:ro \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-v /var/lib/docker:/var/lib/docker:ro \
-v /dev/disk:/dev/disk:ro
The same agent does not reproduce this warning in a containerd Kubernetes environment.
Actual Behavior
The Prometheus gather fails with duplicate container_fs_* samples. Example error:
gather failed: 44 error(s) occurred:
* collected metric "container_fs_reads_bytes_total" {
label:{name:"device" value:""}
label:{name:"id" value:"/"}
label:{name:"image" value:""}
label:{name:"name" value:""}
label:{name:"runtime_container_id" value:""}
counter:{value:762880}
} was collected before with the same name and label values
The same happens for metrics such as:
container_fs_reads_bytes_total
container_fs_reads_total
container_fs_writes_bytes_total
container_fs_writes_total
Because the Prometheus gatherer rejects duplicate series, the scrape/gather fails as a whole.
Expected Behavior
cAdvisor should not emit duplicate Prometheus samples with identical metric name and identical labels.
If cAdvisor cannot resolve a block device major/minor pair to a device path, it should still preserve uniqueness or skip the unresolved stat. For example, unresolved devices could use a stable fallback label value such as:
device="MAJOR:MINOR"
or:
device="unknown:MAJOR:MINOR"
Suspected Cause
The problematic path appears to be:
- Docker disk stats call AssignDeviceNamesToDiskStats
- Device names are resolved from major/minor IDs
- If resolution fails, the device string becomes empty
- Prometheus export uses only device as the extra label for these disk I/O metrics
- Multiple unresolved devices therefore collapse into the same device="" time series
Relevant source locations:
- ioValues() emits stat.Device directly as the device label: https://github.com/google/cadvisor/blob/v0.56.2/metrics/prometheus.go#L53-L75
- container_fs_reads_bytes_total and related metrics only use device as the extra label: https://github.com/google/cadvisor/blob/v0.56.2/metrics/prometheus.go#L609-L637
- Docker stats assign device names via AssignDeviceNamesToDiskStats: https://github.com/google/cadvisor/blob/v0.56.2/container/docker/fs.go#L43-L65
- deviceIdentifierMap.Find() currently caches and returns an empty string when DeviceName() cannot resolve the major/minor pair: https://github.com/google/cadvisor/blob/v0.56.2/container/common/helpers.go#L372-L426
Proposed Fix
When device-name resolution fails, do not return an empty Device.
A possible fix is to make the fallback deterministic and unique per major/minor pair, for example:
s, ok := namer.DeviceName(major, minor)
if !ok || s == "" {
    // Fall back to a stable, unique "major:minor" string.
    s = fmt.Sprintf("%d:%d", major, minor)
}
This would preserve metric uniqueness and keep the data usable.
Alternative fixes could be:
- drop unresolved per-disk stats
- aggregate unresolved stats before exporting
- add major/minor labels to the affected metrics
The least disruptive option seems to be a stable fallback device label because it avoids changing the metric schema while preventing duplicate samples.
Related Issues
This seems related to the broader problem that container_fs_* metrics only expose device as the disk discriminator:
Duplicate samples cause Prometheus gather failure, which can prevent consumers from exporting unrelated metrics such as container memory and CPU.
I am happy to send a PR if the fallback-device-label approach is acceptable.