
container_fs_* metrics emit duplicate series with device="" when block device name resolution fails #3873

@umegbewe

Description


Problem

cAdvisor emits duplicate Prometheus samples for container_fs_* disk I/O metrics when multiple block devices cannot be resolved from major/minor IDs to device names.

In that case, several per-disk stats are exported with the same labelset, usually:

device="", id="/", image="", name=""

The Prometheus client rejects the scrape/gather with:

collected metric "... " was collected before with the same name and label values

This causes the entire gather to fail, so unrelated metrics such as memory and CPU may not be exported by consumers using cAdvisor as a library.

Environment

  • cAdvisor version: v0.56.2
  • Runtime: Docker
  • Host: Linux VM
  • cgroup mode: cgroup v2
  • Deployment: cAdvisor embedded in an agent process, running inside Docker with host PID/network and Docker socket mounted

The issue was observed when the agent was run in Docker on a VM.

Docker run flags:

  --pid=host \
  --net=host \
  --cgroupns=host \
  --privileged \
  -v /:/rootfs:ro \
  -v /proc:/host/proc:ro \
  -v /sys:/sys:ro \
  -v /var/run:/host/var/run:ro \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /var/lib/docker:/var/lib/docker:ro \
  -v /dev/disk:/dev/disk:ro \

The same agent does not reproduce this failure in a containerd-based Kubernetes environment.

Actual Behavior

The Prometheus gather fails with duplicate container_fs_* samples. Example error:

gather failed: 44 error(s) occurred:
* collected metric "container_fs_reads_bytes_total" {
  label:{name:"device" value:""}
  label:{name:"id" value:"/"}
  label:{name:"image" value:""}
  label:{name:"name" value:""}
  label:{name:"runtime_container_id" value:""}
  counter:{value:762880}
} was collected before with the same name and label values

The same happens for metrics such as:

container_fs_reads_bytes_total
container_fs_reads_total
container_fs_writes_bytes_total
container_fs_writes_total

Because the Prometheus gatherer rejects duplicate series, the scrape/gather fails as a whole.

Expected Behavior

cAdvisor should not emit duplicate Prometheus samples with identical metric name and identical labels.

If cAdvisor cannot resolve a block device major/minor pair to a device path, it should still preserve uniqueness or skip the unresolved stat. For example, unresolved devices could use a stable fallback label value such as:

device="major:minor"

or:

device="unknown:MAJOR:MINOR"

Suspected Cause

The problematic path appears to be:

  • The Docker handler's disk-stats path calls AssignDeviceNamesToDiskStats
  • Device names are resolved from major/minor IDs
  • If resolution fails, the device string is left empty
  • The Prometheus exporter uses only device as the extra label for these disk I/O metrics
  • Multiple unresolved devices therefore collapse into the same device="" time series


Proposed Fix

When device-name resolution fails, do not return an empty Device.

A possible fix is to make the fallback deterministic and unique per major/minor pair, for example:

// Fall back to a stable "major:minor" label when resolution fails.
s, ok := namer.DeviceName(major, minor)
if !ok || s == "" {
    s = fmt.Sprintf("%d:%d", major, minor)
}

This would preserve metric uniqueness and keep the data usable.

Alternative fixes could be:

  • drop unresolved per-disk stats
  • aggregate unresolved stats before exporting
  • add major/minor labels to the affected metrics

The least disruptive option seems to be a stable fallback device label because it avoids changing the metric schema while preventing duplicate samples.
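For comparison, the aggregation alternative could look like the stand-alone sketch below (the fsStat type and aggregateUnresolved helper are illustrative, not cAdvisor's actual API): all stats with an empty device name are folded into a single entry before export, so at most one device="" sample per metric reaches the registry.

```go
package main

import "fmt"

// fsStat is a simplified stand-in for a per-disk counter (illustrative only).
type fsStat struct {
	device     string
	readsBytes uint64
}

// aggregateUnresolved sums all stats whose device name is empty into one
// trailing entry, so the exporter emits at most one device="" series.
func aggregateUnresolved(stats []fsStat) []fsStat {
	var out []fsStat
	var sum uint64
	haveUnresolved := false
	for _, s := range stats {
		if s.device == "" {
			sum += s.readsBytes
			haveUnresolved = true
			continue
		}
		out = append(out, s)
	}
	if haveUnresolved {
		out = append(out, fsStat{device: "", readsBytes: sum})
	}
	return out
}

func main() {
	stats := []fsStat{
		{device: "sda", readsBytes: 100},
		{device: "", readsBytes: 200}, // unresolved
		{device: "", readsBytes: 300}, // unresolved
	}
	for _, s := range aggregateUnresolved(stats) {
		fmt.Printf("device=%q reads=%d\n", s.device, s.readsBytes)
	}
}
```

The trade-off is that per-device detail is lost for unresolved disks, which is why the fallback label above seems preferable.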

Related Issues

This seems related to the broader problem that container_fs_* metrics expose only device as the disk discriminator.


Duplicate samples cause Prometheus gather failure, which can prevent consumers from exporting unrelated metrics such as container memory and CPU.

I am happy to send a PR if the fallback-device-label approach is acceptable.
