Host operating system: output of uname -a
Linux 4.4.207-1.el7.elrepo.x86_64 #1 SMP Sat Dec 21 08:00:19 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
node_exporter version: output of node_exporter --version
prom/node-exporter:v1.0.1
node_exporter command line flags
- --path.procfs=/host/proc
- --path.sysfs=/host/sys
- --path.rootfs=/rootfs
- --collector.netclass.ignored-devices=^(lo|docker[0-9]|kube-ipvs0|dummy0|kube-dummy-if|veth.+|br\-.+|cali\w{11}|tunl0|tun\-.+)$
- --collector.netdev.device-blacklist=^(lo|docker[0-9]|kube-ipvs0|dummy0|kube-dummy-if|veth.+|br\-.+|cali\w{11}|tunl0|tun\-.+)$
- --collector.filesystem.ignored-mount-points=^/(dev|sys|proc|host|etc|var/lib/kubelet|var/lib/docker/.+|home/.+|data/local-pv/.+)($|/)
- --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|efivarfs|tmpfs|nsfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rootfs|rpc_pipefs|securityfs|sysfs|tracefs)$
- --collector.diskstats.ignored-devices=^(ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p|dm-|sr|nbd)\d+$
- --collector.netstat.fields=^(.*_(InErrors|InErrs)|Ip_Forwarding|Ip(6|Ext)_(InOctets|OutOctets)|Icmp6?_(InMsgs|OutMsgs)|TcpExt_(Listen.*|Syncookies.*|TCPSynRetrans|TCPRcvCollapsed|PruneCalled|RcvPruned)|Tcp_(ActiveOpens|InSegs|OutSegs|PassiveOpens|RetransSegs|CurrEstab)|Udp6?_(InDatagrams|OutDatagrams|NoPorts|RcvbufErrors|SndbufErrors))$
- --no-collector.systemd
- --no-collector.bcache
- --no-collector.infiniband
- --no-collector.wifi
- --no-collector.ipvs
Are you running node_exporter in Docker?
Yes, in k8s as a DaemonSet
What did you do that produced an error?
We're using scrape_interval: 15s and scrape_timeout: 15s on the Prometheus side, and noticed that some nodes have holes in their graphs:

This turns out to be caused by large scrape times from the bonding and netclass collectors:
node_scrape_collector_duration_seconds

Sometimes it even looks like this:
# time curl -s localhost:9100/metrics >/dev/null
real 0m42.589s
user 0m0.003s
sys 0m0.005s
If we disable these collectors:
- --no-collector.bonding
- --no-collector.netclass
then the holes disappear (in the graphs above, after 17:30).
What did you expect to see?
Bonding collector metrics are very valuable for us; currently we have to produce the same metrics via the textfile collector and a custom script.
Would it be possible to add a configurable timeout to node_exporter, so that at least the metrics that are ready are still returned instead of the whole scrape failing? In that case, collectors that hit the timeout should probably also set node_scrape_collector_success=0 so the issue isn't hidden. A rough sketch of what we have in mind follows.
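For illustration only, here is a minimal Go sketch of the behaviour we mean. It is not based on node_exporter's actual collector code; the `collectorFunc` type and the `runWithTimeout` helper are invented for this example. The idea is simply to run each collector in its own goroutine and, if it misses its deadline, report node_scrape_collector_success=0 for it instead of stalling the whole /metrics response.

```go
// Hypothetical sketch only -- not node_exporter's real collector API. It shows
// one way a per-collector timeout could work: run the collector in its own
// goroutine, and if it has not finished before the deadline, report
// node_scrape_collector_success=0 for it instead of blocking the whole scrape.
package collector

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// collectorFunc stands in for a single collector's scrape routine
// (bonding, netclass, ...); the type name is made up for this sketch.
type collectorFunc func(ch chan<- prometheus.Metric) error

var (
	successDesc = prometheus.NewDesc(
		"node_scrape_collector_success",
		"Whether a collector succeeded.",
		[]string{"collector"}, nil,
	)
	durationDesc = prometheus.NewDesc(
		"node_scrape_collector_duration_seconds",
		"Duration of a collector scrape.",
		[]string{"collector"}, nil,
	)
)

// runWithTimeout executes one collector and stops waiting after timeout.
// A real implementation would also have to deal with the abandoned goroutine
// still writing to ch; that is glossed over here.
func runWithTimeout(name string, fn collectorFunc, timeout time.Duration, ch chan<- prometheus.Metric) {
	begin := time.Now()
	done := make(chan error, 1)
	go func() { done <- fn(ch) }()

	success := 0.0
	select {
	case err := <-done:
		if err == nil {
			success = 1
		}
	case <-time.After(timeout):
		// The collector is still busy (e.g. stuck reading sysfs);
		// leave success at 0 so the problem stays visible in metrics.
	}

	ch <- prometheus.MustNewConstMetric(durationDesc, prometheus.GaugeValue, time.Since(begin).Seconds(), name)
	ch <- prometheus.MustNewConstMetric(successDesc, prometheus.GaugeValue, success, name)
}
```

With something like this, a collector that gets stuck would cost at most the configured timeout per scrape, instead of the 40+ seconds we sometimes see now.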
Thank you.