core, metrics, p2p: switch some invalid counters to gauges #20047
I've added a lot of metrics to Influx/Grafana and saw that some charts started going into negative numbers (like peer counts). After digging into them, the issue is that we used `counters` and `gauges` kind of interchangeably in our code, mainly because counters had "useful" `Inc()` and `Dec()` ops.
Turns out that this is a big no-no, as time series databases and charting libs use and visualize counters completely differently than gauges. Gauges are allowed to go up and down, but counters should only ever be incremented (the `go-metrics` library API completely misses this point by having a `Dec()` op on a counter).
Long story short, the reason we didn't see this issue with Prometheus/Grafana was that the Prometheus reporting actually reported counters as gauges (ha-ha, should fix that).
This PR extends `go-metrics` with `Inc` and `Dec` ops on gauges, and replaces all our faulty counters with actual gauges. Charts look ok with them.
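
For illustration, here is a minimal sketch of what a gauge with `Inc` and `Dec` ops could look like. It is not the code from this PR: the `Gauge` type, its `int64` delta parameters and the `peers` variable are assumptions made for the example, loosely modeled on how `go-metrics` counters take a delta.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Gauge is a hypothetical stand-in for a go-metrics style gauge.
// Unlike a counter, its value is allowed to move in both directions.
type Gauge struct {
	value int64
}

// Update overwrites the gauge with an absolute value.
func (g *Gauge) Update(v int64) { atomic.StoreInt64(&g.value, v) }

// Inc raises the gauge by delta (assumed signature).
func (g *Gauge) Inc(delta int64) { atomic.AddInt64(&g.value, delta) }

// Dec lowers the gauge by delta (assumed signature).
func (g *Gauge) Dec(delta int64) { atomic.AddInt64(&g.value, -delta) }

// Value returns the current reading.
func (g *Gauge) Value() int64 { return atomic.LoadInt64(&g.value) }

func main() {
	// A peer count rises and falls, so it is tracked as a gauge.
	peers := &Gauge{}
	peers.Inc(1) // peer connected
	peers.Inc(1) // another peer connected
	peers.Dec(1) // one peer dropped
	fmt.Println("peers:", peers.Value()) // prints "peers: 1"
}
```

Anything that can legitimately shrink (peer counts, queue sizes) fits this shape, while counters stay reserved for values that only ever grow.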