fix(om2): histograms and negative observed values #2627

Open: wants to merge 1 commit into main from krajo/om2.0-nonosum

krajorama (Member) commented Apr 17, 2025

OM1.0 required that the Sum of a Histogram is not represented when the histogram has negative observations.
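
A minimal sketch of the OM1.0 rule, assuming a hypothetical in-process histogram (`ClassicHistogram` and `expose_om1` are illustrative names, not any client library's API): once a negative value has been observed, the `_sum` line is dropped, while the buckets and `_count` are still exposed. Only sample lines are rendered here; `# TYPE` and `# EOF` are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class ClassicHistogram:
    # Hypothetical minimal classic histogram; bucket counts are cumulative.
    bounds: list[float]                     # explicit upper bounds, ascending
    buckets: list[int] = field(init=False)  # one cumulative count per bound
    count: int = 0
    sum: float = 0.0
    saw_negative: bool = False

    def __post_init__(self) -> None:
        self.buckets = [0] * len(self.bounds)

    def observe(self, v: float) -> None:
        self.count += 1
        self.sum += v
        self.saw_negative = self.saw_negative or v < 0
        for i, bound in enumerate(self.bounds):
            if v <= bound:
                self.buckets[i] += 1  # cumulative: every bucket with le >= v


def expose_om1(h: ClassicHistogram, name: str) -> list[str]:
    # Sample lines only; _sum is omitted once a negative value was observed.
    lines = [f'{name}_bucket{{le="{b}"}} {c}' for b, c in zip(h.bounds, h.buckets)]
    lines.append(f'{name}_bucket{{le="+Inf"}} {h.count}')
    lines.append(f"{name}_count {h.count}")
    if not h.saw_negative:
        lines.append(f"{name}_sum {h.sum}")
    return lines


h = ClassicHistogram(bounds=[0.0, 10.0])
h.observe(5.0)
h.observe(-2.0)  # from here on, _sum is no longer exposed
print("\n".join(expose_om1(h, "x")))
```

Per the description below, the Go and Java clients never implemented this check, so in practice `_sum` has been exposed regardless.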

This PR removes this requirement in OM2.0 because:
- The requirement was never implemented by the Go and Java instrumentation libraries. Enforcing it now would be a breaking change.
- The requirement makes it impossible to implement the use case where the user wants to measure the Sum anyway.
- The PromQL engine does not take the Sum into account when doing counter reset detection, so it does not matter that the Sum can decrease.
- We already warned users in the documentation 10 years ago that the Sum can decrease and then cannot be used with rate(): PR. Native histograms will not take the Sum into account when calculating counter resets during rate(), so this problem won't come up.
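
To illustrate the native histogram point, here is a simplified reset check in the spirit of the Prometheus behavior: a reset is flagged when the total count, the zero bucket, or any previously populated bucket goes down, and the Sum is never consulted. `NativeSample` and `is_counter_reset` are hypothetical names. The sketch assumes both samples share the same schema and zero threshold, which real reset detection also has to account for.

```python
from dataclasses import dataclass, field


@dataclass
class NativeSample:
    # Hypothetical simplified native histogram sample (same schema assumed).
    count: float
    sum: float
    zero_count: float = 0.0
    positive: dict[int, float] = field(default_factory=dict)  # index -> count
    negative: dict[int, float] = field(default_factory=dict)  # buckets for v < 0


def is_counter_reset(prev: NativeSample, cur: NativeSample) -> bool:
    """Reset if any count went down; the sum is deliberately ignored."""
    if cur.count < prev.count or cur.zero_count < prev.zero_count:
        return True
    for prev_b, cur_b in ((prev.positive, cur.positive), (prev.negative, cur.negative)):
        # A bucket that was populated before but is now lower or missing.
        if any(cur_b.get(i, 0.0) < c for i, c in prev_b.items()):
            return True
    return False


prev = NativeSample(count=3, sum=10.0, positive={1: 2, 2: 1})
# One observation of -6 lands in a negative bucket: the Sum drops, no count does.
cur = NativeSample(count=4, sum=4.0, positive={1: 2, 2: 1}, negative={3: 1})
print(is_counter_reset(prev, cur))  # False
```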

Note: this PR does not make Sum mandatory, that is a different question.

krajorama force-pushed the krajo/om2.0-nonosum branch from 7880f71 to bd3c521 on April 17, 2025 16:26
beorn7 (Member) commented Apr 17, 2025

> The PromQL engine does not take the Sum into account when doing counter reset detection,

This is only true for native histograms, but not for classic histograms.

(FTR: I proposed to improve the counter reset handling for summaries and classic histograms at KubeCon Berlin in 2017. My proposal was ultimately rejected, so I guess we should not change course now and instead encourage native histograms including NHCB.)
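
Why classic histograms are affected: `_sum`, `_count`, and each `_bucket` are separate series, and `rate()`/`increase()` apply counter-reset correction to each of them independently. The sketch below uses a hypothetical `counter_increase` that mimics only that correction (a drop means the counter restarted from zero) and leaves out the boundary extrapolation PromQL also performs.

```python
def counter_increase(samples: list[float]) -> float:
    """Simplified increase over a window: any drop is treated as a reset."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the whole new value
        # counts as increase; otherwise take the plain difference.
        total += cur if cur < prev else cur - prev
    return total


# _sum samples of a classic histogram that also receives negative observations.
print(counter_increase([10.0, 12.0, 7.0, 9.0]))  # 11.0
```

The net change of that `_sum` is -1, but the drop from 12 to 7 is read as a restart, so the result is 11.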

krajorama (Member, Author) commented

> > The PromQL engine does not take the Sum into account when doing counter reset detection,
>
> This is only true for native histograms, but not for classic histograms.
>
> (FTR: I proposed to improve the counter reset handling for summaries and classic histograms at KubeCon Berlin in 2017. My proposal was ultimately rejected, so I guess we should not change course now and instead encourage native histograms including NHCB.)

I've reworded the PR description and I'll copy the final text into the commit message once we agree on it.
Are you ok with making the change in the specification otherwise?

OM1.0 required that the Sum of a Histogram is not represented when
the histogram has negative observations.

This PR removes this requirement in OM2.0 because:
- The requirement was never implemented by the Go and Java
  instrumentation libraries. Enforcing it now would be breaking.
- The requirement makes it impossible to implement the use case where
  the user wants to measure the Sum anyway.
- We already warned users in the documentation about the possibility
  of Sum decreasing and not being usable for rate() 10 years ago: #43.
  And native histograms will not take Sum into account when
  calculating counter resets during rate(), thus this problem won't
  come up.

Note: this PR does not make Sum mandatory, that is a different question.

Signed-off-by: György Krajcsovits <[email protected]>
krajorama force-pushed the krajo/om2.0-nonosum branch from bd3c521 to 3b7d783 on April 18, 2025 09:26
beorn7 (Member) commented Apr 19, 2025

I think the only way of solving this problem properly (beyond getting rid of classic histograms and summaries altogether) is to require PromQL to detect a counter reset in the sum via different means (historically by looking at the count, but nowadays we could also look at the CT, the created timestamp).
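
The approach described above can be sketched as follows, purely as an illustration of the idea and not something PromQL does today: the increase of `_sum` is computed with resets taken from `_count` (or from a change in the created timestamp when one is available) instead of from the sum itself, so a genuine decrease caused by negative observations is kept as a negative delta. `sum_increase` and its parameters are hypothetical.

```python
from typing import Optional, Sequence


def sum_increase(
    sums: Sequence[float],
    counts: Sequence[float],
    created: Optional[Sequence[float]] = None,
) -> float:
    """Increase of _sum with resets detected from _count or created timestamps."""
    total = 0.0
    for i in range(1, len(sums)):
        if created is not None:
            reset = created[i] != created[i - 1]  # new CT: the series restarted
        else:
            reset = counts[i] < counts[i - 1]     # count dropped: restarted
        # On reset, the sum restarted from zero; otherwise keep the signed delta.
        total += sums[i] if reset else sums[i] - sums[i - 1]
    return total


print(sum_increase([10.0, 12.0, 7.0, 9.0], [5, 6, 7, 8]))  # -1.0
```

A caveat the sketch makes visible: count-based detection misses a reset if the restarted `_count` already exceeds its previous value by the next scrape, whereas a changed created timestamp still flags it.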

I don't know how to solve this given that the Prometheus community has decided to not do that. Maybe just leaving it as is in practice (which is arguably what this PR proposes) is the least bad way, but I don't feel I should make this call about OMv2.
