-
Notifications
You must be signed in to change notification settings - Fork 816
Spread rule evaluation over the evaluation interval #716
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pkg/ruler/scheduler.go
Outdated
func (s *scheduler) addNewConfigs(now time.Time, cfgs map[string]configs.VersionedRulesConfig) { | ||
// TODO: instrument how many configs we have, both valid & invalid. | ||
level.Debug(util.Logger).Log("msg", "adding configurations", "num_configs", len(cfgs)) | ||
nextEvalCycle := time.Unix(0, int64( | ||
math.Ceil( | ||
math.Mod( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit is plain wrong 🤦♂️
0d79b1e
to
c9d9450
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't understand this well enough. At least needs more comments before it gets merged.
pkg/ruler/scheduler.go
Outdated
math.Mod( | ||
float64(hasher.Sum64()), | ||
cycleNanos))) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this defined inline? Why not a method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly due to where I put the hashing object, see the other comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still lean to having a separate method that takes the hasher & now
, or even a separate function that takes those two and evaluationInterval
. Would make it much easier to test (hint hint).
Not going to insist on it though.
pkg/ruler/scheduler.go
Outdated
@@ -182,6 +184,18 @@ func (s *scheduler) poll() (map[string]configs.VersionedRulesConfig, error) { | |||
func (s *scheduler) addNewConfigs(now time.Time, cfgs map[string]configs.VersionedRulesConfig) { | |||
// TODO: instrument how many configs we have, both valid & invalid. | |||
level.Debug(util.Logger).Log("msg", "adding configurations", "num_configs", len(cfgs)) | |||
cycleNanos := float64(s.evaluationInterval.Nanoseconds()) | |||
nextEvalCycle := time.Unix(0, int64(math.Ceil(float64(now.UnixNano())/cycleNanos)*cycleNanos)) | |||
hasher := fnv.New64a() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's up with constructing this out here, rather than inside the function? I'm guessing it's an optimization, but if so, why are we constructing it per addNewConfigs
invocation rather than once per process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's at that level as an optimisation. The hashing object is stateful; I wanted to keep it as local as possible to avoid the need to add mutexes.
pkg/ruler/scheduler.go
Outdated
@@ -182,6 +184,18 @@ func (s *scheduler) poll() (map[string]configs.VersionedRulesConfig, error) { | |||
func (s *scheduler) addNewConfigs(now time.Time, cfgs map[string]configs.VersionedRulesConfig) { | |||
// TODO: instrument how many configs we have, both valid & invalid. | |||
level.Debug(util.Logger).Log("msg", "adding configurations", "num_configs", len(cfgs)) | |||
cycleNanos := float64(s.evaluationInterval.Nanoseconds()) | |||
nextEvalCycle := time.Unix(0, int64(math.Ceil(float64(now.UnixNano())/cycleNanos)*cycleNanos)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this.
- I don't get the purpose, what it's trying to achieve
- I can't tell how it's meaningfully different from
time.Unix(0, now.UnixNano())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's trying to compute the start of the next evaluation cycle, so that we can distribute the load independently of when the ruler started, and evaluations per instance are consistently spaced across ruler restarts.
If now is 20 and the evaluation cycle is 15; we want to compute 30 as the start of the next evaluation cycle; 20 / 15 * 15 == 20
, but ceil(20 / 15) * 15 == 30
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It occurs to me that if now is 1, then this will delay all rule evaluation until 15, rather than starting those evaluations which land sooner (e.g. at 2-14). I'll tweak it to avoid waiting once we settle the other questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. Looking forward to your tweak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I understand this much better now.
The tweak to prevent dead time obscured the desired behaviour even more, so I factored it out to a function and added tests, at the cost of some minor repeated calculations. |
81bef7c
to
b77f41c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestions
- make
hashResult
an i64 and do the string conversion in the helper function - comment that we are telling the hash function what to return, and that this relies on
fakeHasher
impl. - maybe use some variables & arithmetic explicitly in the
evalTime
calls in the test
b77f41c
to
e76fc5e
Compare
Given #719 would you want to hash by user and filename, or just stick with user? |
I think I'd just stick with user, because if the file execution order was more spread out it might lead to more confusion |
Rules are currently evaluated every
$evaluationInterval
(e.g. 15s) from when the ruler starts.This results in a burst of rules evaluated every 15 seconds, after which the ruler workers sit idle.
If we instead spread the rules over the whole
$evaluationInterval
, we can spread the work, and the resulting read and write load.I didn't put much thought into which hashing algorithm I chose, any should do; advice greatly appreciated!
See also #702