Make ingester transfer retries min and max duration configurable #1844
Signed-off-by: Ankit Goel <[email protected]>
Thanks @ankit1ank for the PR!
Apart from the few minor comments I've left, I personally believe the original issue #1307 has already been fixed by #1599 (before that, the minimum backoff period was not honored). That being said, I'm not against allowing it to be customized.
Previous comment: I'm not against making the backoff period configurable, but I think you can now reliably control how long transfers are retried just by fine-tuning -ingester.max-transfer-retries. Assuming that, is there really a good use case for longer or shorter backoff periods than the default values?
@@ -135,6 +137,8 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
	cfg.LifecyclerConfig.RegisterFlags(f)

	f.IntVar(&cfg.MaxTransferRetries, "ingester.max-transfer-retries", 10, "Number of times to try and transfer chunks before falling back to flushing. Negative value or zero disables hand-over.")
	f.IntVar(&cfg.MinTransferRetriesBackOff, "ingester.min-transfer-retries-backoff", 100, "Minimum backoff period for transfers in milliseconds")
This should be a DurationVar. Same for the next one.
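A minimal sketch of what the DurationVar suggestion could look like, using only the standard flag package. The Config struct here is a trimmed-down stand-in, and the max flag name, its 5s default, and the parseConfig helper are assumptions for illustration (the PR's actual next line is not shown above).

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// Config is a hypothetical, trimmed-down stand-in for the ingester config.
type Config struct {
	MaxTransferRetries        int
	MinTransferRetriesBackOff time.Duration
	MaxTransferRetriesBackOff time.Duration
}

// RegisterFlags registers the backoff settings as durations, so users
// write values such as "250ms" or "5s" instead of bare integers.
func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
	f.IntVar(&cfg.MaxTransferRetries, "ingester.max-transfer-retries", 10, "Number of times to try and transfer chunks before falling back to flushing. Negative value or zero disables hand-over.")
	f.DurationVar(&cfg.MinTransferRetriesBackOff, "ingester.min-transfer-retries-backoff", 100*time.Millisecond, "Minimum backoff period for transfers.")
	// Flag name and default below are assumed, not taken from the PR.
	f.DurationVar(&cfg.MaxTransferRetriesBackOff, "ingester.max-transfer-retries-backoff", 5*time.Second, "Maximum backoff period for transfers.")
}

// parseConfig is a hypothetical helper that parses args into a Config.
func parseConfig(args []string) (Config, error) {
	var cfg Config
	fs := flag.NewFlagSet("ingester", flag.ContinueOnError)
	cfg.RegisterFlags(fs)
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	cfg, err := parseConfig([]string{"-ingester.min-transfer-retries-backoff=250ms"})
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.MinTransferRetriesBackOff, cfg.MaxTransferRetriesBackOff) // prints "250ms 5s"
}
```

With a duration flag the unit is part of the value, so the help text no longer needs to say "in milliseconds".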
@@ -103,7 +103,9 @@ type Config struct {
	LifecyclerConfig ring.LifecyclerConfig `yaml:"lifecycler,omitempty"`

	// Config for transferring chunks. Zero or negative = no retries.
	MaxTransferRetries int `yaml:"max_transfer_retries,omitempty"`
	MaxTransferRetries int `yaml:"max_transfer_retries,omitempty"`
	MinTransferRetriesBackOff int `yaml:"min_tranfer_retries_backoff,omitempty"`
A few things:
- There's a typo in the yaml: tranfer > transfer. The same issue applies to the next line.
- I would suggest naming it transfer_(max|min)_backoff_period / -ingester.transfer-(max|min)-backoff-period to keep consistency with other backoff settings in Cortex.
- The variable name should mirror the new parameter naming (and BackOff > Backoff).
@@ -352,9 +352,12 @@ func (i *Ingester) TransferOut(ctx context.Context) error {
	if i.cfg.MaxTransferRetries <= 0 {
		return fmt.Errorf("transfers disabled")
	}
	if i.cfg.MaxTransferRetriesBackOff < i.cfg.MinTransferRetriesBackOff {
This is not the right place to validate the configuration. Please add a Validate() function to ingester.Config, and call it from cortex.Validate(). See how other Validate() functions work, for example storage.Config.Validate().
Please also add a unit test for the Validate() func.
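As a rough illustration of the requested Validate() approach, the sketch below moves the min/max check out of the transfer path and into a method on the config. The struct is a hypothetical stand-in for ingester.Config, the field names follow the naming suggested earlier in this review, and the error value is an assumption; how it is wired into the top-level validation is not shown.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Config is a hypothetical stand-in for the ingester config, reduced to
// the two backoff settings discussed in this review.
type Config struct {
	TransferMinBackoffPeriod time.Duration
	TransferMaxBackoffPeriod time.Duration
}

// errInvalidTransferBackoff is an assumed error value for illustration.
var errInvalidTransferBackoff = errors.New("transfer max backoff period must not be lower than the min backoff period")

// Validate checks the config once at startup, instead of on every
// TransferOut call.
func (cfg *Config) Validate() error {
	if cfg.TransferMaxBackoffPeriod < cfg.TransferMinBackoffPeriod {
		return errInvalidTransferBackoff
	}
	return nil
}

func main() {
	good := Config{TransferMinBackoffPeriod: 100 * time.Millisecond, TransferMaxBackoffPeriod: 5 * time.Second}
	bad := Config{TransferMinBackoffPeriod: 10 * time.Second, TransferMaxBackoffPeriod: 5 * time.Second}

	fmt.Println("good:", good.Validate())
	fmt.Println("bad:", bad.Validate())
}
```

A unit test for this would typically be table-driven, with one case per valid and invalid combination, comparing the returned error against the expected one.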
Now that I think about it, I agree that these changes shouldn't be needed. I can simply bump up the number of retries. Also, I think we faced the issue because we were running a Cortex version that did not have the fix (#1599). I am closing this PR for now. If anyone thinks it's needed, I will revisit it.
Signed-off-by: Ankit Goel [email protected]
What this PR does:
This PR makes the ingester retries backoff duration configurable.
Which issue(s) this PR fixes:
It helps the user deal with issue #1307