Make ingester transfer retries min and max duration configurable #1844

Conversation

goelankitt (Contributor)

Signed-off-by: Ankit Goel [email protected]

What this PR does:
This PR makes the ingester transfer retries backoff duration configurable.
Which issue(s) this PR fixes:
It helps users deal with issue #1307.

goelankitt force-pushed the configurable_transfer_duration branch from 3a2774e to 2567a0c on November 20, 2019 at 17:42
pracucci (Contributor) left a comment

Thanks @ankit1ank for the PR!

Apart from the few small comments I've left, I personally believe the original issue #1307 has already been fixed by #1599 (before that, the minimum backoff period was not honored). That being said, I'm not against allowing it to be customized.

Previous comment: I'm not against allowing the backoff period to be configured, but I think you can now reliably control how long transfers will be retried just by fine-tuning -ingester.max-transfer-retries. Given that, is there really a good use case for which we want longer or shorter backoff periods compared to the default values?

```diff
@@ -135,6 +137,8 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
 	cfg.LifecyclerConfig.RegisterFlags(f)
 
 	f.IntVar(&cfg.MaxTransferRetries, "ingester.max-transfer-retries", 10, "Number of times to try and transfer chunks before falling back to flushing. Negative value or zero disables hand-over.")
+	f.IntVar(&cfg.MinTransferRetriesBackOff, "ingester.min-transfer-retries-backoff", 100, "Minimum backoff period for transfers in milliseconds")
```
pracucci (Contributor):

This should be a DurationVar. Same for the next one.

```diff
@@ -103,7 +103,9 @@ type Config struct {
 	LifecyclerConfig ring.LifecyclerConfig `yaml:"lifecycler,omitempty"`
 
 	// Config for transferring chunks. Zero or negative = no retries.
-	MaxTransferRetries int `yaml:"max_transfer_retries,omitempty"`
+	MaxTransferRetries        int `yaml:"max_transfer_retries,omitempty"`
+	MinTransferRetriesBackOff int `yaml:"min_tranfer_retries_backoff,omitempty"`
```
pracucci (Contributor):

A few things:

  1. There's a typo in the yaml: tranfer > transfer. The same issue applies to the next line.
  2. I would suggest naming it transfer_(max|min)_backoff_period / -ingester.transfer-(max|min)-backoff-period to keep consistency with other backoff settings in cortex.
  3. The variable name should mirror the new parameter naming (and BackOff > Backoff).

```diff
@@ -352,9 +352,12 @@ func (i *Ingester) TransferOut(ctx context.Context) error {
 	if i.cfg.MaxTransferRetries <= 0 {
 		return fmt.Errorf("transfers disabled")
 	}
+	if i.cfg.MaxTransferRetriesBackOff < i.cfg.MinTransferRetriesBackOff {
```
pracucci (Contributor):

This is not the right place to validate the configuration. Please add a Validate() function to ingester.Config and call it from cortex.Validate(). See how other Validate() functions work, for example storage.Config.Validate().

Please also add a unit test on the Validate() func.

goelankitt (Contributor, Author)
> Thanks @ankit1ank for the PR!
>
> Apart from the few small comments I've left, I personally believe the original issue #1307 has already been fixed by #1599 (before that, the minimum backoff period was not honored). That being said, I'm not against allowing it to be customized.
>
> Previous comment: I'm not against allowing the backoff period to be configured, but I think you can now reliably control how long transfers will be retried just by fine-tuning -ingester.max-transfer-retries. Given that, is there really a good use case for which we want longer or shorter backoff periods compared to the default values?

Now that I think about it, I agree these changes shouldn't be needed; I can simply bump up the number of retries. I also think we faced the issue because we were running a cortex version that did not have the fix (#1599). I'm closing this PR for now. If anyone thinks it's needed, I'll revisit it then.

goelankitt closed this on Nov 21, 2019