Use newer AWS API for paginated queries #2452
Conversation
Force-pushed from 0ace13c to aebddc3
I'm not super familiar with the AWS workings, but this LGTM with a nit!
Good job @bboreham. I definitely agree the new code is much cleaner. I left a few minor comments and a major one in the retryer (unless I'm missing something, I believe it never retries).
}, retryer.withRetrys, withErrorHandler(query.TableName, "DynamoDB.QueryPages"))
})
if err != nil {
    return fmt.Errorf("QueryPage error: table=%v, err=%v", query.TableName, err)
Two things:
- QueryPage -> QueryPages (the function is called QueryPages, so fix the name in the error message).
- I would suggest using errors.Wrap() to wrap the error with the extra info (the parent chain is quite long and we sometimes unwrap to check, for example, whether the root error is a context cancellation or so. I haven't checked if this is the case here, but as a rule of thumb Wrap() should be safer).
Suggested change:
- return fmt.Errorf("QueryPage error: table=%v, err=%v", query.TableName, err)
+ return errors.Wrapf(err, "QueryPages error: table=%v", query.TableName)
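To illustrate why wrapping matters, here is a small standalone sketch (not code from this PR; the table name is made up) showing the difference with github.com/pkg/errors:

package main

import (
    "context"
    "fmt"

    "github.com/pkg/errors"
)

func main() {
    // Pretend the SDK call failed because the caller's context was cancelled.
    cause := context.Canceled

    // fmt.Errorf with %v flattens the error into a string: the cause is lost.
    flattened := fmt.Errorf("QueryPages error: table=%v, err=%v", "chunks_v9", cause)

    // errors.Wrapf keeps the cause in the chain, so callers can still inspect it.
    wrapped := errors.Wrapf(cause, "QueryPages error: table=%v", "chunks_v9")

    fmt.Println(errors.Cause(flattened) == context.Canceled) // false
    fmt.Println(errors.Cause(wrapped) == context.Canceled)   // true
}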
Maybe that would be better, but note I didn't change this error in this PR; that's what it did before.
pkg/chunk/aws/retryer.go (Outdated)
// ShouldRetry returns if the failed request is retryable.
func (r *retryer) ShouldRetry(req *request.Request) bool {
    var d client.DefaultRetryer
The default max number of retries is 0, so the following call to ShouldRetry() always returns false (it's checked inside). I think this retryer doesn't work as expected.
Thanks - hadn't spotted that. Now that I look back at this code, I was trying to re-use the AWS-SDK behaviour, but it's probably clearer just to copy-paste it.
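For context, here is a minimal sketch of what copy-pasting the SDK's retryability checks into a custom request.Retryer could look like, assuming aws-sdk-go v1. It is an illustration rather than the exact code merged in this PR, and the fixed RetryRules delay stands in for the Cortex backoff:

package aws

import (
    "time"

    "github.com/aws/aws-sdk-go/aws/request"
)

// retryer implements the aws-sdk-go request.Retryer interface without
// delegating to a zero-valued client.DefaultRetryer (whose ShouldRetry
// always returns false when NumMaxRetries is 0).
type retryer struct {
    maxRetries int
}

// MaxRetries returns the maximum number of retries this retryer allows.
func (r *retryer) MaxRetries() int {
    return r.maxRetries
}

// RetryRules returns how long to wait before the next attempt; the real code
// would derive this from the Cortex backoff rather than a fixed delay.
func (r *retryer) RetryRules(req *request.Request) time.Duration {
    return 500 * time.Millisecond
}

// ShouldRetry reports whether the failed request is retryable, applying the
// same checks as the SDK's DefaultRetryer.
func (r *retryer) ShouldRetry(req *request.Request) bool {
    // A handler may already have decided for us.
    if req.Retryable != nil {
        return *req.Retryable
    }
    return req.IsErrorRetryable() || req.IsErrorThrottle()
}

// withRetries is a request.Option that installs this retryer on a request,
// e.g. when passed to QueryPagesWithContext.
func (r *retryer) withRetries(req *request.Request) {
    req.Retryer = r
}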
Backoff:    util.NewBackoff(ctx, cfg),
maxRetries: cfg.MaxRetries,
Thinking out loud. If, for any reason, the Cortex backoff max retries is 0, it leads to the following edge case:
- 0 for the Cortex backoff means "infinite"
- 0 for the AWS retries means "do not retry"
I think the AWS semantics are better for this use case (I'm quite sure we don't want infinite retries). Looking at the code this shouldn't be a problem, but please double-check it too.
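To make the two conventions concrete, here is a rough, simplified illustration of the condition each side uses to decide whether to keep going; these are stand-ins for the purpose of the example, not the actual util.Backoff or AWS SDK code:

// Cortex-style backoff: MaxRetries == 0 is treated as "no limit".
func cortexKeepsRetrying(numRetries, maxRetries int) bool {
    return maxRetries == 0 || numRetries < maxRetries
}

// AWS-style retryer: MaxRetries() == 0 means the request is never retried.
func awsKeepsRetrying(retryCount, maxRetries int) bool {
    return retryCount < maxRetries
}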
pkg/chunk/aws/retryer.go (Outdated)
func (r *retryer) withRetrys(req *request.Request) { |
Did you mean withRetries()?
Force-pushed from 907fdaa to 9ca7610
I believe I've fixed all the points made. I have also added a check that the context hasn't been cancelled.
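As an illustration only (not necessarily the exact check added here), bailing out of the retry decision when the caller's context is done can look roughly like this:

import "context"

// shouldKeepRetrying is a hypothetical helper, not the PR's code: it stops
// retrying as soon as the caller's context is cancelled or past its deadline,
// and otherwise falls back to the usual retry-count check.
func shouldKeepRetrying(ctx context.Context, retryCount, maxRetries int) bool {
    if ctx.Err() != nil {
        return false
    }
    return retryCount < maxRetries
}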
Good job @bboreham! I have no other comments (and thanks for addressing my feedback). If you could rebase to fix the conflict, then we can merge it.
Force-pushed from 2d2b232 to ae51b18
Rebased.
Signed-off-by: Bryan Boreham <[email protected]>
This is less code, and more robust when retrying requests. We don't need an indirection on the request object for testing now. Signed-off-by: Bryan Boreham <[email protected]>
Now that we are calling QueryPagesWithContext directly, we don't need the paging interface and we never re-use request objects. Signed-off-by: Bryan Boreham <[email protected]>
Ingester.flushUserSeries() puts a timeout on the context, so don't retry for longer than that. Signed-off-by: Bryan Boreham <[email protected]>
Force-pushed from ae51b18 to 0443a18
What this PR does:
- Call QueryPagesWithContext() instead of iterating through NextPage() (a sketch of the call shape follows below).
- Use AWS-SDK Handlers for tracing each retry and reporting errors - trace output will be different, with just one span for the whole query and span-log entries for each page and retried operation.

This is less code, and more robust when retrying requests. I strongly suspect that before this change it could get into a state on network errors where it would fail every time until the max number of retries was hit - error message looks like:
We don't need so much from the request object for testing now.
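For readers less familiar with the DynamoDB API in aws-sdk-go v1, the shape of the new call is roughly the following. This is a generic sketch, not the Cortex code: the table name, key condition and handler are placeholders, and the request.Option shows where per-attempt tracing or error reporting (the Handlers point above) can hook in:

package main

import (
    "context"
    "fmt"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/request"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/dynamodb"
)

func main() {
    sess := session.Must(session.NewSession())
    svc := dynamodb.New(sess)

    input := &dynamodb.QueryInput{
        TableName:              aws.String("example_table"), // placeholder
        KeyConditionExpression: aws.String("h = :hash"),     // placeholder
        ExpressionAttributeValues: map[string]*dynamodb.AttributeValue{
            ":hash": {S: aws.String("some-hash-value")},
        },
    }

    // QueryPagesWithContext drives the pagination itself, calling the callback
    // once per page; returning false from the callback stops iteration early.
    err := svc.QueryPagesWithContext(context.Background(), input,
        func(page *dynamodb.QueryOutput, lastPage bool) bool {
            fmt.Println("items in page:", len(page.Items))
            return true // keep paging
        },
        // Extra request.Options can install a retryer or handlers; this one
        // logs every failed attempt, including retried ones.
        func(req *request.Request) {
            req.Handlers.CompleteAttempt.PushBack(func(r *request.Request) {
                if r.Error != nil {
                    fmt.Printf("attempt %d failed: %v\n", r.RetryCount, r.Error)
                }
            })
        },
    )
    if err != nil {
        fmt.Println("query failed:", err)
    }
}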
There are still two operations where we use request.Send(), but they do not use pagination, so there is no benefit to rewriting them in a similar way.

Which issue(s) this PR fixes:
Fixes #403
Part of #1152
Checklist
- CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]