Skip to content

Use newer AWS API for paginated queries #2452

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 11, 2020
Merged

Use newer AWS API for paginated queries #2452

merged 4 commits into from
May 11, 2020

Conversation

bboreham
Copy link
Contributor

@bboreham bboreham commented Apr 11, 2020

What this PR does:

Call QueryPagesWithContext() instead of iterating through NextPage().
Use AWS-SDK Handlers for tracing each retry and reporting errors - trace output will be different with just one span for the whole query and span-log entries for each page and retried operation.

This is less code, and more robust when retrying requests. I strongly suspect that before this change it could get into a state on network errors where it would fail every time until the max number of retrys was hit - error message looks like:

level=warn ts=2020-04-10T19:30:30Z msg="DynamoDB error" retry=19 table=prod_chunk_index2_2565 err="RequestError: send request failed\ncaused by: Post https://dynamodb.us-east-1.amazonaws.com/: net/http: HTTP/1.x transport connection broken: http: ContentLength=205 with Body length 0"

We don't need so much from the request object for testing now.

There are still two operations where we use request.Send(), but they do not use pagination so no benefit to rewrite in a similar way.

Which issue(s) this PR fixes:
Fixes #403
Part of #1152

Checklist

  • Tests updated
  • NA Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@bboreham bboreham force-pushed the new-dynamo-paging branch 2 times, most recently from 0ace13c to aebddc3 Compare April 11, 2020 19:06
Copy link
Contributor

@gouthamve gouthamve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super familiar with the aws workings, but this LGTM with nit!

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job @bboreham. I definitely agree the new code is way more cleaner. I left few minor comments and a major one in the retrier (unless I'm missing something, I believe it never retries).

}, retryer.withRetrys, withErrorHandler(query.TableName, "DynamoDB.QueryPages"))
})
if err != nil {
return fmt.Errorf("QueryPage error: table=%v, err=%v", query.TableName, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things:

  1. QueryPage > QueryPages
  2. I would suggest to use errors.Wrap() to wrap the error with the extra info (the parent chain is quite long and we sometimes use unwrap to check, for example, if the root error is a context cancellation or so. I haven't checked if this is the case, but as a rule of thumb Wrap() should be safer)
Suggested change
return fmt.Errorf("QueryPage error: table=%v, err=%v", query.TableName, err)
return errors.Wrapf(err, "QueryPages error: table=%v", query.TableName)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe that would be better, but note I didn't change this error in this PR; that's what it did before.


// ShouldRetry returns if the failed request is retryable.
func (r *retryer) ShouldRetry(req *request.Request) bool {
var d client.DefaultRetryer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default max number of retries is 0 so the following call to ShouldRetry() always return false (it's checked inside). I think the this retryer doesn't work as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - hadn't spotted that. Now I look back at this code, I was trying to re-use the AWS-SDK behaviour, but it's probably clearer just to copy-paste it.

Backoff: util.NewBackoff(ctx, cfg),
maxRetries: cfg.MaxRetries,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking loudly.

If, for any reason, the Cortex backoff max retries is 0 it leads to the following edge case:

  1. 0 for the Cortex backoff means "infinite"
  2. 0 for the AWS retries means "do not retry"

I think AWS is better for this use case (I'm quite sure we don't want infinite retry). Looking at the code shouldn't be a problem, but please double check it too.

}
}

func (r *retryer) withRetrys(req *request.Request) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean withRetries()?

@bboreham bboreham force-pushed the new-dynamo-paging branch from 907fdaa to 9ca7610 Compare April 21, 2020 16:31
@bboreham
Copy link
Contributor Author

I believe I've fixed all the points made.
Regarding retries=0, it doesn't seem to be documented to do infinite retries; I suspect that came out of a merger between two different bits of retry code.

I have also added a check that the context hasn't been cancelled, via Backoff.Ongoing().

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job @bboreham! I have no other comment (and thanks for addressing my feedback). If you could rebase to fix the conflict, then we can merge it.

@bboreham bboreham force-pushed the new-dynamo-paging branch from 2d2b232 to ae51b18 Compare April 28, 2020 17:04
@bboreham
Copy link
Contributor Author

Rebased.

bboreham added 4 commits May 7, 2020 14:37
This is less code, and more robust when retrying requests.
We don't need an indirection on the request object for testing now.

Signed-off-by: Bryan Boreham <[email protected]>
Now we are calling QueryPagesWithContext directly we don't need the
paging interface and we never re-use request objects.

Signed-off-by: Bryan Boreham <[email protected]>
Ingester.flushUserSeries() puts a timeout on the context, so don't
retry for longer than that.

Signed-off-by: Bryan Boreham <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Request.HasNextPage deprecated. Use Pagination type for configurable pagination of API operations
3 participants