-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Disable user data when fetch fails #4604
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
bergundy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, I didn't get a chance to thoroughly review the tests but the tests I thought of (enable / disable) transitions are there.
|
|
||
| firstCall := true | ||
| // hasFetchedUserData is true if we have gotten a successful reply to GetTaskQueueUserData. | ||
| // It's used to control whether we do a long poll or a simple get. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think long poll is a bit confusing here, especially since it's a one-of-a-kind concept in server. I think let's just elaborate on what exactly this is. Like, a long poll on a task queue makes sense because you're waiting for a task, but a long poll on something that seems like static data (metadata of a task queue), doesn't immediatley make sense. I'd explain that we're waiting for any updates to the user data, and that we do that by sending a request with our current version that blocks until the user data on the server is updated to a higher version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"long poll" is a general concept for waiting for any sort of new data, and this does fit the pattern. It is used elsewhere in the server and referred to by that name, e.g. getting history and waiting for new events (
| // if caller decide to long poll on workflow execution |
| knownUserData, _, err := c.GetUserData(ctx) | ||
| if err != nil { | ||
| return err | ||
| // Start with a non-long poll after re-enabling after disable, so that we don't have to wait the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The phrase, "non-long poll after re-enabling after disable," is pretty dense. Also, I'm not sure it's relevant. I think all we need to express here is that, if we have user data, we want to wait for changes to it; if we don't have any user data, we want to fetch the latest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, that's one way to look at it. I think the only subtle part is that we may have user data, then get disabled, then get re-enabled. in that case hasFetchedUserData would be true here, so we have to explicitly set it to false. I'm explaining why we explicitly set it to false
service/matching/db.go
Outdated
| initialRangeID = 1 // Id of the first range of a new task queue | ||
| stickyTaskQueueTTL = 24 * time.Hour | ||
|
|
||
| userDataEnabled userDataEnabledState = iota |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: userDataState instead of userDataEnabledState
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
service/matching/db.go
Outdated
| userDataEnabled userDataEnabledState = iota | ||
| userDataDisabled | ||
| userDataSpecificVersion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should disambiguate these. User data could be:
- Enabled
- Disabled for this tqm because it manages a specific version set (so it will always be disabled)
- Disabled due to a setting on this node or a response it received from a parent (but it could be re-enabled in the future)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's what those are. I'll add comments
What changed?
Why?
How did you test it?
existing tests, new tests
Potential risks
Is hotfix candidate?