Fix workflow ID reuse when running on ScyllaDB #3027
There are some tests failing; looking into them.
Fixed the tests; should be ready for review -- I'm not familiar with how to write idiomatic Go; happy to make changes to this PR if needed.
Oops, approved by accident, please ignore.
I am confused about this PR.
We dereference and store to avoid changing all the downstream call sites that expect an
    rowType, ok := conflictRecord["type"].(*int)
    if !ok || rowType == nil {
        return errors
    }
    // Dereference rowType for later use
    conflictRecord["type"] = *rowType
I finally understood what you are trying to achieve here. I don't think you need the newConflictRecord func. It should work with the default map[string]interface{}, and the code here should be:
    - rowType, ok := conflictRecord["type"].(*int)
    - if !ok || rowType == nil {
    -     return errors
    - }
    - // Dereference rowType for later use
    - conflictRecord["type"] = *rowType
    + // ScyllaDB will return rows with null values to match # of queries in a batch query (see #2683).
    + // Field types will be pointer types (i.e. *int instead of int) in this case to support nil values.
    + // Check only "type" field for simplicity.
    + if rowType, ok := conflictRecord["type"].(*int); ok && rowType == nil {
    +     return nil
    + }
and move this block before the var errors []error line.
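For readers following the suggestion, here is a minimal standalone sketch of the guard it describes, using only plain Go maps. The helper name isNullConflictRow is hypothetical and is not part of the PR; the sketch also assumes, as the comment in the suggestion states, that a padding row surfaces as a typed nil *int under the "type" key.

```go
package main

import "fmt"

// isNullConflictRow reports whether the "type" field of a conflict record
// holds a typed nil *int. The name is hypothetical (not from the PR); the
// condition mirrors the one in the suggested change.
func isNullConflictRow(record map[string]interface{}) bool {
	rowType, ok := record["type"].(*int)
	return ok && rowType == nil
}

func main() {
	seven := 7

	// Typed nil pointer: the case the guard is meant to catch.
	fmt.Println(isNullConflictRow(map[string]interface{}{"type": (*int)(nil)}))
	// Non-nil pointer: a real row, so the guard does not fire.
	fmt.Println(isNullConflictRow(map[string]interface{}{"type": &seven}))
	// Plain int: the assertion to *int fails, so the guard does not fire.
	fmt.Println(isNullConflictRow(map[string]interface{}{"type": 7}))
	// Missing key: the map lookup yields a nil interface, assertion fails.
	fmt.Println(isNullConflictRow(map[string]interface{}{}))
}
```

Only the first call reports true. Because the comma-ok form is used, a plain int or a missing key does not panic; it simply falls through to the existing error handling.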
If I don't use newConflictRecord, then ok will be false when I try to get conflictRecord["type"].(*int) -- I'm not familiar enough with Go to know exactly why it works like that, but I'm assuming this is related to how the Cassandra client library works.
Ok, there is definitely some sort of reflection magic inside the Cassandra library. Can you try fmt.Printf("%T\n", conflictRecord["type"]) w/o newConflictRecord? It should print the underlying type of the "type" key. What is it?
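As a language-level illustration of why ok can be false, the sketch below shows what %T prints for a few candidate values and whether each can be asserted to *int. It says nothing about what the Cassandra client actually stores in the map; the describe helper and the sample values are assumptions made only for this example.

```go
package main

import "fmt"

// describe returns the dynamic type of v as printed by %T, and whether a
// type assertion to *int succeeds. Hypothetical helper for illustration.
func describe(v interface{}) (string, bool) {
	_, ok := v.(*int)
	return fmt.Sprintf("%T", v), ok
}

func main() {
	n := 1
	// Sample values: a plain int, a pointer to it, a typed nil pointer,
	// and an untyped nil (what a missing map key would yield).
	for _, v := range []interface{}{n, &n, (*int)(nil), nil} {
		typeName, ok := describe(v)
		fmt.Printf("type=%s assertable to *int: %v\n", typeName, ok)
	}
}
```

A type assertion matches the dynamic type exactly, so a value stored as int is never assertable to *int, while a typed nil *int still is. If the map holds a plain int when newConflictRecord is not used, that would explain the ok == false result reported above.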
It would also be great if you added a test that fails on ScyllaDB w/o your fix and passes with it. Then I will be able to test it myself.
> Ok, there is definitely some sort of reflection magic inside Cassandra library. Can you try fmt.Printf("%T\n", conflictRecord["type"]) w/o newConflictRecord. It should print underlying type of "type" key. What is it?
If I am not mistaken, this part does Unmarshalling.
https://github.com/gocql/gocql/blob/7a6cf00bbc98f4d7037e4a0fcca96fc946fd63d6/marshal.go#L205
We also used the newConflictRecord approach in our project to solve this issue, but @PenguinToast was faster :D
It is working in our live environment and there seem to be no issues.
alexshtin left a comment
Can you test my suggestion with both DBs?
@alexshtin Responded to your suggestion -- without pre-populating a pointer type in the conflictRecord map,
24eab34 to e7978fe
I played with it today. Yes, Thanks for contribution!
Agree that it's much cleaner with your change -- thanks for reviewing & fixing!
I wonder why it is removed from
(cherry picked from commit ae5cad8)
Our patch release strategy is to patch regressions or new features that were introduced in the corresponding minor version. This issue was always there, so we decided to include the fix in the next minor version release (1.18).
@alexshtin Is there any estimate for when 1.18 will come out? |
What changed?
Fix an issue where the history service would get stuck in a loop when reusing workflow IDs on top of ScyllaDB.
Why?
Closes #2683.
How did you test it?
First reproduced the issue by running Temporal on top of Scylla locally; then verified that the change fixed the issue. Ran Cassandra persistence tests to ensure that this change didn't break anything else.
Potential risks
As noted in the linked issue:
If there are cases where we should be handling nil rows as real errors, this change would break that.
Is hotfix candidate?
Yes.