adapter: Add coordinator consistency check #21740
Nice! I'm curious to see what happens in CI.
I triggered nightly: https://buildkite.com/materialize/nightlies/builds/4008
A whole bunch of failures and panics in nightly, which seem related to this change: https://buildkite.com/materialize/nightlies/builds/4009
While it's not good that we found inconsistencies, it's great to know that this is working!
src/testdrive/src/action/sql.rs
Outdated
state.materialize_internal_http_addr,
))
.await?
.text()
You should be able to call .json() here, which will parse the body as JSON.
Ah, thanks for that. I was thinking about leaving it as is because I've added more context for when each step can fail, which has been helpful while debugging.
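The trade-off mentioned here (separate steps, each with its own failure context) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the testdrive code: `StepError`, `fetch_body`, and `check_count` are hypothetical names, and the HTTP request is replaced by a stand-in function.

```rust
use std::fmt;

// Hypothetical error type that records which step failed and why.
#[derive(Debug)]
struct StepError {
    step: &'static str,
    cause: String,
}

impl fmt::Display for StepError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} failed: {}", self.step, self.cause)
    }
}

impl std::error::Error for StepError {}

// Stand-in for the HTTP request (an assumption for this sketch):
// `None` simulates a transport failure, `Some` carries the raw body.
fn fetch_body(body: Option<&str>) -> Result<String, String> {
    body.map(str::to_string)
        .ok_or_else(|| "connection refused".to_string())
}

// Fetching and parsing are kept as separate steps so that each one
// can attach its own context to the error it returns.
fn check_count(body: Option<&str>) -> Result<u64, StepError> {
    let text = fetch_body(body).map_err(|e| StepError {
        step: "fetching body",
        cause: e,
    })?;
    let count = text.trim().parse::<u64>().map_err(|e| StepError {
        step: "parsing body",
        cause: e.to_string(),
    })?;
    Ok(count)
}
```

With this shape, a failure message names the step that broke (for example, it starts with "parsing body failed" when the body is not a number), which is the kind of context a single combined fetch-and-parse call would not distinguish on its own.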
Shouldn't be submitted while it makes nightly that red.
src/adapter/src/coord/ddl.rs
Outdated
for id in self
    .catalog()
    .get_cluster(cluster_id)
    .log_indexes
    .values()
    .cloned()
    .collect_vec()
{
    self.drop_compute_read_policy(&id);
}
Two notes:
- I would move this to before we drop the cluster. It's possible that drop_cluster wipes out some metadata needed to update the read policies of the log indexes.
- You'll probably need to collect the log indexes before executing the catalog transaction. Since the transaction completed and this cluster was dropped, the cluster won't exist in the catalog.
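The ordering suggested in these two notes can be sketched as below. This is a simplified model under stated assumptions, not the adapter's real code: `Coordinator`, `Cluster`, `sequence_drop_cluster`, and the three maps and sets are hypothetical stand-ins for the catalog, the controller state, and the read policies.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical, simplified id types for this sketch.
type ClusterId = u64;
type GlobalId = u64;

// Stand-in for a catalog cluster entry holding its log indexes.
struct Cluster {
    log_indexes: BTreeMap<String, GlobalId>,
}

// Stand-in for the coordinator: a catalog, controller-side cluster
// state, and the set of ids that currently have a read policy.
struct Coordinator {
    catalog: BTreeMap<ClusterId, Cluster>,
    controller_clusters: BTreeSet<ClusterId>,
    read_policies: BTreeSet<GlobalId>,
}

impl Coordinator {
    // Returns false if the cluster is not in the catalog.
    fn sequence_drop_cluster(&mut self, cluster_id: ClusterId) -> bool {
        // 1. Collect the log index ids while the cluster still exists
        //    in the catalog.
        let log_index_ids: Vec<GlobalId> = match self.catalog.get(&cluster_id) {
            Some(cluster) => cluster.log_indexes.values().cloned().collect(),
            None => return false,
        };

        // 2. "Catalog transaction": after this the cluster can no
        //    longer be looked up.
        self.catalog.remove(&cluster_id);

        // 3. Drop the read policies before the controller forgets the
        //    cluster, in case it needs that metadata.
        for id in &log_index_ids {
            self.read_policies.remove(id);
        }

        // 4. Finally drop the cluster on the controller side.
        self.controller_clusters.remove(&cluster_id);
        true
    }
}
```

Looking up the cluster after step 2 would find nothing in this model, which mirrors the second note: the ids have to be captured up front.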
Thanks!
This PR has higher risk. Make sure to carefully review the file hotspots. In addition to having this change reviewed, adequate tests should be considered and it may be useful to add observability and/or a feature flag.
@def- I think we're good now.
Motivation
Piggybacking on #21717, this PR adds consistency checks to the coordinator that are run in dev and during testdrive.
Tips for reviewer
Checklist
$T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.