Commit f7e052f
replicators: Fix race between domain recovery and dropping table

When a domain fails, we replace it in a background thread. However,
if the failure is caused by a replication issue, such as not being
able to find a row in the source table, we also remove the table
from Readyset. These two operations race with each other.

We should wait for the new domain to be ready before removing the
table. This can be done by checking the replication offsets via the
RPC call: the caller just needs to wait for the RPC to succeed
instead of returning immediately after the first error.

We use the same approach in the noria_adapter when booting up for
the first time.

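
The "wait for the RPC to succeed" idea can be illustrated with a minimal, synchronous sketch. It is not the code from this commit: `wait_for_rpc`, its parameters, and the closure standing in for the replication-offsets RPC are all hypothetical names chosen for illustration, and the real change presumably lives in async code.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper (not from the commit): keep calling `rpc` until it
/// succeeds, instead of returning after the first error. Gives up and
/// returns the most recent error once `timeout` has elapsed.
pub fn wait_for_rpc<T, E>(
    mut rpc: impl FnMut() -> Result<T, E>,
    retry_interval: Duration,
    timeout: Duration,
) -> Result<T, E> {
    let deadline = Instant::now() + timeout;
    loop {
        match rpc() {
            // The new domain answered, so it is safe to proceed
            // (for example, to remove the table).
            Ok(value) => return Ok(value),
            // Deadline passed: surface the last error to the caller.
            Err(err) if Instant::now() >= deadline => return Err(err),
            // Assume the domain is still being recovered; retry shortly.
            Err(_) => sleep(retry_interval),
        }
    }
}
```

A caller would pass a closure that performs the replication-offsets RPC and only drop the table once `wait_for_rpc` returns `Ok`. The bounded timeout is a design choice of this sketch; the commit message does not say whether the real wait is bounded.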
Closes: REA-5563
Fixes: #1484
Release-Note-Core: Fix a race between domain recovery and dropping
a table after replication failure.
Change-Id: I9274eca5fcf256ce37bdd4b2b2bbfda946f2952e
Reviewed-on: https://gerrit.readyset.name/c/readyset/+/9146
Tested-by: Buildkite CI
Reviewed-by: Jason Brown <jason.b@readyset.io>

1 parent 981f459 commit f7e052f
1 file changed: +13, -1 lines changed (original line 1162 replaced by new lines 1162-1174)