
Commit 06cd0f4

enjoy-binbin authored and sarthakaggarwal97 committed
Fix empty primary may have dirty slots data due to bad migration (valkey-io#1285)
If we become an empty primary for some reason, we still need to check whether we need to delete dirty slots, because we may have dirty slots data left over from a bad migration. For example, the target node may forcibly execute CLUSTER SETSLOT NODE to take over the slot without performing the key migration.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: Viktor Söderqvist <[email protected]>
1 parent c02e4e5 commit 06cd0f4

File tree

2 files changed: +32 −1 lines changed


src/cluster_legacy.c

Lines changed: 12 additions & 1 deletion
@@ -2483,6 +2483,7 @@ void clusterUpdateSlotsConfigWith(clusterNode *sender, uint64_t senderConfigEpoc
      * need to delete all the keys in the slots we lost ownership. */
     uint16_t dirty_slots[CLUSTER_SLOTS];
     int dirty_slots_count = 0;
+    int delete_dirty_slots = 0;

     /* We should detect if sender is new primary of our shard.
      * We will know it if all our slots were migrated to sender, and sender
@@ -2709,6 +2710,12 @@ void clusterUpdateSlotsConfigWith(clusterNode *sender, uint64_t senderConfigEpoc
             serverLog(LL_NOTICE,
                       "My last slot was migrated to node %.40s (%s) in shard %.40s. I am now an empty primary.",
                       sender->name, sender->human_nodename, sender->shard_id);
+            /* We may still have dirty slots when we became an empty primary due to
+             * a bad migration.
+             *
+             * In order to maintain a consistent state between keys and slots
+             * we need to remove all the keys from the slots we lost. */
+            delete_dirty_slots = 1;
         }
     } else if (dirty_slots_count) {
         /* If we are here, we received an update message which removed
@@ -2718,6 +2725,10 @@ void clusterUpdateSlotsConfigWith(clusterNode *sender, uint64_t senderConfigEpoc
          *
          * In order to maintain a consistent state between keys and slots
          * we need to remove all the keys from the slots we lost. */
+        delete_dirty_slots = 1;
+    }
+
+    if (delete_dirty_slots) {
         for (int j = 0; j < dirty_slots_count; j++) {
             serverLog(LL_NOTICE, "Deleting keys in dirty slot %d on node %.40s (%s) in shard %.40s", dirty_slots[j],
                       myself->name, myself->human_nodename, myself->shard_id);
@@ -6122,7 +6133,7 @@ void removeChannelsInSlot(unsigned int slot) {
 /* Remove all the keys in the specified hash slot.
  * The number of removed items is returned. */
 unsigned int delKeysInSlot(unsigned int hashslot) {
-    if (!kvstoreDictSize(server.db->keys, hashslot)) return 0;
+    if (!countKeysInSlot(hashslot)) return 0;

     /* We may lose a slot during the pause. We need to track this
      * state so that we don't assert in propagateNow(). */
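The control-flow change above can be sketched as a small, self-contained model. Everything below (the Node class, its method names, and the slot/key maps) is an illustrative invention for clarity, not the actual C implementation in src/cluster_legacy.c; it only mirrors the idea that dirty-slot deletion must run both when dirty slots are detected while still owning other slots and when the node becomes an empty primary:

```python
class Node:
    """Toy model of a cluster primary's slot ownership and per-slot keys."""

    def __init__(self, owned_slots, keys_by_slot):
        self.owned_slots = set(owned_slots)
        self.keys_by_slot = {s: set(k) for s, k in keys_by_slot.items()}

    def count_keys_in_slot(self, slot):
        return len(self.keys_by_slot.get(slot, ()))

    def del_keys_in_slot(self, slot):
        # Mirrors delKeysInSlot(): no-op when the slot holds no keys.
        if not self.count_keys_in_slot(slot):
            return 0
        return len(self.keys_by_slot.pop(slot))

    def update_slots_config_with(self, sender_claimed_slots):
        # Slots we own that the sender now claims are "dirty" candidates.
        dirty_slots = [s for s in self.owned_slots if s in sender_claimed_slots]
        self.owned_slots -= set(sender_claimed_slots)

        # The fix: delete dirty slots when we became an empty primary,
        # not only when we still own other slots.
        became_empty_primary = not self.owned_slots
        delete_dirty_slots = became_empty_primary or bool(dirty_slots)

        removed = 0
        if delete_dirty_slots:
            for s in dirty_slots:
                removed += self.del_keys_in_slot(s)
        return removed


node = Node(owned_slots=[0], keys_by_slot={0: ["key_977613"]})
# Sender takes over slot 0 without migrating keys: the node becomes an
# empty primary but still holds a key in the lost (dirty) slot.
removed = node.update_slots_config_with({0})
print(removed)                  # 1
print(node.count_keys_in_slot(0))  # 0
```

Before the fix, the model's `became_empty_primary` branch would not have set `delete_dirty_slots`, leaving the stale key behind, which is exactly the inconsistency between keys and slots that the commit message describes.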

tests/unit/cluster/replica-migration.tcl

Lines changed: 20 additions & 0 deletions
@@ -400,3 +400,23 @@ start_cluster 4 4 {tags {external:skip cluster} overrides {cluster-node-timeout
 start_cluster 4 4 {tags {external:skip cluster} overrides {cluster-node-timeout 1000 cluster-migration-barrier 999}} {
     test_cluster_setslot "setslot"
 } my_slot_allocation cluster_allocate_replicas ;# start_cluster
+
+start_cluster 3 0 {tags {external:skip cluster} overrides {cluster-node-timeout 1000 cluster-migration-barrier 999}} {
+    test "Empty primary will check and delete the dirty slots" {
+        R 2 config set cluster-allow-replica-migration no
+
+        # Write a key to slot 0.
+        R 2 incr key_977613
+
+        # Move slot 0 from primary 2 to primary 0.
+        R 0 cluster bumpepoch
+        R 0 cluster setslot 0 node [R 0 cluster myid]
+
+        # Wait for R 2 to report that it is an empty primary (cluster-allow-replica-migration no).
+        wait_for_log_messages -2 {"*I am now an empty primary*"} 0 1000 50
+
+        # Make sure primary 2 deletes the dirty slots.
+        verify_log_message -2 "*Deleting keys in dirty slot 0*" 0
+        assert_equal [R 2 dbsize] 0
+    }
+} my_slot_allocation cluster_allocate_replicas ;# start_cluster
