Open
Description
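While implementing project3b, I sometimes hit a panic at L371 of the code referenced above. The log leading up to the panic is shown below (two peers are added to the quorum and then the same two are removed).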
2022/03/30 09:44:40.848845 /home/wangqi/workplace/tinykv/kv/test_raftstore/peer_msg_handler.go:109: [info] [wq] Node [region 1] 7, add: 8 Region.ConfVer: 15
2022/03/30 09:44:40.848958 /home/wangqi/workplace/tinykv/kv/test_raftstore/peer_msg_handler.go:109: [info] [wq] Node [region 1] 7, add: 9 Region.ConfVer: 16
2022/03/30 09:44:40.849307 /home/wangqi/workplace/tinykv/kv/test_raftstore/peer_msg_handler.go:124: [info] [wq] Node [region 1] 7, remove: 8 Region.ConfVer: 17
2022/03/30 09:44:40.849519 /home/wangqi/workplace/tinykv/kv/test_raftstore/peer_msg_handler.go:124: [info] [wq] Node [region 1] 7, remove: 9 Region.ConfVer: 18
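The cause of the panic: the overall quorum (the region's set of peers) is the same at ConfVer 15 and at ConfVer 19, but the scheduler sees the region's ConfVer jump by more than 1 and panics.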
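I then tracked down where region heartbeats are triggered and found the following two places:
(1) https://github.com/tidb-incubator/tinykv/blob/course/kv/raftstore/peer_msg_handler.go#L511-L518
(2) https://github.com/tidb-incubator/tinykv/blob/course/kv/raftstore/peer_msg_handler.go#L202-L204
Place (1) is triggered by the tick timer.
Place (2) is triggered when the node being added by addnode (the pending node) catches up to the leader's truncate point.
Consider the following case: the leader and a majority of the other nodes are already in sync, so they can ignore the pending node and go ahead and apply the log entries (the addnode and removenode requests). At the current version, (2) is then never triggered, and a region heartbeat can only be sent once the tick timer fires at (1). The scheduler may therefore observe the region's ConfVer jumping by more than 1, and then panic.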
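The scenario above can be reproduced in miniature with the sketch below. It is not the tinykv scheduler code: `heartbeat`, `mockScheduler`, `onHeartbeat` and `replay` are hypothetical names, the check is my reading of the behaviour described in this issue (same number of peers, ConfVer more than 1 ahead), the starting peer count of 3 is assumed, and an error is returned where the real test panics.

```go
package main

import "fmt"

// heartbeat is a hypothetical, simplified region heartbeat: only the
// ConfVer and the number of peers are modelled.
type heartbeat struct {
	ConfVer uint64
	Peers   int
}

// mockScheduler is a stand-in for the test scheduler (assumption, not the
// tinykv type). It remembers the last heartbeat it received.
type mockScheduler struct {
	last heartbeat
	seen bool
}

// onHeartbeat rejects a heartbeat whose peer count matches the previous one
// while ConfVer has advanced by more than 1. The real test panics here;
// this sketch returns an error instead.
func (s *mockScheduler) onHeartbeat(hb heartbeat) error {
	if s.seen && hb.Peers == s.last.Peers && hb.ConfVer > s.last.ConfVer+1 {
		return fmt.Errorf("ConfVer jumped from %d to %d with the same peers",
			s.last.ConfVer, hb.ConfVer)
	}
	s.last, s.seen = hb, true
	return nil
}

// replay feeds a sequence of heartbeats to a fresh scheduler and returns
// the first error, if any.
func replay(hbs []heartbeat) error {
	s := &mockScheduler{}
	for _, hb := range hbs {
		if err := s.onHeartbeat(hb); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Only the tick-driven heartbeats are sent: one before the four conf
	// changes (ConfVer 15) and one after them (ConfVer 19).
	tickOnly := []heartbeat{{15, 3}, {19, 3}}
	fmt.Println("tick only:", replay(tickOnly))

	// A heartbeat is seen at every ConfVer: add 8, add 9, remove 8, remove 9.
	everyChange := []heartbeat{{15, 3}, {16, 4}, {17, 5}, {18, 4}, {19, 3}}
	fmt.Println("every change:", replay(everyChange))
}
```

With only the two tick-driven heartbeats the sketch reports a jump, while the sequence that reports every intermediate ConfVer passes, which matches the timing dependence described above.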