Skip to content

Commit d50bc33

Browse files
alexgghbkchr
andcommitted
Fix kusama validators getting 0 backing rewards the first session they enter the active set (#3722)
There is a problem in the way we update `authorithy-discovery` next keys and because of that nodes that enter the active set would be noticed at the start of the session they become active, instead of the start of the previous session as it was intended. This is problematic because: 1. The node itself advertises its addresses on the DHT only when it notices it should become active on around ~10m loop, so in this case it would notice after it becomes active. 2. The other nodes won't be able to detect the new nodes addresses at the beginning of the session, so it won't added them to the reserved set. With 1 + 2, we end-up in a situation where the the new node won't be able to properly connect to its peers because it won't be in its peers reserved set. Now, the nodes accept by default`MIN_GOSSIP_PEERS: usize = 25` connections to nodes that are not in the reserved set, but given Kusama size(> 1000 nodes) you could easily have more than`25` new nodes entering the active set or simply the nodes don't have slots anymore because, they already have connections to peers not in the active set. In the end what the node would notice is 0 backing rewards because it wasn't directly connected to the peers in its backing group. ## Root-cause The flow is like this: 1. At BAD_SESSION - 1, in `rotate_session` new nodes are added to QueuedKeys https://github.com/paritytech/polkadot-sdk/blob/02e1a7f476d7d7c67153e975ab9a1bdc02ffea12/substrate/frame/session/src/lib.rs#L609 ``` <QueuedKeys<T>>::put(queued_amalgamated.clone()); <QueuedChanged<T>>::put(next_changed); ``` 2. AuthorityDiscovery::on_new_session is called with `changed` being the value of `<QueuedChanged<T>>:` at BAD_SESSION - **2** because it was saved before being updated https://github.com/paritytech/polkadot-sdk/blob/02e1a7f476d7d7c67153e975ab9a1bdc02ffea12/substrate/frame/session/src/lib.rs#L613 3. At BAD_SESSION - 1, `AuthorityDiscovery::on_new_session` doesn't updated its next_keys because `changed` was false. 4. For the entire durations of `BAD_SESSION - 1` everyone calling runtime api `authorities`(should return past, present and future authorities) won't discover the nodes that should become active . 5. At the beginning of BAD_SESSION, all nodes discover the new nodes are authorities, but it is already too late because reserved_nodes are updated only at the beginning of the session by the `gossip-support`. See above why this bad. ## Fix Update next keys with the queued_validators at every session, not matter the value of `changed` this is the same way babe pallet correctly does it. https://github.com/paritytech/polkadot-sdk/blob/02e1a7f476d7d7c67153e975ab9a1bdc02ffea12/substrate/frame/babe/src/lib.rs#L655 ## Notes - The issue doesn't reproduce with proof-authorities changes like `versi` because `changed` would always be true and `AuthorityDiscovery` correctly updates its next_keys every time. - Confirmed at session `37651` on kusama that this is exactly what it happens by looking at blocks with polkadot.js. ## TODO - [ ] Move versi on proof of stake and properly test before and after fix to confirm there is no other issue. --------- Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io> Co-authored-by: Bastian Köcher <git@kchr.de> (cherry picked from commit 8d0cd4f) Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
1 parent 3c3d6fc commit d50bc33

3 files changed

Lines changed: 62 additions & 23 deletions

File tree

prdoc/pr_3722.prdoc

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
# Schema: Polkadot SDK PRDoc Schema (prdoc) v1.0.0
2+
# See doc at https://raw.githubusercontent.com/paritytech/polkadot-sdk/master/prdoc/schema_user.json
3+
4+
title: Fix kusama 0 backing rewards when entering active set
5+
6+
doc:
7+
- audience: Runtime Dev
8+
description: |
9+
This PR fixes getting 0 backing rewards the first session when
10+
a node enters the active set.
11+
12+
crates:
13+
- name: pallet-authority-discovery

substrate/frame/authority-discovery/src/lib.rs

Lines changed: 44 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -144,19 +144,21 @@ impl<T: Config> OneSessionHandler<T::AccountId> for Pallet<T> {
144144
);
145145

146146
Keys::<T>::put(bounded_keys);
147+
}
147148

148-
let next_keys = queued_validators.map(|x| x.1).collect::<Vec<_>>();
149+
// `changed` represents if queued_validators changed in the previous session not in the
150+
// current one.
151+
let next_keys = queued_validators.map(|x| x.1).collect::<Vec<_>>();
149152

150-
let next_bounded_keys = WeakBoundedVec::<_, T::MaxAuthorities>::force_from(
151-
next_keys,
152-
Some(
153-
"Warning: The session has more queued validators than expected. \
154-
A runtime configuration adjustment may be needed.",
155-
),
156-
);
153+
let next_bounded_keys = WeakBoundedVec::<_, T::MaxAuthorities>::force_from(
154+
next_keys,
155+
Some(
156+
"Warning: The session has more queued validators than expected. \
157+
A runtime configuration adjustment may be needed.",
158+
),
159+
);
157160

158-
NextKeys::<T>::put(next_bounded_keys);
159-
}
161+
NextKeys::<T>::put(next_bounded_keys);
160162
}
161163

162164
fn on_disabled(_i: u32) {
@@ -270,7 +272,7 @@ mod tests {
270272
.map(|id| (&account_id, id))
271273
.collect::<Vec<(&AuthorityId, AuthorityId)>>();
272274

273-
let mut third_authorities: Vec<AuthorityId> = vec![4, 5]
275+
let third_authorities: Vec<AuthorityId> = vec![4, 5]
274276
.into_iter()
275277
.map(|i| AuthorityPair::from_seed_slice(vec![i; 32].as_ref()).unwrap().public())
276278
.map(AuthorityId::from)
@@ -282,6 +284,18 @@ mod tests {
282284
.map(|id| (&account_id, id))
283285
.collect::<Vec<(&AuthorityId, AuthorityId)>>();
284286

287+
let mut fourth_authorities: Vec<AuthorityId> = vec![6, 7]
288+
.into_iter()
289+
.map(|i| AuthorityPair::from_seed_slice(vec![i; 32].as_ref()).unwrap().public())
290+
.map(AuthorityId::from)
291+
.collect();
292+
// Needed for `pallet_session::OneSessionHandler::on_new_session`.
293+
let fourth_authorities_and_account_ids = fourth_authorities
294+
.clone()
295+
.into_iter()
296+
.map(|id| (&account_id, id))
297+
.collect::<Vec<(&AuthorityId, AuthorityId)>>();
298+
285299
// Build genesis.
286300
let mut t = frame_system::GenesisConfig::<Test>::default().build_storage().unwrap();
287301

@@ -310,25 +324,33 @@ mod tests {
310324
third_authorities_and_account_ids.clone().into_iter(),
311325
);
312326
let authorities_returned = AuthorityDiscovery::authorities();
327+
let mut first_and_third_authorities = first_authorities
328+
.iter()
329+
.chain(third_authorities.iter())
330+
.cloned()
331+
.collect::<Vec<AuthorityId>>();
332+
first_and_third_authorities.sort();
333+
313334
assert_eq!(
314-
first_authorities, authorities_returned,
335+
first_and_third_authorities, authorities_returned,
315336
"Expected authority set not to change as `changed` was set to false.",
316337
);
317338

318339
// When `changed` set to true, the authority set should be updated.
319340
AuthorityDiscovery::on_new_session(
320341
true,
321-
second_authorities_and_account_ids.into_iter(),
322-
third_authorities_and_account_ids.clone().into_iter(),
342+
third_authorities_and_account_ids.into_iter(),
343+
fourth_authorities_and_account_ids.clone().into_iter(),
323344
);
324-
let mut second_and_third_authorities = second_authorities
345+
346+
let mut third_and_fourth_authorities = third_authorities
325347
.iter()
326-
.chain(third_authorities.iter())
348+
.chain(fourth_authorities.iter())
327349
.cloned()
328350
.collect::<Vec<AuthorityId>>();
329-
second_and_third_authorities.sort();
351+
third_and_fourth_authorities.sort();
330352
assert_eq!(
331-
second_and_third_authorities,
353+
third_and_fourth_authorities,
332354
AuthorityDiscovery::authorities(),
333355
"Expected authority set to contain both the authorities of the new as well as the \
334356
next session."
@@ -337,12 +359,12 @@ mod tests {
337359
// With overlapping authority sets, `authorities()` should return a deduplicated set.
338360
AuthorityDiscovery::on_new_session(
339361
true,
340-
third_authorities_and_account_ids.clone().into_iter(),
341-
third_authorities_and_account_ids.clone().into_iter(),
362+
fourth_authorities_and_account_ids.clone().into_iter(),
363+
fourth_authorities_and_account_ids.clone().into_iter(),
342364
);
343-
third_authorities.sort();
365+
fourth_authorities.sort();
344366
assert_eq!(
345-
third_authorities,
367+
fourth_authorities,
346368
AuthorityDiscovery::authorities(),
347369
"Expected authority set to be deduplicated."
348370
);

substrate/frame/session/src/lib.rs

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -285,7 +285,11 @@ pub trait SessionHandler<ValidatorId> {
285285
/// before initialization of your pallet.
286286
///
287287
/// `changed` is true whenever any of the session keys or underlying economic
288-
/// identities or weightings behind those keys has changed.
288+
/// identities or weightings behind `validators` keys has changed. `queued_validators`
289+
/// could change without `validators` changing. Example of possible sequent calls:
290+
/// Session N: on_new_session(false, unchanged_validators, unchanged_queued_validators)
291+
/// Session N + 1: on_new_session(false, unchanged_validators, new_queued_validators)
292+
/// Session N + 2: on_new_session(true, new_queued_validators, new_queued_validators)
289293
fn on_new_session<Ks: OpaqueKeys>(
290294
changed: bool,
291295
validators: &[(ValidatorId, Ks)],

0 commit comments

Comments
 (0)