Forward CoreInfo via a digest to the runtime #9002
Conversation
…ard-core-to-runtime
/cmd fmt
Overall changes look good.

I think it is worth mentioning that these changes have some impact on which component is limiting our block-building throughput. With the static `CoreSelector` we had before, we were always authoring at a fixed claim queue offset, so after our cores for that offset were used up, we would skip authoring. After this change, we are still limited by the `slot_timer`, which is updated based on the claim queue at offset 0. However, the main responsibility lies with the `Velocity` in the runtime; it needs to be configured correctly in order to prevent excessive block production.
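For context, a minimal sketch of where that velocity limit usually lives in a cumulus-based runtime. The constant values and the shape of the `Config` impl below follow the parachain template and are assumptions for illustration, not taken from this PR's diff:

```rust
// Block production limits enforced by the runtime's consensus hook:
// at most BLOCK_PROCESSING_VELOCITY parachain blocks per relay chain parent,
// with an unincluded segment of at most UNINCLUDED_SEGMENT_CAPACITY blocks.
const BLOCK_PROCESSING_VELOCITY: u32 = 1;
const UNINCLUDED_SEGMENT_CAPACITY: u32 = 3;
const RELAY_CHAIN_SLOT_DURATION_MILLIS: u32 = 6000;

impl cumulus_pallet_parachain_system::Config for Runtime {
    // ... other associated types of the pallet ...
    type ConsensusHook = cumulus_pallet_aura_ext::FixedVelocityConsensusHook<
        Runtime,
        RELAY_CHAIN_SLOT_DURATION_MILLIS,
        BLOCK_PROCESSING_VELOCITY,
        UNINCLUDED_SEGMENT_CAPACITY,
    >;
}
```

If the velocity is set higher than what is actually scheduled on the relay chain, the runtime will accept more blocks than can be backed, which is the "excessive production" concern above.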
Also, I quickly discussed with @bkchr the possibility of abusing the dynamic claim queue offset in a scenario where an elastic-scaling chain is configured for, say, 3 cores but only has 1 scheduled. In that scenario, the velocity and runtime constraints are too generous and allow stealing blocks from future authors.
One thing that is not yet clear to me (or I forgot) is the exact backing time point:
- If I build a block at claim queue offset 2, will this block be backed immediately, or only when this claim arrives at position 0 two relay chain blocks later?
- If I build a block at claim queue offset 2, can the cores at offsets 0 and 1 still be used, or are they "blocked" by the usage of offset 2? Intuitively, I would expect that we need to use the cores in order.

@alindima do you know the details of these points?
/// Determine the core for the given `para_id`.
///
/// Takes into account the `parent` core to find the next available core.
async fn determine_core<Header: HeaderT, RI: RelayChainInterface + 'static>(
Can we have a test for this one?
    relay_parent: &RelayHeader,
    para_id: ParaId,
    parent: &Header,
) -> Result<Option<(CoreSelector, ClaimQueueOffset, CoreIndex, u16)>, ()> {
Would be nice to add a little doc explaining what the `u16` is here.
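For example, something along these lines, assuming (per the PR description) that the trailing `u16` is the `number_of_cores`; the exact wording is a suggestion, not taken from the PR:

```rust
/// Determine the core for the given `para_id`.
///
/// Takes into account the `parent` core to find the next available core.
///
/// Returns `Ok(Some((selector, claim_queue_offset, core_index, number_of_cores)))`
/// on success, where the trailing `u16` is the number of cores assigned to the
/// parachain.
```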
});

for (offset, cores) in offset_to_core_count {
    if (offset as u32) < claim_queue_offset {
Why bother adding items with offset < claim_queue_offset to the map in the first place?
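A hypothetical illustration of that point, filtering while building the map instead of skipping entries later; all names and the surrounding shape here are made up for the sketch, not taken from the PR:

```rust
use std::collections::BTreeMap;

// Hypothetical helper: count cores per claim queue offset, dropping offsets below
// `claim_queue_offset` up front so the consuming loop never has to skip them.
fn cores_per_offset(
    assignments: &[(u8, u16)], // (claim queue offset, cores at that offset)
    claim_queue_offset: u32,
) -> BTreeMap<u8, u16> {
    assignments
        .iter()
        .filter(|(offset, _)| (*offset as u32) >= claim_queue_offset)
        .fold(BTreeMap::new(), |mut map, (offset, cores)| {
            *map.entry(*offset).or_default() += *cores;
            map
        })
}
```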
let res = if relay_parent_offset >
    core_info.as_ref().map(|ci| ci.claim_queue_offset).unwrap_or_default().0 as u32
{
    claim_queue.find_core(para_id, 0, 0)
} else {
    claim_queue.find_core(
        para_id,
        core_info.as_ref().map_or(0, |ci| ci.selector.0 as u32 + 1),
        core_info
            .as_ref()
            .map_or(0, |ci| ci.claim_queue_offset.0 as u32 - relay_parent_offset),
    )
};

Ok(res)
I found this part a bit hard to digest, what do you think about this?
Suggested change (replacing the block quoted above):

let (cores_claimed, queue_offset) = match core_info {
    Some(CoreInfo { selector, claim_queue_offset, .. })
        if relay_parent_offset <= claim_queue_offset.0 as u32 =>
        (selector.0 as u32 + 1, claim_queue_offset.0 as u32 - relay_parent_offset),
    _ => (0, 0),
};

Ok(claim_queue.find_core(para_id, cores_claimed, queue_offset))
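To convince myself that the two formulations pick the same `find_core` arguments, here is a small self-contained check. The stub types stand in for `CoreSelector`, `ClaimQueueOffset` and `CoreInfo` and are made up for the sketch, not the real types:

```rust
// Stub types standing in for the real selector, offset and core-info types.
#[derive(Clone, Copy, Default)]
struct Selector(u8);
#[derive(Clone, Copy, Default)]
struct Offset(u8);
struct CoreInfo {
    selector: Selector,
    claim_queue_offset: Offset,
}

// Arguments passed to `find_core` in the original `if/else` formulation.
fn args_original(core_info: &Option<CoreInfo>, relay_parent_offset: u32) -> (u32, u32) {
    if relay_parent_offset
        > core_info.as_ref().map(|ci| ci.claim_queue_offset).unwrap_or_default().0 as u32
    {
        (0, 0)
    } else {
        (
            core_info.as_ref().map_or(0, |ci| ci.selector.0 as u32 + 1),
            core_info
                .as_ref()
                .map_or(0, |ci| ci.claim_queue_offset.0 as u32 - relay_parent_offset),
        )
    }
}

// Arguments passed to `find_core` in the suggested `match` formulation.
fn args_suggested(core_info: &Option<CoreInfo>, relay_parent_offset: u32) -> (u32, u32) {
    match core_info {
        Some(CoreInfo { selector, claim_queue_offset })
            if relay_parent_offset <= claim_queue_offset.0 as u32 =>
            (selector.0 as u32 + 1, claim_queue_offset.0 as u32 - relay_parent_offset),
        _ => (0, 0),
    }
}

#[test]
fn both_formulations_agree() {
    for relay_parent_offset in 0..4 {
        for info in [
            None,
            Some(CoreInfo { selector: Selector(1), claim_queue_offset: Offset(2) }),
            Some(CoreInfo { selector: Selector(0), claim_queue_offset: Offset(0) }),
        ] {
            assert_eq!(
                args_original(&info, relay_parent_offset),
                args_suggested(&info, relay_parent_offset),
            );
        }
    }
}
```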
?claimed_cores,
"Claimed cores.",
slot_timer.update_scheduling(
    claim_queue
Why not use `number_of_cores`?
One more thing to think about is backward compatibility. These changes here are breaking, since older runtimes which use the …
They can't yet use it since the v2 receipts feature is not yet enabled (but it will be soon). And even after it's enabled, they could only use it if they enabled the …, but it would indeed be worth thinking about what the worst-case scenario is if they did.
If you also have a claim at offset 0 on the same core, it will be backed immediately.
You can only occupy the core if you have the full candidate chain up until the latest included candidate of the para, and you can only occupy the cores at offset 0. Therefore, you can't occupy a core at offset 0 if it's not building on the latest included block (or if you have the full chain being backed right now at offset 0). So your intuition is right.
…ard-core-to-runtime
Title changed from "CoreInfo via an inherent to the runtime" to "CoreInfo via a digest to the runtime".
/cmd prdoc --audience runtime_dev --bump major
…time_dev --bump major'
…ard-core-to-runtime' into bkchr-collator-forward-core-to-runtime
All GitHub workflows were cancelled due to the failure of one of the required jobs.
…ard-core-to-runtime
Before this pull request we had this rather inflexible `SelectCore` type in `parachain-system`. It was just taking the last byte of the block number as the core selector. This resulted in issues like #8893. While it was not totally static, it was very complicated to forward the needed information to the runtime. In the case of running with block bundling (500ms blocks), multiple blocks are actually validated on the same core. Finding out the selector and offset without having access to the claim queue is rather hard. The claim queue could be forwarded to the runtime, but it would waste PoV size as we would need to include the entire claim queue of all parachains.

This pull request solves the problem by moving the entire core selection to the collator side. From there the information is passed via a `PreRuntime` digest to the runtime. The `CoreInfo` contains the `selector`, `claim_queue_offset` and `number_of_cores`. Doing this on the collator side is fine as long as we don't have slot durations that are lower than the relay chain slot duration. As we have agreed to always have equal or bigger slot durations on parachains, there should be no problem with this change.

Downstream users need to remove the `SelectCore` type from the `parachain_system::Config` (see the sketch after this description):

Closes: #8893 #8906
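A minimal sketch of that downstream change, assuming a typical runtime's `Config` impl; the surrounding types are placeholders and the exact line to delete may differ from this PR's diff:

```rust
impl cumulus_pallet_parachain_system::Config for Runtime {
    type RuntimeEvent = RuntimeEvent;
    // ... other associated types stay as they are ...

    // Remove this associated type: core selection now happens on the collator side
    // and is forwarded to the runtime via a `PreRuntime` digest.
    // type SelectCore = cumulus_pallet_parachain_system::DefaultCoreSelector<Runtime>;
}
```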