scope scheduling RFC #1
cc @cuviper |
I shared an alternate branch in rayon-rs/rayon#601 (comment): This has all three variants implemented, but the public API is hard-coded to the per-thread FIFO at the moment. I did that so we could get stylo performance numbers, and it sounds like it meets their needs! |
My thinking is we don't need global FIFO ordering because it's prohibitively slow due to contention. It doesn't offer much over per-thread FIFO and sort of feels antithetical to how Rayon works, which is to split tasks into distinct chunks so that they can be processed with as little data sharing as possible.

Per-thread LIFO, while a totally sensible strategy, can already be simulated using join. For example, this:

    let v = ...;
    rayon::scope(|scope| {
        for i in &v {
            scope.spawn(move |_| println!("{}", i));
        }
    });

Can be rewritten as:

    let v = ...;
    fn go(mut it: impl Iterator<Item = ...>) {
        if let Some(i) = it.next() {
            rayon::join(move || go(it), move || println!("{}", i));
        }
    }
    rayon::join(|| {}, || go(v.iter()));

Or, even better... :)

    let v = ...;
    v.par_iter().for_each(|i| println!("{}", i));

So if we supported per-thread LIFO ordered scope queues, it'd just be syntax sugar over join.

To me, the only reasonable ordering strategy is per-thread FIFO. Others are not wrong per se, but they seem dubious. I'd be okay with merging the per-thread FIFO strategy implemented by @cuviper and not promising too much in the documentation, i.e. the strategy is best-effort. |
I just pushed a big update -- I think the RFC is now "first draft complete". In writing it, I came up with a few open questions:
- Is the "global FIFO" mode worthwhile? I feel pretty good about both of the "per-thread" modes. The global FIFO mode, however, I am not so sure about. In general, it's easy to emulate global modes in "user space" (the TSP code, for example, does this, in order to use a priority queue). Seems like an easy thing to cut.
- Should we adopt a builder? Particularly if we cut the "global FIFO" mode, then this RFC could be as simple as adding
- Should we reflect the "kind" of the scope in its type? To that end, we could add a |
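To make the "emulate it in user space" remark concrete, here is a minimal std-only sketch of a global FIFO queue shared by several worker threads. It does not use Rayon and is not the TSP code mentioned above; `drain_global_fifo` and its parameters are hypothetical names made up for this illustration. Note that every worker has to take the same lock for every task, which is the contention concern raised earlier in the thread.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: `workers` threads take tasks from one shared queue,
/// always removing the oldest task first. Returns the task ids in the order
/// their completion was recorded.
fn drain_global_fifo(tasks: Vec<u32>, workers: usize) -> Vec<u32> {
    let queue = Arc::new(Mutex::new(VecDeque::from(tasks)));
    let done = Arc::new(Mutex::new(Vec::new()));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let done = Arc::clone(&done);
            thread::spawn(move || loop {
                // Hold the lock only long enough to take the oldest task.
                let next = queue.lock().unwrap().pop_front();
                match next {
                    // "Running" a task here just means recording its id.
                    Some(task) => done.lock().unwrap().push(task),
                    None => break,
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let finished = done.lock().unwrap().clone();
    finished
}
```

With a single worker the recorded order equals the spawn order. With several workers, tasks are still taken in spawn order, but the order in which they finish is not guaranteed.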
To answer @stjepang:
I think I agree. It seems very easy to emulate this in user space.
Yes, but it requires non-constant stack space this way. I think I'd prefer to keep per-thread LIFO for the reasons I enumerated in the RFC:
|
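The stack-space point can be seen with a sequential stand-in for the recursive `go` helper from the earlier comment. This is only a sketch: `join(a, b)` is modelled as "run `a`, then run `b`", which is what a single worker does when nothing gets stolen, and the `depth` and `out` parameters are additions made for this illustration.

```rust
/// Sequential stand-in for the recursive `go` helper (no Rayon involved).
/// Returns the deepest recursion level reached; `out` records the order in
/// which the items are handled.
fn go(mut it: impl Iterator<Item = u32>, depth: usize, out: &mut Vec<u32>) -> usize {
    match it.next() {
        Some(i) => {
            // First closure of the join: recurse on the remaining items.
            let deepest = go(it, depth + 1, out);
            // Second closure of the join: handle the current item.
            out.push(i);
            deepest
        }
        None => depth,
    }
}
```

For three items the recursion goes three levels deep and the items come out in reverse (LIFO) order, so the stack usage grows with the number of spawned tasks.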
cc @bholley -- just thought you'd like to be aware of this RFC |
To make it clearer what I am contemplating: I am mildly leaning towards a "pared down" RFC at this point, which basically just adds two things to rayon-core (and mirrored in rayon):
I see I did not make this explicit in my previous comments. |
accepted/rfc0000-scope-scheduling.md
Outdated
then we should execute:
- first, the tasks from the join (in reverse order, so B and then A)
`join` does not reverse order -- B will be pushed, A will execute, then B will pop and execute (if it wasn't already stolen).
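The sequence described in this comment can be traced with a toy single-worker model. This is a sketch only, not Rayon's implementation: `join_trace` is a hypothetical helper, the deque stands in for the worker's local queue, and the `b_stolen` flag stands in for a thief on another thread.

```rust
use std::collections::VecDeque;

/// Toy trace of `join(A, B)` on one worker. Returns the closures this worker
/// runs, in order.
fn join_trace(b_stolen: bool) -> Vec<&'static str> {
    let mut deque: VecDeque<&'static str> = VecDeque::new();
    let mut ran = Vec::new();

    // B is pushed first so that another thread has a chance to steal it.
    deque.push_back("B");
    // A then executes right away on the current thread.
    ran.push("A");
    // Assumption for this model: a thief takes from the opposite end.
    if b_stolen {
        deque.pop_front();
    }
    // Finally B is popped and executed, unless it was already stolen.
    if let Some(b) = deque.pop_back() {
        ran.push(b);
    }
    ran
}
```

So, in this model, the worker runs A before B, and runs only A if B was taken by a thief (in which case a real `join` would still wait for B to finish elsewhere).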
oh, I misremembered. =) Will fix.
Still not reversed...
accepted/rfc0000-scope-scheduling.md
Outdated
contain a generic). Given the performance measurements, it is unlikely
that the more complex types are worthwhile. This change could also be
made at some later time, conceivably, though it would require more
additions to the `ScopeBuilder` API.
How could it be changed later? If we expose per-thread FIFO using the current `Scope` type now, it seems to me that we'll be locked into that.
I think I meant "if we use a distinct type, we can still use a branch in the implementation" -- I'll tweak the wording
I think we should also be clear about what happens to the
|
Very interesting points. It's appealing to say that we make a "best effort" to preserve the flag, though I'm not sure whether there are any other clients beyond Stylo. Even though we went to some lengths to caveat that flag, altering the behavior could definitely affect people in practice (theoretically, I am skeptical there are many users). Still, it might be a nice design to offer some more explicit choices:
(In particular I have no idea if, in the future, it might be the case that some other default choice "seems better".) For now, I would expect |
I'd say that preserving the breadth-first flag is not sufficient justification for the complexity above, but one could argue it's a genuinely nicer design. |
From our discussion today, we're leaning toward leaving In terms of this RFC, this means we'd implement:
And |
Another part of rayon-rs/rayon#601 was making unscoped spawn always use the pool's global injector queue, so it would always be globally FIFO. Should we address that in this RFC too? Perhaps:
|
Interesting. I do prefer that global {fifo_,}spawn -- when invoked from a worker thread -- performs "as if" there were an (implicit) global scope surrounding the worker thread, which I think fits what you said @cuviper, correct? If so, I'll add that too. |
Yeah, thinking of If we call the new global function |
@cuviper updated |
Perhaps it'd be more idiomatic to name the function Do you think we could omit the function I'd be happy with introducing just the following minimal API:

    pub fn scope_fifo<'scope, OP, R>(op: OP) -> R
    where
        OP: for<'s> FnOnce(&'s ScopeFifo<'scope>) -> R + 'scope + Send,
        R: Send;

    pub struct ScopeFifo<'scope>;

    impl<'scope> ScopeFifo<'scope> {
        pub fn spawn<BODY>(&self, body: BODY)
        where
            BODY: FnOnce(&ScopeFifo<'scope>) + Send + 'scope;
    }
|
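To get a feel for how an API of this shape behaves, here is a single-threaded toy with the same outline. It is only a sketch under loud assumptions: `ToyScopeFifo` and `toy_scope_fifo` are hypothetical names, nothing runs in parallel, the `Send` bounds are dropped, and the jobs are simply drained in spawn order once the scope closure returns.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;

/// A queued job; it receives the scope so it can spawn further jobs.
type Job<'scope> = Box<dyn FnOnce(&ToyScopeFifo<'scope>) + 'scope>;

/// Toy, single-threaded stand-in for the proposed `ScopeFifo`.
struct ToyScopeFifo<'scope> {
    queue: RefCell<VecDeque<Job<'scope>>>,
}

impl<'scope> ToyScopeFifo<'scope> {
    /// Queue `body` behind everything spawned so far.
    fn spawn<BODY>(&self, body: BODY)
    where
        BODY: FnOnce(&ToyScopeFifo<'scope>) + 'scope,
    {
        self.queue.borrow_mut().push_back(Box::new(body));
    }
}

/// Run `op`, then run every spawned job in FIFO order before returning.
fn toy_scope_fifo<'scope, OP, R>(op: OP) -> R
where
    OP: FnOnce(&ToyScopeFifo<'scope>) -> R,
{
    let scope = ToyScopeFifo { queue: RefCell::new(VecDeque::new()) };
    let result = op(&scope);
    loop {
        // Release the RefCell borrow before running the job, because the
        // job may call `spawn` and needs to borrow the queue again.
        let job = scope.queue.borrow_mut().pop_front();
        match job {
            Some(job) => job(&scope),
            None => break,
        }
    }
    result
}
```

In this toy, a job spawned from inside another job goes to the back of the queue, so it runs after all jobs that were spawned earlier.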
👍 for I don't have a strong opinion about whether |
Not sure if I'm missing something here, but |
OK =) I was misremembering (I think initially they were not, but I have a vague memory that we changed it). In that case, I don't think that any new challenge in particular is introduced by That is, you can do something like this:
Presumably this will execute in an interleaved fashion. |
So what's the right metaphor for global Is it this (a):

    scope_fifo(|scope1| {
        scope(|scope2| {
            // `spawn()` is `scope2.spawn()`
            // `spawn_fifo()` is `scope1.spawn()`
            main();
        });
    });

Or this (b):

    scope(|scope1| {
        scope_fifo(|scope2| {
            // `spawn()` is `scope1.spawn()`
            // `spawn_fifo()` is `scope2.spawn()`
            main();
        });
    });

Or something else (c)?
Are you saying that (a) and (b) behave identically? That might be true in the proposed implementation, but I think it'd make sense for inner scopes to prioritize their own tasks over others on a best-effort basis (but no guarantees!). |
Interesting example! While I think we would generally want to say that scopes are prioritized inside-out, we don't currently have a way to delay those inner In the absence of any stealing, your example will:
But for a slightly weirder case, suppose we add an earlier spawn:
We essentially have to run as many
In this case, we should think of them as independent scopes, not like nesting, although the interleaving effect is probably about the same. A |
No objections, just some typos and clarifications.
@cuviper thanks! Fixed. |
Thank you for your replies! I'm happy with the current state of this RFC. 👍 |
What's our RFC endgame? Should I go ahead and implement a new PR for this? |
@cuviper yes, I think so -- I guess I will merge |
In discussion on Gitter we decided to rename from
|
We decided to merge =) |
615: Implement RFC #1: FIFO spawns r=nikomatsakis a=cuviper
This implements rayon-rs/rfcs#1, adding `spawn_fifo`, `scope_fifo`, and `ScopeFifo`, and deprecating the `breadth_first` flag.
Fixes #590. Closes #601.
Co-authored-by: Josh Stone
Offer more variations on the `scope` construct that give more control over the relative scheduling of spawned tasks. The choices would be:
- Unspecified: Rayon picks based on which can be most efficiently implemented. For now, this is "per-thread LIFO", but it could change in the future.
- Per-thread LIFO: the task that the current thread spawned most recently will be the first to be executed. Thieves will steal the task that was spawned first (and thus would be the last one that the current thread would execute). Tasks spawned by stolen tasks will be processed first by the thief, again in a LIFO order, but may be stolen by other threads in turn. This is the current default and can be implemented with the highest "micro efficiency".
- Per-thread FIFO: the task that the current thread spawned first is executed first. Thieves will also steal tasks in the same order. Tasks spawned by stolen tasks will be processed first by the thief. This is "roughly" the behavior today for thread-pools that enabled the `breadth_first` scheduling option.
- Global FIFO: tasks are executed in the order in which they were spawned, regardless of whether they were stolen or not.
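The two per-thread orderings can be compared with a toy model of a single worker's local queue. This is a sketch, not Rayon's data structure: `Order`, `LocalQueue` and `owner_order` are hypothetical names, tasks are plain integer ids, and the thief is just a method call on the same thread.

```rust
use std::collections::VecDeque;

/// Hypothetical names for the two per-thread orderings.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Order {
    PerThreadLifo,
    PerThreadFifo,
}

/// Toy model of one worker's local queue of task ids.
struct LocalQueue {
    order: Order,
    tasks: VecDeque<u32>,
}

impl LocalQueue {
    fn new(order: Order) -> Self {
        LocalQueue { order, tasks: VecDeque::new() }
    }

    /// The owning thread spawns a task.
    fn spawn(&mut self, id: u32) {
        self.tasks.push_back(id);
    }

    /// The owning thread takes its next task: newest first for LIFO,
    /// oldest first for FIFO.
    fn pop(&mut self) -> Option<u32> {
        match self.order {
            Order::PerThreadLifo => self.tasks.pop_back(),
            Order::PerThreadFifo => self.tasks.pop_front(),
        }
    }

    /// In both modes a thief takes the task that was spawned first.
    fn steal(&mut self) -> Option<u32> {
        self.tasks.pop_front()
    }
}

/// Spawn tasks `0..n`, then let the owner run them all without any stealing.
fn owner_order(order: Order, n: u32) -> Vec<u32> {
    let mut queue = LocalQueue::new(order);
    for id in 0..n {
        queue.spawn(id);
    }
    let mut ran = Vec::new();
    while let Some(id) = queue.pop() {
        ran.push(id);
    }
    ran
}
```

In this model the owner runs tasks 0, 1, 2 as 2, 1, 0 under per-thread LIFO and as 0, 1, 2 under per-thread FIFO, while a thief takes task 0 in either mode.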