Skip to content

scope scheduling RFC #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Dec 12, 2018
Merged

Conversation

nikomatsakis
Copy link
Member

@nikomatsakis nikomatsakis commented Nov 1, 2018

Offer more variations on the scope construct that give more control
over the relative scheduling of spawned tasks. The choices would be:

  • Default: it is not important which tasks get scheduled when. Let
    Rayon pick based on which can be most efficiently implemented. For
    now, this is "per-thread LIFO", but it could change in the future.
  • Per-thread LIFO: The task that the current thread spawned most
    recently will be the first to be executed. Thieves will steal the
    thread that was spawned first (and thus would be the last
    one that the current thread would execute). Tasks spawned by stolen
    tasks will be processed first by the thief, again in a LIFO order,
    but maybe stolen by other threads in turn. This is the current
    default and can be implemented with the highest "micro efficiency".
  • Per-thread FIFO: The task that the current thread spawned first
    is executed first. Thieves will also steal tasks in the same
    order. Tasks spawned by stolen tasks will be processed first by the
    thief. This is "roughly" the behavior today for thread-pools that
    enabled the breadth_first scheduling option.
  • Global FIFO: Tasks always execute in the order in which they
    were spawned, regardless of whether they were stolen or not.

@nikomatsakis
Copy link
Member Author

cc @cuviper

@cuviper
Copy link
Member

cuviper commented Nov 1, 2018

I shared an alternate branch in rayon-rs/rayon#601 (comment):
https://github.com/cuviper/rayon/tree/thread-fifo-spawn

This has all three variants implemented, but the public API is hard-coded to the per-thread FIFO at the moment. I did that so we could get stylo performance numbers, and it sound like it meets their needs!

@ghost
Copy link

ghost commented Nov 8, 2018

My thinking is we don't need global FIFO ordering because it's prohibitively slow due to contention. It doesn't offer much over per-thread FIFO and sort of feels antithetical to how Rayon works and splits tasks into distinct chunks so that they can be processed with as little data sharing as possible.

Per-thread LIFO, while a totally sensible strategy, can already be simulated using rayon::join. For example, the following LIFO-ordered processing:

let v = ...;
rayon::scope(|scope| {
    for i in &v {
        scope.spawn(move |_| println!("{}", i));
    }
});

Can be rewritten as:

let v = ...;
fn go(mut it: impl Iterator<Item = ...>) {
    if let Some(i) = it.next() {
        rayon::join(move || go(it), move || println!("{}", i));
    }
}
rayon::join(||, || go(v.iter()));

Or, even better... :)

let v = ...;
v.par_iter().for_each(|i| println!("{}", i));

So if we supported per-thread LIFO ordered scope queues, it'd just be syntax sugar over rayon::join with recursion. It seems convenient, but I'm not convinced it's worth supporting at all.

To me, the only reasonable ordering strategy is per-thread FIFO. Others are not wrong per se, but they seem dubious.

I'd be okay with merging the per-thread FIFO strategy implemented by @cuviper and not promising too much in the documentation, i.e. the strategy is best-effort. rayon::join is a best-effort LIFO strategy in the same sense - we don't guarantee anything beyond that.

@nikomatsakis nikomatsakis changed the title [WIP] scope scheduling RFC scope scheduling RFC Nov 14, 2018
@nikomatsakis
Copy link
Member Author

I just pushed a big update — I think the RFC is now "first draft complete".

In writing it, I came up with a few open questions:

Is the "global FIFO" mode worthwhile? I feel pretty good about both of the "per-thread" modes. The global FIFO mode however I am not so sure about. In general, it's easy to emulate global modes in "user space" (the TSP code, for example, does this, in order to use a priority queue). Seems like an easy thing to cut.

Should we adopt a builder? Particularly if we cut the "global FIFO" mode, then this RFC could be as simple as adding rayon::fifo_scope -- the builder feels kind of like overkill.

Should we reflect the "kind" of the scope in its type? To that end, we could add a rayon::FifoScope type, which gives us the freedom to use traits to statically resolve the "if fifo" check (even if, in our current implementation, a FifoScope simply wraps a Scope).

@nikomatsakis
Copy link
Member Author

To answer @stjepang:

My thinking is we don't need global FIFO ordering because it's prohibitively slow due to contention. It doesn't offer much over per-thread FIFO and sort of feels antithetical to how Rayon works and splits tasks into distinct chunks so that they can be processed with as little data sharing as possible.

I think I agree. It seems very easy to emulate this in user space.

Per-thread LIFO, while a totally sensible strategy, can already be simulated using rayon::join.

Yes, but it requires non-constant stack space this way. I think I'd prefer prefer to keep per-thread LIFO for the reasons I enumerated in the RFC:

  • It preserves existing behavior.
  • It is the "minimal overhead" variant, and it therefore enables global scheduling to run with minimal overhead (whether FIFO or priority queue).
  • I think for "don't care" cases, LIFO is a better default. I'd rather that you do the tasks that were most directly related to the things you were last doing, absent any other preference. If we only offer that through join, you have to pay in stack space -- this can be "as cheap as" O(log(N)), and that's fine, but sometimes you can't easily "split in half", in which case it's O(N).

@nikomatsakis
Copy link
Member Author

nikomatsakis commented Nov 14, 2018

cc @bholley -- just thought you'd like to be aware of this RFC

@nikomatsakis
Copy link
Member Author

To make it clearer what I am contemplating:

I am mildly leaning towards a "pared down" RFC at this point, which basically just adds two things to rayon-core (and mirrored in rayon):

  • rayon::fifo_scope, which acts like rayon::scope today but with per-thread FIFO scheduling
  • rayon::FifoScope, which corresponds to the existing rayon::Scope type

I see I did not make this explicit in my previous comments.


then we should execute:

- first, the tasks from the join (in reverse order, so B and then A)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

join does not reverse order -- B will be pushed, A will execute, then B will pop and execute (if it wasn't already stolen).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I misremembered. =) Will fix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not reversed...

contain a generic). Given the performance measurements, it is unlikely
that the more complex types are worthwhile. This change could also be
made at some later time, conceivably, though it would require more
additions to the `ScopeBuilder` API.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How could it be changed later? If we expose per-thread FIFO using the current Scope type now, it seems to me that we'll be locked into that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I meant "if we use a distinct type, we can still use a branch in the implementation" -- I'll tweak the wording

@cuviper
Copy link
Member

cuviper commented Nov 14, 2018

I think we should also be clear about what happens to the breadth_first flag. We want to deprecate it, but will we make any effort to directly preserve its behavior?

  • If we don't preserve behavior, we should be very clear in the deprecation notice that it's a no-op.
    • Users like Stylo will need to explicitly change their code with the new rayon-core -- regardless of whether rayon itself is updated.
  • If we do want to preserve behavior, then we need to talk about how.
    • I don't mean to preserve everything, where a local thread always steals from the front of its own deque. This behavior is harmful, as noted in the RFC.
    • We can potentially preserve just the scope-spawn order that Stylo wants. It wouldn't be hard with a unified Scope -- we would just forward scope to fifo_scope in that case. But if there's a separate FifoScope, I don't know how to make breadth_first+scope use that transparently.

@nikomatsakis
Copy link
Member Author

I think we should also be clear about what happens to the breadth_first flag. We want to deprecate it, but will we make any effort to directly preserve its behavior? ... We can potentially preserve just the scope-spawn order that Stylo wants. It wouldn't be hard with a unified Scope -- we would just forward scope to fifo_scope in that case.

Very interesting points. It's appealing to say that we make a "best effort" to preserve the flag, though I'm not sure whether there are any other clients beyond Stylo. Even though we went to some lengths to caveat that flag, altering the behavior could definitely affect people in practice (theoretically, I am skeptical there are many users).

Still, it might be a nice design to offer some more explicit choices:

  • scope -- "implementation defined" ordering (currently picks based on the (deprecated) flag), returns Scope, which dynamically selects
  • lifo_scope -- explicit, returns LifoScope, statically known
  • fifo_scope -- explicit, returns FifoScope, statically known

(In particular I have no idea if, in the future, it might be the case that some other default choice "seems better".)

For now, I would expect LifoScope and FifoScope to simply wrap an internal Scope -- they might offer a method like into_scope to permit "casting" to the more general type (for now, this would just return the hidden inner type, but in the future it could wrap in an enum).

@nikomatsakis
Copy link
Member Author

I'd say that preserving the breadth-first flag is not sufficient justification for the complexity above, but one could argue it's a genuinely nicer design.

@cuviper
Copy link
Member

cuviper commented Nov 28, 2018

From our discussion today, we're leaning toward leaving breadth_first entirely as-is, though deprecated, rather than trying to engineer compatible effects on just scope/spawn. This means it would still affect the choice to pop or steal from the local deque, with all the problems of rayon-rs/rayon#590. However, after some grace period we would plan to no-op breadth_first.

In terms of this RFC, this means we'd implement:

  • scope and Scope with thread-LIFO behavior, the status quo
    • deprecated breadth_first will make threads steal from their own deque, crudely FIFO
    • breadth_first will become a no-op later, so the deque is always LIFO
  • fifo_scope and FifoScope with the new thread-FIFO behavior
    • FIFO here is only with respect to the given scope, not the entire deque
    • breadth_first would have no semantic effect here, just making the indirection useless

And global_fifo can be left as a future extension, should we find a compelling use case.

@cuviper
Copy link
Member

cuviper commented Nov 28, 2018

Another part of rayon-rs/rayon#601 was making unscoped spawn always use the pool's global injector queue, so it would always be globally FIFO. Should we address that in this RFC too? Perhaps:

  • spawn left as the status quo, using inject_or_push to either hit the pool's injector (FIFO) or be added to the local deque (LIFO, unless affected by breadth_first).
  • fifo_spawn to use an indirection queue in the local WorkerThread, similar to what FifoScope will do. Calling from outside the pool still goes to the global injector.

@nikomatsakis
Copy link
Member Author

Interesting. I do prefer that global {fifo_,}spawn -- when invoked from a worker thread -- performs "as if" there were a (implicit) global scope surrounding the worker thread, which I think fits what you said @cuviper, correct? If so, I'll add that too.

@cuviper
Copy link
Member

cuviper commented Nov 29, 2018

Yeah, thinking of spawn in a global scope is a good way to associate it with this RFC. :)

If we call the new global function fifo_spawn, what about the method on FifoScope? Is it just spawn, or a somewhat-redundant fifo_spawn to match the global?

@nikomatsakis
Copy link
Member Author

@cuviper updated

@ghost
Copy link

ghost commented Dec 7, 2018

Perhaps it'd be more idiomatic to name the function scope_fifo rather than fifo_scope? 🚲 We've similarly chosen sort_unstable over unstable_sort because that way functions are more discoverable when sorted in rustdoc or when the IDE autocompletes .sort. The same applies to fifo_spawn -> spawn_fifo.

Do you think we could omit the function fifo_spawn? It's not like anyone needs it right now and we can easily add it later if there's a need. Plus, the semantics of such a function in combination with the regular spawn are non-obvious.

I'd be happy with introducing just the following minimal API:

pub fn scope_fifo<'scope, OP, R>(op: OP) -> R where
    OP: for<'s> FnOnce(&'s ScopeFifo<'scope>) -> R + 'scope + Send,
    R: Send;

pub struct ScopeFifo<'scope>;

impl<'scope> ScopeFifo<'scope> {
    pub fn spawn<BODY>(&self, body: BODY)
    where
        BODY: FnOnce(&ScopeFifo<'scope>) + Send + 'scope;
}

@nikomatsakis
Copy link
Member Author

👍 for scope_fifo, done.

I don't have a strong opinion about whether spawn_fifo should be retained -- but I was wondering how much to say about the semantics of pushing into an outer scope etc. I'm not sure if there is any other way to do it, though, apart from spawn_fifo, because scope types are not Send or Sync...

@ghost
Copy link

ghost commented Dec 7, 2018

because scope types are not Send or Sync

Not sure if I'm missing something here, but Scope<'_> is Send + Sync: https://docs.rs/rayon/1.0.3/rayon/struct.Scope.html#synthetic-implementations

@nikomatsakis
Copy link
Member Author

nikomatsakis commented Dec 7, 2018

OK =) I was misremembering (I think initially they were not, but I have a vague memory that we changed it).

In that case, I don't think that any new challenge in particular is introduced by spawn_fifo that doesn't already exist.

That is, you can do something like this:

scope_fifo(|scope1| {
    scope_fifo(|scope2| {
        scope1.spawn(foo);
        scope2.spawn(bar);
        scope1.spawn(baz);
    });
});

Presumably this will execute in an interleaved fashion.

@ghost
Copy link

ghost commented Dec 7, 2018

So what's the right metaphor for global spawn() and spawn_fifo() functions?

Is it this (a):

scope_fifo(|scope1| {
    scope(|scope2| {
        // `spawn()` is `scope2.spawn()`
        // `spawn_fifo()` is scope1.spawn()`
        main();
    });
});

Or this (b):

scope(|scope1| {
    scope_fifo(|scope2| {
        // `spawn()` is `scope1.spawn()`
        // `spawn_fifo()` is scope2.spawn()`
        main();
    });
});

Or something else (c)?

Presumably this will execute in an interleaved fashion.

Are you saying that (a) and (b) behave identically? That might be true in the proposed implementation, but I think it'd make sense for inner scopes to prioritize their own tasks over others on at a best-effort basis (but no guarantees!).

@cuviper
Copy link
Member

cuviper commented Dec 7, 2018

Scope was made Send + Sync in rayon-rs/rayon#443.

That is, you can do something like this:

scope_fifo(|scope1| {
    scope_fifo(|scope2| {
        scope1.spawn(foo);
        scope2.spawn(bar);
        scope1.spawn(baz);
    });
});

Presumably this will execute in an interleaved fashion.

Interesting example! While I think we would generally want to say that scopes are prioritized inside-out, we don't currently have a way to delay those inner scope1.spawn calls. So yes, they'll execute interleaved.

In the absence of any stealing, your example will:

  • enter scope1
    • enter scope2
      • spawn s1 foo, s2 bar, s1 baz
      • pop s1 -> run s1 foo
      • pop s2 -> run s2 bar
      • leave scope2
    • pop s1 -> run s1 baz
    • leave scope1.

But for a slightly weirder case, suppose we add an earlier spawn:

  • enter scope1
    • spawn s1 pre
    • enter scope2
      • spawn s1 foo, s2 bar, s1 baz
      • pop s1 -> run s1 pre
      • pop s2 -> run s2 bar
      • leave scope2
    • pop s1 -> run s1 foo
    • pop s1 -> run s1 baz
    • leave scope1.

We essentially have to run as many scope1 spawns as were pushed after the first scope2 spawn, and then scope2 can return -- this was just one interleaver in both of these cases. Ideally we would to prioritize scope2 fully ahead of scope1, but I'm not sure how to make this happen. In practice, I think it will be OK though.

So what's the right metaphor for global spawn() and spawn_fifo() functions?

In this case, we should think of them as independent scopes, not like nesting, although the interleaving effect is probably about the same. A spawn_fifo() call will only be ordered relative to other spawn_fifo() calls from the same thread.

Copy link
Member

@cuviper cuviper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objections, just some typos and clarifications.


then we should execute:

- first, the tasks from the join (in reverse order, so B and then A)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not reversed...

@nikomatsakis
Copy link
Member Author

@cuviper thanks! Fixed.

@ghost
Copy link

ghost commented Dec 10, 2018

Thank you for your replies! I'm happy with the current state of this RFC. 👍

@cuviper
Copy link
Member

cuviper commented Dec 11, 2018

What's our RFC endgame? Should I go ahead and implement a new PR for this?

@nikomatsakis
Copy link
Member Author

@cuviper yes, I think so -- I guess I will merge

@nikomatsakis
Copy link
Member Author

In discussion on Gitter we decided to rename from ScopeFifo::spawn to ScopeFifo::spawn_fifo, for the following reasons:

  • It is consistent with the global spawn_fifo method
  • It is more "bullet-proof", makes it harder to mistakenly use LIFO when you wanted FIFO
  • Leaves room for a single scope that supports both LIFO and FIFO spawns, should we want that

@nikomatsakis
Copy link
Member Author

We decided to merge =)

@nikomatsakis nikomatsakis merged commit e3534a7 into rayon-rs:master Dec 12, 2018
bors bot added a commit to rayon-rs/rayon that referenced this pull request Feb 2, 2019
615: Implement RFC #1: FIFO spawns r=nikomatsakis a=cuviper

This implements rayon-rs/rfcs#1, adding `spawn_fifo`, `scope_fifo`, and `ScopeFifo`, and deprecating the `breadth_first` flag.

Fixes #590.
Closes #601.

Co-authored-by: Josh Stone <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants