Conversation

@nikomatsakis
Member

nikomatsakis commented Jun 10, 2017

These introduce two new core APIs that were suggested by @bholley for use in Servo (Stylo, in particular). They are insta-stable because I want to make a public release so that Servo can make use of them.

The APIs are:

  • Introduce a new "breadth-first" scheduling mode that can be enabled per thread-pool. In breadth-first mode, both the current thread and stealer threads draw from the bottom of the deque. Depending on your workload, this may be desirable (in the case of stylo, it leads to better cache reuse). Users can opt into breadth-first mode by adjusting the configuration.
  • Introduce a new spawn_tail() API, intended for the last task that will be spawned. It is a slight optimization over calling spawn and then immediately returning from the scope, particularly in breadth-first mode: it invokes the closure directly if that closure would have been the next task that the local thread executed anyway.
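To make the scheduling difference concrete, here is a minimal, self-contained sketch (plain Rust, no Rayon dependency; the Worker type and task names are invented for illustration) of how the two modes draw from a worker's deque:

```rust
use std::collections::VecDeque;

/// Toy model of a worker thread's task deque. Spawned tasks are pushed
/// on one end ("the top"); the mode decides which end the local worker
/// pops from.
struct Worker {
    deque: VecDeque<&'static str>,
    breadth_first: bool,
}

impl Worker {
    fn push(&mut self, task: &'static str) {
        self.deque.push_back(task); // spawn: push onto the top
    }

    fn next_task(&mut self) -> Option<&'static str> {
        if self.breadth_first {
            // Breadth-first: draw from the bottom, like a stealer would.
            self.deque.pop_front()
        } else {
            // Depth-first (the default): pop the most recently pushed task.
            self.deque.pop_back()
        }
    }
}

fn run(breadth_first: bool) -> Vec<&'static str> {
    let mut w = Worker { deque: VecDeque::new(), breadth_first };
    w.push("a");
    w.push("b");
    w.push("c");
    let mut order = Vec::new();
    while let Some(t) = w.next_task() {
        order.push(t);
    }
    order
}

fn main() {
    assert_eq!(run(false), vec!["c", "b", "a"]); // depth-first: LIFO
    assert_eq!(run(true), vec!["a", "b", "c"]);  // breadth-first: FIFO
    println!("ok");
}
```

In depth-first mode the local thread pops what it just pushed, staying within a subtree; in breadth-first mode it consumes tasks in spawn order, like a stealer, which is the behavior that helped Stylo's cache reuse.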

These changes introduce an if into one of Rayon's main hot paths, so I was a bit nervous about performance. However, keep in mind that the if is 100% predictable in normal usage (i.e., if you don't mix breadth- and depth-first modes). Both @bholley and I took measurements that suggest basically zero impact on depth-first code. Moreover, using the new modes leads to a 20% improvement in performance for Stylo. The Stylo measurements are documented here. My own measurements, gathered on my 8x2-core Linux desktop, are visible here:

In the median view, you can see some effect, but it is only a few percent, and I am not that confident in my measurement setup. =) Because of my inability to convince gnuplot to use colors etc. in median mode (I have to tinker with that more), it's a bit hard to tell which benchmarks were affected; but if you hover the cursor over the chart, you can see that join_recursively, increment_all_max, and increment_all took more time, whereas fibonacci and increment_all_min took less. All of these are micro-benchmarks.

I was considering putting these behind a cargo feature (perhaps one enabled by default), so that users could "opt-out" from having the if. This seems like a plausible path, but the measurements suggest to me that it's not necessary, and it seems better to avoid having "options" if we can (so as to reduce the need to measure an exponentially increasing number of combinations).

@nikomatsakis
Member Author

I am now re-considering the spawn_tail() API in particular. Perhaps it would be better to expose a method on Scope that simply tells you whether the local deque is empty or not. In that case, code using BFS mode that used to call scope.spawn_tail(closure) can instead do:

if scope.local_thread_has_pending_tasks() {
    scope.spawn(closure);
} else {
    closure();
}

There are two reasons for this:

  • spawn_tail doesn't make sense in depth-first mode at all, since it is equivalent to just calling the closure.
  • even in BF-mode, you probably would prefer to use a loop or something to avoid making your stack grow deeply.

For example:

let mut parent = some_node();
loop {
    let work_items = parent.children();

    // avoid spawning a task if (a) no other work is pending and (b) there is
    // only one child to process
    if work_items.len() == 1 && !scope.local_thread_has_pending_tasks() {
        parent = work_items[0];
        continue;
    }

    for child in work_items { scope.spawn(move || process(child)); }
    break;
}

@bholley -- thoughts?

if !self.breadth_first {
    self.worker.pop()
} else {
    self.stealer.steal()
}
@stjepang

I've just published a new version of Coco that introduces Worker::steal method. Now you can call self.worker.steal() here. This change will give some performance wins in breadth-first mode. I'd be curious to know how much :)
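The suggested change, sketched against the snippet above (field names follow the excerpt; the coco Worker::steal method is the one mentioned in this comment):

```rust
// Before: breadth-first mode went through the stealer handle.
// After: coco's new Worker::steal lets the worker take from the bottom
// of its own deque directly, avoiding the stealer indirection:
if !self.breadth_first {
    self.worker.pop()    // depth-first: pop the top (LIFO)
} else {
    self.worker.steal()  // breadth-first: take from the bottom (FIFO)
}
```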

@nikomatsakis
Member Author

Ah, nice!

@nikomatsakis
Member Author

@stjepang I updated to use the new APIs. @bholley will have to re-run the measurements, though, since we don't actually have any benchmarks employing BF mode. (I've been meaning to make a pared-down version of stylo for that purpose, but haven't gotten around to it.)

@nikomatsakis
Member Author

nikomatsakis commented Jun 10, 2017

Regarding the "local thread has pending tasks" idea, I guess the best thing would be to expose it as a free function, rather like current_thread_index (well, that's presently only a method on ThreadPool, but I think we should rectify that). It would probably be called pending_local_tasks() or something, with a comment explaining that it is an approximation indicating whether the current thread has published unstolen tasks. Because of the inherent raciness, a return of false is meaningful here, but a return of true may not be.

Interestingly, I've seen this proposed elsewhere as a good heuristic for whether to spawn a new task in general. In that work, it was called "being hungry", I think. The idea was: if you have no pending tasks, then this implies that other threads are "hungry" because they've stolen all of your tasks. But if you do have pending tasks, you don't necessarily want to go generating more, since they are unlikely to be consumed.
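As a toy illustration of that "hungry" heuristic (all names here are invented; this is not Rayon's API), spawning only when the local deque is empty looks like:

```rust
use std::collections::VecDeque;

/// Toy model of the "hungry" heuristic: publish a new task only when the
/// local deque is empty, i.e., stealers have drained everything we had.
struct LocalQueue {
    deque: VecDeque<u32>,
}

impl LocalQueue {
    fn has_pending_tasks(&self) -> bool {
        !self.deque.is_empty()
    }

    /// Returns true if we decided to publish a new task.
    fn maybe_spawn(&mut self, task: u32) -> bool {
        if self.has_pending_tasks() {
            // Existing tasks are unlikely to be consumed soon; don't add more.
            false
        } else {
            // No pending tasks: other threads are "hungry", so publish.
            self.deque.push_back(task);
            true
        }
    }
}

fn main() {
    let mut q = LocalQueue { deque: VecDeque::new() };
    assert!(q.maybe_spawn(1));  // empty, so spawn
    assert!(!q.maybe_spawn(2)); // task 1 still pending, so don't spawn
    q.deque.pop_front();        // pretend a stealer took task 1
    assert!(q.maybe_spawn(3));  // empty again, so spawn
    println!("ok");
}
```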

(I'm a bit wary of 'locking in' on work-stealing here, but this seems like an API that we could continue to support even if we changed to something else, even if it meant always returning false or true. And anyway I can't really imagine changing to something else.)

@bholley
Contributor

bholley commented Jun 10, 2017

Yes, I would prefer the accessor over spawn_tail; it would be more ergonomic for our use-case.

@nikomatsakis
Member Author

@bholley I pushed a change to current_thread_has_pending_tasks(), which is offered as a method on the ThreadPool (much like current_thread_index()).

I tend to think we should mirror all of these methods at the top-level as well. The main reason that I've been hesitating (particularly with respect to current_thread_index()) is that I was wary that if we ever changed our thread-pool strategy to dynamically change the number of threads, this would invalidate a lot of existing code. As long as the methods are restricted to the thread-pool, this seems harmless enough, since we could make it an "opt-in" bit of configuration (but make it the default for the global thread-pool).

But I am beginning to wonder if we will ever do this change -- or at least if we would ever do it without some form of opt-in. There doesn't seem to be much reason to do it, except to try to scale gracefully with blocking I/O tasks. And based on conversations I've had with others, that is best done via some opt-in APIs that wrap the blocking I/O, which would serve the purpose.

Therefore, I think I'd like to ensure that -- to the extent it makes sense -- the API surface of ThreadPool is mirrored with top-level functions that act on whatever the currently executing thread-pool is. That is, if you do thread_pool.foo(), then it is local to the given thread-pool, but foo() just works in your current context, whatever that is.
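The mirroring pattern might look roughly like this sketch (the thread-local plumbing here is invented for illustration; Rayon's real implementation tracks per-thread registry state differently):

```rust
use std::cell::Cell;

thread_local! {
    // Index of the current worker thread within its pool, if any.
    // A pool would set this when it starts each worker thread.
    static CURRENT_INDEX: Cell<Option<usize>> = Cell::new(None);
}

/// Free-function form: answers for whatever pool the calling thread
/// happens to be executing in (None if not on a worker thread).
pub fn current_thread_index() -> Option<usize> {
    CURRENT_INDEX.with(|c| c.get())
}

pub struct ThreadPool;

impl ThreadPool {
    /// Method form: same answer, scoped to this pool's worker threads,
    /// so it simply delegates to the free function.
    pub fn current_thread_index(&self) -> Option<usize> {
        current_thread_index()
    }
}

fn main() {
    assert_eq!(current_thread_index(), None); // not on a worker thread
    CURRENT_INDEX.with(|c| c.set(Some(3)));   // pretend we're worker #3
    assert_eq!(current_thread_index(), Some(3));
    println!("ok");
}
```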

(@cuviper, what do you think of this reasoning?)

@bholley
Contributor

bholley commented Jun 12, 2017

SGTM. FWIW I think that making the thread pools dynamic would be a pretty surprising change, and we'd probably want it to be opt-in.

@nikomatsakis
Member Author

@bholley

FWIW I think that making the thread pools dynamic would be a pretty surprising change, and we'd probably want it to be opt-in.

Yes, and also I'm not keen on the "default" configuration (i.e., Configuration::new()) and the configuration that the default scheduler uses diverging from one another.

@nikomatsakis
Member Author

nikomatsakis commented Jun 12, 2017

I've mirrored the current_foo() APIs at the global level now. I carried over the existing "future compatibility note" warnings, though I suspect that we would not make such a change. Maybe we could remove them, but...

@cuviper
Member

Overall I'm good with this. I think it's a little hard to reason about when it would be beneficial, but the docs are sufficiently squishy about promising anything that I think we can just say "measure it".

There are a few simple errors noted in CI here, though oddly only on OSX... how did Linux pass!?! The more recent CI run appears to be stuck in the queue...

///
/// [m]: struct.ThreadPool.html#method.current_thread_has_pending_tasks
#[inline]
pub fn current_thread_has_pending_tasks(&self) -> Option<bool> {

There's no &self in free functions...

mod util;

pub use thread_pool::ThreadPool;
pub use thread_pool::current_num_threads;

There's already a current_num_threads below, and it's a little different because it reports on the global pool if you're not in a pool already.

if curr.is_null() {
    None
} else {
    Some((*curr).registry.num_threads())
}

The field registry is private.

}

#[inline]
pub fn breadth_first(&self) -> bool {

Is this method used anywhere?

@nikomatsakis force-pushed the servo-bfs-and-spawn_tail branch from 19d71d7 to 7b25a39 on June 12, 2017 21:32
@nikomatsakis
Member Author

@cuviper um yeah obviously I didn't really build that. I pushed a new revision that addresses your various points. I removed the new current_num_threads in favor of the existing one. I had forgotten we made that available! I guess it's fine that it returns usize instead of Option<usize> -- that's probably what you want anyway, I imagine (i.e., you can measure the number of threads before you start your parallel operation, and this tells you how many threads you will have, unless you switch thread-pools in the meantime).
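As a small illustration of why a plain usize is handy here, a caller can size work up front from the thread count (the chunk_size helper is invented for illustration):

```rust
/// Ceiling division: split `len` items into at most `n` nearly even
/// chunks, the kind of sizing a caller might do after querying the
/// pool's thread count before starting a parallel operation.
fn chunk_size(len: usize, n: usize) -> usize {
    (len + n - 1) / n
}

fn main() {
    // e.g. with 16 threads (as on the 8x2-core desktop mentioned above):
    assert_eq!(chunk_size(100, 16), 7);
    assert_eq!(chunk_size(16, 16), 1);
    println!("ok");
}
```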

@cuviper
Member

Looks good!

@cuviper cuviper merged commit 3eaba54 into master Jun 13, 2017
@cuviper cuviper deleted the servo-bfs-and-spawn_tail branch October 6, 2017 21:39