List Functor: mix unrolled and reverse map #135


Closed · 9 commits

Conversation

@matthewleon (Contributor)

Addresses #131

The chunk sizes (5 for the initial list segment, 3 for the tail-recursive
remainder) were arrived at through benchmarked experimentation, mapping a
simple (_ + 1) over lists of various sizes.

Relevant figures:
list of 1000 elems:   142.61 μs -> 36.97 μs
list of 2000 elems:   275.17 μs -> 55.33 μs
list of 10000 elems:  912.73 μs -> 208.39 μs
list of 100000 elems: 34.56 ms  -> 1.24 ms

The ~30x speed increase for long lists is probably explained by the lack
of GC thrashing with this approach.
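
For orientation, a minimal sketch of the overall shape (simplified; the real diff differs in clause details, but the names startUnrolledMap, chunkedRevMap, and unrollLimit match the code under review):

module ChunkedMapSketch where

import Data.List (List(..), (:), reverse)

listMap :: forall a b. (a -> b) -> List a -> List b
listMap f = startUnrolledMap unrollLimit
  where
  -- cap on the non-tail-recursive phase, keeping stack use bounded
  unrollLimit :: Int
  unrollLimit = 1000

  -- unrolled, non-tail-recursive prefix: five conses per stack frame
  startUnrolledMap :: Int -> List a -> List b
  startUnrolledMap 0 xs = chunkedRevMap Nil xs
  startUnrolledMap n (x1 : x2 : x3 : x4 : x5 : xs) =
    f x1 : f x2 : f x3 : f x4 : f x5 : startUnrolledMap (n - 1) xs
  startUnrolledMap _ (x : xs) = f x : startUnrolledMap 0 xs
  startUnrolledMap _ Nil = Nil

  -- tail-recursive remainder: map three elements per step onto an
  -- accumulator (building the result reversed), then reverse once
  chunkedRevMap :: List b -> List a -> List b
  chunkedRevMap acc (x1 : x2 : x3 : xs) =
    chunkedRevMap (f x3 : f x2 : f x1 : acc) xs
  chunkedRevMap acc (x1 : x2 : Nil) = reverse (f x2 : f x1 : acc)
  chunkedRevMap acc (x1 : Nil) = reverse (f x1 : acc)
  chunkedRevMap acc Nil = reverse acc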

@matthewleon (Contributor, Author)

Will update this with details on my machine setup.

matthewleon force-pushed the fast-functor branch 2 times, most recently from f6b1ae7 to 550e096 on November 21, 2017 21:43
-- chunked list Functor inspired by OCaml
-- https://discuss.ocaml.org/t/a-new-list-map-that-is-both-stack-safe-and-fast/865
-- chunk sizes determined through experimentation
listMap :: forall a b. (a -> b) -> List a -> List b
Contributor:
This is exported right now, is that deliberate?

Contributor (Author):

No. It isn't. Good catch.

startUnrolledMap :: Int -> List a -> List b
startUnrolledMap 0 (x : xs) = f x : chunkedRevMap xs
startUnrolledMap n (x1 : x2 : x3 : x4 : x5 : xs) =
f x1 : f x2 : f x3 : f x4 : f x5 : startUnrolledMap (n - 1) xs
Contributor:

I'm not sure if OCaml has tail calls modulo cons, but isn't this going to lead to a stack overflow for us?

Contributor (Author):

This is why I've limited the "unrolled map" to 1000 iterations. It should avoid stack overflow. Are you suggesting that there is a way to make it stack overflow despite that limit?

Contributor:

I guess not, unless f was already on the verge of overflowing the stack. It can only make an existing problem a finite amount worse :)


startUnrolledMap :: Int -> List a -> List b
startUnrolledMap 0 (x : xs) = f x : chunkedRevMap xs
startUnrolledMap n (x1 : x2 : x3 : x4 : x5 : xs) =
Contributor:

This can probably be made faster if you hand-optimize the cases. The JS generated here isn't going to be particularly optimal. We can reuse the branches.

Contributor (Author):

Yes, I thought the same thing. I'll look at it if the option you suggest with mutation isn't the way to go.

Contributor:

No, I just mean rewriting this so that common cases use the same branch, instead of having multiple branches which recognize lists of length >= 1 for example.

Contributor:

If you look at the generated JS, you'll hopefully see what I mean.

Contributor:

case xs of
  x : xs' -> ...
    case xs' of
      x' : xs'' -> ...
      _ -> ...
  _ -> ...

PureScript doesn't perform this optimization, you have to do it manually.
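
For illustration, a sketch of the manual version (hypothetical mapTwoFlat / mapTwoNested names, not code from this PR): the flat clauses each re-test the outer cons, while the nested form tests it once and branches on the tail.

module NestedCaseSketch where

import Data.List (List(..), (:))

-- flat clauses: the first two both match "length >= 1", so the
-- generated JS re-tests the outer cons for each clause
mapTwoFlat :: forall a b. (a -> b) -> List a -> List b
mapTwoFlat f (x1 : x2 : xs) = f x1 : f x2 : mapTwoFlat f xs
mapTwoFlat f (x : xs) = f x : mapTwoFlat f xs
mapTwoFlat _ Nil = Nil

-- nested matching: the outer cons is tested exactly once, and the
-- tail is inspected inside that branch
mapTwoNested :: forall a b. (a -> b) -> List a -> List b
mapTwoNested _ Nil = Nil
mapTwoNested f (x : xs) = case xs of
  x' : xs' -> f x : f x' : mapTwoNested f xs'
  Nil -> f x : Nil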

@paf31 (Contributor) commented Nov 22, 2017

Thanks!

Another thing to try is a straight traversal of the spine using a mutable cons cell, a la Scheme. I think we could make this safe. What do you think?

@matthewleon (Contributor, Author)

> straight traversal of the spine using a mutable cons cell, a la Scheme

I will research this and see if I can come up with something crafty. Is the idea here that we jump into ST?

@paf31 (Contributor) commented Nov 22, 2017

I think it would be FFI instead of ST unfortunately, unless we can come up with a way to modify cons cells in ST, which would mean some sort of weird STList data type.
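
One hypothetical shape for such a type (a sketch only, using the module path from the current purescript-st; nothing here is from the PR):

module STListSketch where

import Control.Monad.ST.Ref (STRef)

-- a list whose tails are mutable ST references, so the spine could be
-- patched in place while mapping (hypothetical type, not from the PR)
data STList h a
  = STNil
  | STCons a (STRef h (STList h a))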

@matthewleon (Contributor, Author)

> unless we can come up with a way to modify cons cells in ST, which would mean some sort of weird STList data type.

I was thinking along those lines. I'll try it and benchmark it.

@matthewleon (Contributor, Author) commented Nov 22, 2017

Though upon further thought, I don't see how to do that without running into the problem of stack-safe recursion in Eff being very slow. Perhaps the saner thing to do is to try the mutable cons cell in FFI... I'll have to use JS anyway to optimize the unroll.

@paf31 (Contributor) commented Nov 23, 2017

Yes, agreed, it doesn't seem worth it to make the code non-portable.

@matthewleon (Contributor, Author)

> Yes, agreed, it doesn't seem worth it to make the code non-portable.

So do you prefer that the code be largely left as-is, because this version is portable and has much better performance than what we had before, at least in Node?

Or should I go ahead and try to do the mutable cons cell + unrolling in FFI JavaScript?

The answer really isn't obvious to me, as I'm unaware of PS's stance on library portability. As it stands, this lib doesn't use any FFI directly. I believe that Lazy, however, uses FFI, which makes at least the lazy lists non-portable, no?

@paf31 (Contributor) commented Nov 23, 2017

Well the lazy data structure doesn't rely on the compiler internals in any way.

Here is a quick FFI experiment:

exports.mapList = function(f) {
  return function(l) {
    if (l.constructor.name === 'Nil') {
      return l;
    } else {
      var nil = require('Data.List.Types').Nil;
      var cons = require('Data.List.Types').Cons;
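      // 'result' starts as a placeholder cell; its fields are filled in
      // below, and the trailing placeholder is replaced by Nil at the end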
      var result = new cons(null, null);

      var input = l;
      var output = result;

      while (true) {
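        // map the current head in place, then allocate the next output cell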
        output.value0 = f(input.value0);
        output.value1 = new cons(null, null);

        if (input.value1 instanceof nil) {
          break;
        }

        input = input.value1;
        output = output.value1;
      }

      output.value1 = nil.value;

      return result;
    }
  }
}

Performance-wise, it's about twice as fast, at the expense of being less portable:

> import Main
> import Data.List
> import Performance.Minibench 
> xs = range 1 100000

> benchWith 1000 \_ -> map (\x -> x * 2) xs
mean   = 3.26 ms
stddev = 1.78 ms
min    = 2.58 ms
max    = 53.19 ms
unit

> benchWith 1000 \_ -> mapList (\x -> x * 2) xs
mean   = 1.55 ms
stddev = 1.54 ms
min    = 1.30 ms
max    = 47.51 ms
unit

I think it's better to be portable though, so I might put this in a separate library as a curiosity, but I wouldn't merge it into lists as it is now.

@matthewleon (Contributor, Author)

Recapping conversation with @paf31 on Slack: Though considerable speed gains can be made by using FFI with a mutable Cons cell, the List library is currently portable, and we don't want to introduce FFI just to optimize map further than this.

@paf31 (Contributor) commented Nov 23, 2017

I've put my code in a repo here, along with zipWith and filter replacements.

@matthewleon (Contributor, Author) commented Nov 23, 2017 via email

@matthewleon (Contributor, Author)

I've added benchmarking to this branch.
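
For anyone reproducing the numbers, a minimal harness along these lines should work (a sketch assuming purescript-minibench's current Effect-based benchWith; the sessions above used its Eff-era predecessor):

module BenchSketch where

import Prelude

import Data.List (List, range)
import Effect (Effect)
import Performance.Minibench (benchWith)

-- bench map over a 10k-element list, as in the REPL session above
main :: Effect Unit
main = do
  let xs = range 1 10000 :: List Int
  benchWith 1000 \_ -> map (_ + 1) xs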

@safareli (Contributor)

package-lock.json was committed in the last commit.

@matthewleon (Contributor, Author) commented Nov 26, 2017

@safareli removed it. Thanks!

-- chunk sizes determined through experimentation
listMap :: forall a b. (a -> b) -> List a -> List b
listMap f = startUnrolledMap unrollLimit
where
Contributor:

Nitpick: could you please indent things past the where, or move the where up to the end of the previous line?
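
That is, either of these layouts would do (schematic, with trivial bodies standing in for the real ones):

module WhereLayoutSketch where

import Prelude

-- bindings indented past the where ...
double :: Int -> Int
double n = n * factor
  where
    factor = 2

-- ... or the where moved up to the end of the previous line
triple :: Int -> Int
triple n = n * factor where
  factor = 3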

@paf31 (Contributor) commented Nov 30, 2017

Thanks!

What do other people think? @garyb @hdgarrood

Especially about the potential stack overflow issue mentioned above.

@garyb (Member) commented Nov 30, 2017

@paf31 what was the stack overflow issue? Just that if the stack is already near overflow, this might push it over the edge?

@hdgarrood (Contributor)

Yeah, if f makes enough recursive calls on average, I think that could cause stack overflows. This looks really good though! If you could address @paf31's recent comments and do a little testing to check that stack-safety isn't significantly worse than it is now, I'm 👍 to merge.

Come to think of it, do we have any testing for stack overflows in map in this library already?

@paf31 (Contributor) commented Nov 30, 2017

To be clear, the issue is that we go from "map never grows the stack" to "map always grows the stack, but maybe not by very much". The growth should be linear in the size of the list.

@natefaubion (Contributor)

With this, isn't it "map grows the stack by a fixed maximum amount"?

@paf31 (Contributor) commented Nov 30, 2017

Well startUnrolledMap calls itself recursively, but not tail-recursively, so no, I think the stack usage grows with the size of the input.

@paf31 (Contributor) commented Nov 30, 2017

Nested maps are also quite common. I don't think this is enough to cause a problem on its own, but it could worsen it.

The question for me is not whether this code should be included, but whether it should be the default. Do we want a default which works in every case but is slow in most cases, or one which works in most cases but overflows the stack in a tiny number of cases?

I think it's probably okay to pick this as the default, but I just want to make sure the trade off is clear now as we decide.

@natefaubion (Contributor)

There's also the question of what the stack limit is. It does 5 items at a time, so it will eat stack for lists of up to 5000 items. Maybe we don't need to handle that many, and we can reduce it to 200 frames if it's a concern?

@matthewleon (Contributor, Author)

> There's also the question of what the stack limit is. It does 5 items at a time, so it will eat stack for lists of up to 5000 items. Maybe we don't need to handle that many, and we can reduce it to 200 frames if it's a concern?

I wouldn't mind doing that. Maybe it's worth me doing some exploratory tests to see how much breathing room Node gives us?

@matthewleon (Contributor, Author)

Regarding call stack size: http://2ality.com/2014/04/call-stack-size.html

@matthewleon (Contributor, Author)

> we can reduce it to 200 frames if it's a concern?

I've made a commit that does this. I figure that this means we still get a big speed-up for common List use-cases, while if people are serious about having faster iteration through longer lists, they can use the fast list repo.

2017 MacBook Pro 2.3 GHz Intel Core i5, 8 GB 2133 MHz LPDDR3
Node v8.9.1

List
====
map
---
map: empty list
mean   = 1.31 μs
stddev = 11.87 μs
min    = 799.00 ns
max    = 375.82 μs
map: singleton list
mean   = 2.40 μs
stddev = 11.03 μs
min    = 1.03 μs
max    = 342.18 μs
map: list (1000 elems)
mean   = 143.41 μs
stddev = 225.12 μs
min    = 97.16 μs
max    = 2.03 ms
map: list (2000 elems)
mean   = 274.16 μs
stddev = 295.84 μs
min    = 199.66 μs
max    = 2.06 ms
map: list (5000 elems)
mean   = 531.84 μs
stddev = 512.61 μs
min    = 229.45 μs
max    = 2.95 ms
map: list (10000 elems)
mean   = 895.24 μs
stddev = 777.87 μs
min    = 464.59 μs
max    = 2.94 ms
map: list (100000 elems)
mean   = 33.45 ms
stddev = 7.65 ms
min    = 22.07 ms
max    = 63.47 ms
Commit message:

Addresses purescript#131

The relevant chunk sizes (5 for the initial list segment), (3 for the
tail-recursive remainder) were arrived at through benchmarked
experimentation, mapping a simple (_ + 1) through lists of various
sizes.

Relevant figures:
list of 1000 elems:   142.61 μs -> 36.97 μs
list of 2000 elems:   275.17 μs -> 55.33 μs
list of 10000 elems:  912.73 μs -> 208.39 μs
list of 100000 elems: 34.56 ms  -> 1.24 ms

The ~30x speed increase for long lists is probably explained by the lack
of GC thrashing with this approach.

Benchmarked on 2017 Macbook Pro, 2.3 GHz Intel Core i5, 8 GB RAM.
macOS Sierra 10.12.6
node v8.9.1
Commit message: this lowers the probability of stack-size troubles
@matthewleon (Contributor, Author)

Rebased.

@paf31 (Contributor) commented Dec 10, 2017

Given the possible stack-safety issue, what do people think about providing the naive (safe) implementation as a separate named function? Or vice versa?

@matthewleon (Contributor, Author)

One possibility is to just use the stack-safe unrolled operation, which should still give a speed boost, and recommend that people turn to purescript-lists-fast if speed is important.

@paf31 (Contributor) commented Dec 11, 2017

Would it be possible to try that and compare the performance easily?

matthewleon force-pushed the fast-functor branch 2 times, most recently from ca5d02f to cadf481 on December 12, 2017 01:44
@matthewleon (Contributor, Author)

I made a branch that only uses the stack-safe reverse map and... It's actually faster.

map: list (2000 elems)
mean = 53.33 μs -> 31.24 μs

map: list (10000 elems)
mean = 107.73 μs -> 98.17 μs

I've merged it in.

@matthewleon (Contributor, Author)

Just realized I can also inline the unrolledMap, will do that and push.

Commit message: begin with reverse unrolled map
@matthewleon (Contributor, Author)

Actually, inlining unrolledMap slows things down significantly. Mysteries of JS... This should be good for review.

@paf31 (Contributor) commented Dec 12, 2017

Even better, thanks!

Any other comments before I merge this?

@hdgarrood mentioned this pull request on Mar 10, 2019
@hdgarrood (Contributor)

Merged in via #157.
