-
Notifications
You must be signed in to change notification settings - Fork 21
Substantial slowdown in groupBy (all collections) #10049
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Imported From: https://issues.scala-lang.org/browse/SI-10049?orig=1 |
@Ichoran said (edited on Nov 13, 2016 9:07:18 PM UTC): |
@paplorinc said: |
@paplorinc said: Creating an @Benchmark
public void scala_persistent_some(Blackhole bh) {
for (int i = 0; i < CONTAINER_SIZE; i++) {
Option<Object> value = map.get(i);
assert !value.isEmpty();
bh.consume(value);
}
}
@Benchmark
public void scala_persistent_none(Blackhole bh) {
for (int i = CONTAINER_SIZE; i < 2 * CONTAINER_SIZE; i++) {
Option<Object> value = map.get(i);
assert value.isEmpty();
bh.consume(value);
}
} resulting in: 2.11.8
Impl Score
scala_persistent_none 77,367.12
scala_persistent_some 47,659.77
2.12.0
Impl Score
scala_persistent_none 47,823.76
scala_persistent_some 33,916.23 Note: the asserts were validated in a separate, non-benchmarked run! |
@paplorinc said (edited on Nov 13, 2016 11:00:57 PM UTC): 2.12.0 | 2.11.8
|
; - s.c.m.HashMap ::table@1 (line 40) | ; - s.c.m.HashMap ::table@1 (line 40)
; - s.c.m.HashTable ::index@1 (line 367) | ; - s.c.m.HashTable$class ::index@1 (line 364)
; - s.c.m.HashTable ::index$@2 (line 366) | ; - s.c.m.HashMap ::index@2 (line 40)
; - s.c.m.HashMap ::index@2 (line 40) |
; - s.c.m.HashTable ::findEntry@10 (line 132) | ; - s.c.m.HashTable$class ::findEntry@10 (line 132)
; - s.c.m.HashTable ::findEntry$@2 (line 131) |
; - s.c.m.HashMap ::findEntry@2 (line 40) | ; - s.c.m.HashMap ::findEntry@2 (line 40)
; - s.c.m.HashMap ::get@2 (line 70) | ; - s.c.m.HashMap ::get@2 (line 70) i.e. it seems to me that the |
@Ichoran said: val x = get(key)
if (x eq None) { val d = op; this(key) = d; d }
else x.asInstanceOf[Some[V]].get In my hands, it's full speed or faster, either when there are lots of successful lookups, or when new keys are added almost every time. |
@retronym said: I've played with a few variations in retronym/compiler-benchmark@a2260bb I don't have a clear model about what's going on here. One observation is that the new trait encoding requires some additional indirections which can eat into HotSpot's inlining depth budget, which is by default 9. I'm trying things like looking at the effect of |
@retronym said: Unfortunately we had to introduce some indirection in the dispatch of trait methods. Gory details: scala/scala#5177 Here's the inlining log for my benchmark of info] @ 8 scala.collection.mutable.HashMap::get (29 bytes) inline (hot)
[info] @ 2 scala.collection.mutable.HashMap::findEntry (6 bytes) inline (hot)
[info] @ 2 scala.collection.mutable.HashTable::findEntry$ (6 bytes) inline (hot)
[info] @ 2 scala.collection.mutable.HashTable::findEntry (19 bytes) inline (hot)
[info] @ 5 scala.collection.mutable.HashMap::elemHashCode (6 bytes) inline (hot)
[info] \-> TypeProfile (577289/577289 counts) = scala/collection/mutable/HashMap
[info] @ 2 scala.collection.mutable.HashTable$HashUtils::elemHashCode$ (6 bytes) inline (hot)
[info] @ 2 scala.collection.mutable.HashTable$HashUtils::elemHashCode (5 bytes) inline (hot)
[info] @ 1 scala.runtime.Statics::anyHash (65 bytes) inline (hot)
[info] @ 61 java.lang.Integer::hashCode (8 bytes) inline (hot)
[info] @ 4 java.lang.Integer::hashCode (2 bytes) inlining too deep
[info] @ 10 scala.collection.mutable.HashMap::index (6 bytes) inline (hot)
[info] @ 2 scala.collection.mutable.HashTable::index$ (6 bytes) inline (hot)
[info] @ 2 scala.collection.mutable.HashTable::index (34 bytes) inline (hot)
[info] @ 1 scala.collection.mutable.HashMap::table (5 bytes) accessor
[info] @ 13 scala.collection.mutable.HashMap::seedvalue (5 bytes) accessor
[info] @ 18 scala.collection.mutable.HashMap::improve (7 bytes) inline (hot)
[info] @ 3 scala.collection.mutable.HashTable$HashUtils::improve$ (7 bytes) inline (hot)
[info] @ 3 scala.collection.mutable.HashTable$HashUtils::improve (27 bytes) inlining too deep
[info] @ 26 java.lang.Integer::bitCount (49 bytes) (intrinsic)
[info] @ 15 scala.collection.mutable.HashTable::findEntry0 (44 bytes) inline (hot)
[info] @ 1 scala.collection.mutable.HashMap::table (5 bytes) accessor
[info] @ 15 scala.collection.mutable.DefaultEntry::key (5 bytes) accessor
[info] \-> TypeProfile (38245/38245 counts) = scala/collection/mutable/DefaultEntry
[info] @ 21 scala.collection.mutable.HashMap::elemEquals (7 bytes) inline (hot)
[info] @ 3 scala.collection.mutable.HashTable::elemEquals$ (7 bytes) inline (hot)
[info] @ 3 scala.collection.mutable.HashTable::elemEquals (12 bytes) inline (hot)
[info] @ 2 scala.runtime.BoxesRunTime::equals (13 bytes) inline (hot)
[info] @ 9 scala.runtime.BoxesRunTime::equals2 (52 bytes) too big
[info] @ 30 scala.collection.mutable.DefaultEntry::next (5 bytes) inline (hot)
[info] @ 1 scala.collection.mutable.DefaultEntry::next (5 bytes) accessor
[info] @ 22 scala.collection.mutable.DefaultEntry::value (5 bytes) accessor
[info] @ 25 scala.Some::<init> (10 bytes) inline (hot)
[info] @ 6 scala.Option::<init> (9 bytes) inline (hot)
[info] @ 1 java.lang.Object::<init> (1 bytes) inline (hot)
[info] @ 5 scala.Product::$init$ (1 bytes) inline (hot) Hot calls into higher level methods (like That HotSpot limit is known to be a bit on the low side for historical reasons. The implementation of "lambda forms" in the java.lang.invoke implementation ran into it, but rather than raising it added special cases into HotSpot for lambda form frames. |
@paplorinc said (edited on Nov 14, 2016 8:20:23 PM UTC): public static class Test extends Base {
scala.collection.mutable.HashMap<Object, Object> map = HashMap$.MODULE$.empty();
@State(Scope.Thread)
public class Initialized {
@Setup(Level.Invocation)
public void initializeMutable(Base state) {
for (int i = 0; i < state.CONTAINER_SIZE; i++) {
map.put(i, i);
}
}
@TearDown(Level.Invocation)
public void tearDown() {
map.clear();
}
}
@Benchmark
public void scala_persistent_some(Blackhole bh) {
for (int i = 0; i < CONTAINER_SIZE; i++) {
Object value = map.getOrElseUpdate(i, () -> 0);
bh.consume(value);
}
assert map.size() == CONTAINER_SIZE;
}
} Note: I prefer testing |
@paplorinc said (edited on Nov 14, 2016 8:20:02 PM UTC): A viable fix would be to change |
@Ichoran said: However, the best I've got so far is at https://gist.github.com/Ichoran/94b4698905b905934dfd644f6cef7b4b This is using Thyme (now up in a snapshot for 2.11 and 2.12) so I can iterate quickly on ideas. (Note--some machines may need the |
@paplorinc said (edited on Nov 14, 2016 8:21:41 PM UTC): While the underlying problem has rather been avoided (i.e. why is It's my first Scala PR, all comments are very welcome! :) |
@paplorinc said: |
@retronym said: |
@paplorinc said: |
@paplorinc said (edited on Nov 24, 2016 10:04:45 AM UTC): |
@retronym said: |
@paplorinc said (edited on Nov 30, 2016 10:02:53 AM UTC): |
@paplorinc said: Also https://github.com/paplorinc/scala/blob/460fa9d2af7ac12d5c6b0b935a02db30dc5d7cd8/src/library/scala/collection/immutable/HashMap.scala#L83-L83 and https://github.com/paplorinc/scala/blob/cb1a4524d2b34605232afa083dd43f0b7d39b7a7/src/library/scala/collection/immutable/HashSet.scala#L180-L180 have a different hashing algorithm, mentioned in https://github.com/paplorinc/scala/blob/1ae160037b1d2b603a20595caa44ad2ce3b8dcd6/src/library/scala/collection/mutable/HashTable.scala#L455-L455 as "quick, but bad for sequence 0-10000 - little entropy in higher bits". Is this ok, or should I open an issue for it? |
@retronym said: |
@paplorinc said: Also, I noticed that the hashing does a manual bit rotation, changed that in a separate PR also: scala/scala#5569 |
groupBy
is substantially slower in 2.12, with throughput down to as little as 60% of 2.11 on my machine. This seems to affect all collections (Vector, List, and Array tested).Manually inlining the
groupBy
code does not seem to help. Indeed, benchmarking ofHashMap.getOrElseUpdate
shows a considerable decrease in throughput, possibly enough to account for the entire effect.AnyRefMap
does not appear to be affected.Manually inlining the
get
, match, andupdate
calls fromgetOrElseUpdate
does appear to solve the issue (at least mostly).The old bytecode in
scala.collection.mutable.MapLike$class
was:Now, however, the bytecode is as follows, with extra instructions highlighted with
>>>>>
:Whether these extra jumps themselves are the extra cost, or whether they prevent some JIT optimization, I have not yet investigated. However, the bytecode appears less efficiently organized (note the swap of the exception vs return which requires one of the extra jumps in the success case).
The text was updated successfully, but these errors were encountered: