Skip to content

Improved runtime speed for Vector, restoring previous performance. #5516

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 11, 2016

Conversation

Ichoran
Copy link
Contributor

@Ichoran Ichoran commented Nov 9, 2016

copyOf was redirected to java.util.Arrays.copyOf to reduce the work that the JIT compiler has to do to make this work well.

zeroLeft, zeroRight, copyLeft, and copyRight were made private[this] to improve performance (presumably by encouraging the JIT compiler to try harder despite using the new trait encoding scheme).

Checks of performance were conducted with Thyme to aid iteration speed.

@@ -466,24 +466,24 @@ override def companion: GenericCompanion[Vector] = Vector
display5 = copyRange(display5, oldLeft, newLeft)
}

private def zeroLeft(array: Array[AnyRef], index: Int): Unit = {
private[this] def zeroLeft(array: Array[AnyRef], index: Int): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final is also nice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either one does the trick.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I reran the original tests and I'm not 100% sure it makes any difference at all. If it's important I should rerun them later, maybe with JMH. I've been trying to iterate quickly, so have been using Thyme which is more subject to fluctuations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"it" meaning having the method be final, private[this], or just private like it used to be. In any case the effects are small in my hands (~5%).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bytecode for a private[this] method is exactly the same as for private, so this should not make any difference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experiment only avoiding the pre-null-ing of arrays had any performance impact (i.e. the array allocation and copying should be next to each other, otherwise the array is null-ed first).

Before:

VectorBenchmark.Slice.scala_persistent  thrpt   4193.769 ops/s

After:

VectorBenchmark.Slice.scala_persistent  thrpt  10306.275 ops/s

i.e. 2.5x faster, congrats @Ichoran

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lrytz - Yeah, I see that. It does (or has--not sure about now) occasionally make a difference with fields and it was faster for me to check the benchmark than the bytecode. But it seems there was just a benchmarking fluctuation that made everything I did after the one bad benchmark seem like it was helping (a tiny bit).

Copy link
Contributor

@l0rinc l0rinc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -466,24 +466,24 @@ override def companion: GenericCompanion[Vector] = Vector
display5 = copyRange(display5, oldLeft, newLeft)
}

private def zeroLeft(array: Array[AnyRef], index: Int): Unit = {
private[this] def zeroLeft(array: Array[AnyRef], index: Int): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my experiment only avoiding the pre-null-ing of arrays had any performance impact (i.e. the array allocation and copying should be next to each other, otherwise the array is null-ed first).

Before:

VectorBenchmark.Slice.scala_persistent  thrpt   4193.769 ops/s

After:

VectorBenchmark.Slice.scala_persistent  thrpt  10306.275 ops/s

i.e. 2.5x faster, congrats @Ichoran

@Ichoran
Copy link
Contributor Author

Ichoran commented Nov 9, 2016

@paplorinc - Which change specifically was the improvement? I tested the java.util.Arrays.copyOf one extensively, but the others I just threw in at the last moment as a benchmark indicated there was a small residual slowdown. It seemed to work so I left it, but it sounds like it's unnecessary and should be removed?

@retronym
Copy link
Member

HotSpot optimizes val a = new Array; arrayCopy(..., dest = a) to avoid zeroing: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/file/tip/src/share/vm/opto/macroArrayCopy.cpp

The extra null checks and/or other small differences in bytecode shapes between 2.11.8 and 2.12.0 probably hampered this optimization.

Would be interesting to see if the 2.11.8 benchmark slows down when this JIT optimization is disabled (-XX:-ReduceBulkZeroing.)

Platform.arraycopy(a, 0, b, 0, a.length)
b
}
private[immutable] final def copyOf(a: Array[AnyRef]) = java.util.Arrays.copyOf(a, a.length)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will go through:

public static <T,U> T[] copyOf(U[] original, int newLength, Class<? extends T[]> newType) {
        @SuppressWarnings("unchecked")
        T[] copy = ((Object)newType == (Object)Object[].class)
            ? (T[]) new Object[newLength]
            : (T[]) Array.newInstance(newType.getComponentType(), newLength);
        System.arraycopy(original, 0, copy, 0,
                         Math.min(original.length, newLength));
        return copy;
    }

It might be better to just call new Array[AnyRef]; System.arrayCopy here to avoid passing through the generic code in copyOf, unless we're sure that HotSpot can elide the array zeroing for it as well (in cases where it has been used by other callers so both branches of the conditional are live.)

Copy link
Contributor Author

@Ichoran Ichoran Nov 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@retronym - On my machine, the copyOf version is actually better-optimized than the new Array/arraycopy. I don't see anything left in the assembly of the object type test, and it looks a lot like the 2.11.8 version.

@Ichoran
Copy link
Contributor Author

Ichoran commented Nov 10, 2016

@retronym - Keeping bulk zeroing does make a big difference on 2.11--about 50% on my machine, which is a majority of the slowdown observed. There are measurable residual effects, but those are a few percent at most. 2.12 with the copyOf = java.util.Arrays.copyOf fix, plus the inlined System.arraycopy, works the same way, while 2.12.0 is barely affected by -XX:-ReduceBulkZeroing.

@Ichoran
Copy link
Contributor Author

Ichoran commented Nov 10, 2016

Here are some benchmark numbers:

Throughput (- means zeroing elision disabled)
---------------------------------------------
Test:      :+    tail   init  slice updated
2.11.8    100%   100%   100%   100%   100%
2.11.8-    71%    71%    74%    69%    72%
2.12.0     69%    70%    75%    67%    71%
2.12.0-    68%    70%    75%    67%    71%
patch      98%   100%   100%    97%    99%  
patch-     69%    71%    74%    69%    72%
+copyOf    98%    90%    96%    91%    99%
-Platform  98%    99%    99%    97%   100%

Error is about +- 0.5%.

copyOf is just changing copyOf to forward to java.util.Arrays.copyOf, while -Platform is replacing all Platform.arraycopy calls with java.lang.System.arraycopy.

I don't think there's a meaningful difference between the two (I don't think I was systematic enough before), so I think @retronym is right that we should just stick to new Array/System.arraycopy.

All calls to Platform.arraycopy were rewritten as java.lang.System.arraycopy to reduce the work that the JIT compiler has to do to produce optimized bytecode that avoids zeroing just-allocated arrays that are about to be copied over.

(Tested with -XX:-ReduceBulkZeroing as suggested by retronym.)
@Ichoran Ichoran force-pushed the non-slow-vector-twelve branch from 3f70b46 to e5fd42d Compare November 10, 2016 02:03
@retronym
Copy link
Member

Here are some more instances of the same pattern that could be changed.

val newarray = new Array[AnyRef](len)
scala.compat.Platform.arraycopy(array, 0, newarray, 0, size0)

scala.compat.Platform.arraycopy(array, 0, newArray, 0, size0)

val arr = new Array[Type](elems.length + offset)
scala.compat.Platform.arraycopy(elems, 0, arr, offset, elems.length)

val newchrs = new Array[Char](chrs.length * 2)
scala.compat.Platform.arraycopy(chrs, 0, newchrs, 0, chrs.length)

I'm also wondering whether we should make the inliner more aggressive (eliminated the null check and maybe even the module load) for objects inlined from some whitelist, e.g scala.compat._, scala.Predef._)

@Ichoran
Copy link
Contributor Author

Ichoran commented Nov 10, 2016

@retronym - I was just about to mention the library ones there, plus one in mutable.PriorityQueue also that is not exactly the same pattern but could become it if optimized/inlined enough. Not sure if I should fix them all (just by doing Platform -> System) as part of this patch? I guess there's little harm in it.

@retronym
Copy link
Member

+1 for inlining all callers of Platform.arrayCopy.

@retronym
Copy link
Member

Thanks for the running those benchmarks and presenting the results so clearly!

…arraycopy

to avoid the same kind of slowdowns that Vector was experiencing due
to the less aggressive inlining by scalac.
@Ichoran
Copy link
Contributor Author

Ichoran commented Nov 10, 2016

@retronym - Got them all, I think.

@retronym
Copy link
Member

LGTM

@retronym retronym self-assigned this Nov 10, 2016
Copy link
Contributor

@l0rinc l0rinc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree that these changes solve the issue, I think this is a lot of workaround for a known root cause (i.e. symptomatic treatment for the Scala inliner working differently).

If we solve this locally, the clients of @inline will still be affected.

Since there is a very specific test for the inlining of Platform.arrayCopy, I suggest we look there for why it's passing:

@@ -102,7 +101,7 @@ object Array extends FallbackArrayBuilding {
def copy(src: AnyRef, srcPos: Int, dest: AnyRef, destPos: Int, length: Int) {
val srcClass = src.getClass
if (srcClass.isArray && dest.getClass.isAssignableFrom(srcClass))
arraycopy(src, srcPos, dest, destPos, length)
java.lang.System.arraycopy(src, srcPos, dest, destPos, length)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this probably isn't affected anyway, as the array that gets here is already null-ed out.
But at least we're consistent :) (in which case Platform.arraycopy should probably be removed).
Also note, that in other cases (e.g.

System.arraycopy(b, offset, dest, 0, len)
) the System reference is not fully qualified.

@@ -955,7 +954,7 @@ private[immutable] trait VectorPointer[T] {

private[immutable] final def copyOf(a: Array[AnyRef]) = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I think that the signature of copyLeft, copyRight, copyRange and copyOf should be the same (except for the name, of course), so maybe consider removing the [immutable] final modifiers here (or add them to the rest also)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going for a bug fix with no visible changes (and binary compatibility), so while I agree that it's inconsistent, I don't think this is the time to change it.

@@ -478,12 +477,12 @@ override def companion: GenericCompanion[Vector] = Vector
// if (array eq null)
// println("OUCH!!! " + right + "/" + depth + "/"+startIndex + "/" + endIndex + "/" + focus)
val a2 = new Array[AnyRef](array.length)
Platform.arraycopy(array, 0, a2, 0, right)
java.lang.System.arraycopy(array, 0, a2, 0, right)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

a2
}
private def copyRight(array: Array[AnyRef], left: Int): Array[AnyRef] = {
val a2 = new Array[AnyRef](array.length)
Platform.arraycopy(array, left, a2, left, a2.length - left)
java.lang.System.arraycopy(array, left, a2, left, a2.length - left)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -955,7 +954,7 @@ private[immutable] trait VectorPointer[T] {

private[immutable] final def copyOf(a: Array[AnyRef]) = {
val b = new Array[AnyRef](a.length)
Platform.arraycopy(a, 0, b, 0, a.length)
java.lang.System.arraycopy(a, 0, b, 0, a.length)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -331,8 +331,8 @@ sealed class PriorityQueue[A](implicit val ord: Ordering[A])
val pq = new PriorityQueue[A]
val n = resarr.p_size0
pq.resarr.p_ensureSize(n)
java.lang.System.arraycopy(resarr.p_array, 1, pq.resarr.p_array, 1, n-1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will avoid nulling

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not now, but it's more amenable to future optimization heuristics if there are no intervening instructions.

@@ -101,7 +101,7 @@ trait ResizableArray[A] extends IndexedSeq[A]
if (newSize > Int.MaxValue) newSize = Int.MaxValue

val newArray: Array[AnyRef] = new Array(newSize.toInt)
scala.compat.Platform.arraycopy(array, 0, newArray, 0, size0)
java.lang.System.arraycopy(array, 0, newArray, 0, size0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@l0rinc l0rinc Nov 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to change all Platform.arrayCopy references, consider changing line 120 also

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I thought I did.

@@ -100,7 +100,7 @@ trait BaseTypeSeqs {

def copy(head: Type, offset: Int): BaseTypeSeq = {
val arr = new Array[Type](elems.length + offset)
scala.compat.Platform.arraycopy(elems, 0, arr, offset, elems.length)
java.lang.System.arraycopy(elems, 0, arr, offset, elems.length)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -68,7 +68,7 @@ trait Names extends api.Names {
while (i < len) {
if (nc + i == chrs.length) {
val newchrs = new Array[Char](chrs.length * 2)
scala.compat.Platform.arraycopy(chrs, 0, newchrs, 0, chrs.length)
java.lang.System.arraycopy(chrs, 0, newchrs, 0, chrs.length)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -220,7 +220,7 @@ trait Names extends api.Names {

/** Copy bytes of this name to buffer cs, starting at position `offset`. */
final def copyChars(cs: Array[Char], offset: Int) =
scala.compat.Platform.arraycopy(chrs, index, cs, offset, len)
java.lang.System.arraycopy(chrs, index, cs, offset, len)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, no allocation here.
Unless you decide to remove Platform.arraycopy completely, I don't think the changes where there's no allocation, just a copy are worth the trouble :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't remove Platform.arraycopy in 2.12.1 for compatibility constraints. But I support the move to uniformly avoid using it, which is a bit simpler than doing so selectively for particular code patterns.

@l0rinc
Copy link
Contributor

l0rinc commented Nov 10, 2016

@Ichoran, please update the title and description of the PR to reflect the new changes also :)

@retronym
Copy link
Member

Since there is a very specific test for the inlining of Platform.arrayCopy, I suggest we look there for why it's passing:

The call is inlined, but there is an additional check that Platform$.MODULE$ is not-null (ie, its super calls has completed). This is actually a step forward for correctness, but in this particular case hinders performance. It is worth revisiting, but I think the manual inlining solution here is pragmatic, especially given that Platform._ seems to be a remnant from the days of Scala.NET.

@viktorklang
Copy link
Contributor

Perhaps a dumb question, but with 2.12 being Java8+ wouldn't invokedynamic
with switchpoint be possibly better than always checking nullness?

Cheers,

On Nov 10, 2016 12:12, "Jason Zaugg" [email protected] wrote:

Since there is a very specific test for the inlining of
Platform.arrayCopy, I suggest we look there for why it's passing:

The call is inlined, but there is an additional check that
Platform$.MODULE$ is not-null (ie, its super calls has completed). This
is actually a step forward for correctness, but in this particular case
hinders performance. It is worth revisiting, but I think the manual
inlining solution here is pragmatic, especially given that Platform._
seems to be a remnant from the days of Scala.NET.


You are receiving this because you commented.
Reply to this email directly, view it on GitHub
#5516 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAAqd0rBmj5XQnBKK2IjcJPgM-7-qLb7ks5q8vwXgaJpZM4KtSmX
.

@l0rinc
Copy link
Contributor

l0rinc commented Nov 10, 2016

I think there are two issues here that could probably be addressed separately:

  • The inliner behaves differently and - for my untrained mind - slightly unintuitively: I would expect it to behave as Intellij Idea's inline method, i.e. in case of a ("static") void to simply replace the call with the content of the method, i.e. it should resemble the change that @Ichoran did manually, without any additional checks whatsoever in this case (the bytecode seems to contain 12 additional instructions).
    Users expect it to be for free.
    If given some guidelines, I would gladly implement this.
  • Even if the inliner would be working as it was previously, we should get rid of these repetitive methods in Platform, as apparently their only task is to get inlined anyway. Therefore I suggest we deprecate Platform's arraycopy in this commit and replace all of its occurrences :)

@retronym
Copy link
Member

retronym commented Nov 10, 2016

@inline is supposed to preserve the semantics. That means, throw an NPE if the receiver of an inlined virtual call is null, and execute the expression that yields the receiver. I agree it would be nice to have a way to omit these effects. Mostly this is a UI problem, the backend changes should be easy enough. Maybe @lrytz has give that some thought?

Deprecation of this method (and maybe Platform altogether) may well be the best course of action.

That said, our first priority is to fix the performance regression in 2.12.1, so I'm happy with this PR in it current state to meet that goal.

@retronym
Copy link
Member

Perhaps a dumb question, but with 2.12 being Java8+ wouldn't invokedynamic with switchpoint be possibly better than always checking nullness?

Null checks that are always false are basically free in C2 compiled code. This case is really quite special, the null check itself isn't expensive, but it changes the bytecode shape for the JIT optimization of array instantation/copy.

@lrytz
Copy link
Member

lrytz commented Nov 10, 2016

simply replace the call with the content of the method

Yes, that should indeed be the case. But we have to make sure the inliner preserves semantics, so a null check on the receiver is necessary, unless the compiler can prove that the receiver cannot be null.

In the case of a module like object Platform, there's even more: when performing inlining, the compiler does not know if the Platform module is already initialized, i.e., its constructor side-effects were executed. So the module load is also there for that reason.

A simple analysis on module classfiles to know that their initialization doesn't have side-effects (only methods, no constructor statements, superclass Object) is actually planned, and would allow eliminating the module load (this analysis would also ensure that the module load is never null). scala/scala-dev#16.

Related, there's also scala/scala-dev#112.

@Ichoran
Copy link
Contributor Author

Ichoran commented Nov 10, 2016

Why must an inliner preserve incidental semantics of side-effects not directly referenced in the method itself?

I would be totally fine with this giving 35 and not printing anything:

class Foo { println("I just made a foo!"); @inline times(a: Int, b: Int) = a*b }
val f: Foo = null
f.times(5, 7)

@l0rinc
Copy link
Contributor

l0rinc commented Nov 10, 2016

I agree with @Ichoran, people would expect a lot simpler behavior for such a simple concept.
Maybe rather add a force parameter to the annotation that ignores any such checks, since this annotation is meant for forcing an optimization anyway?

@lrytz
Copy link
Member

lrytz commented Nov 10, 2016

Why must an inliner preserve incidental semantics

I think it should, this seems to be also a core principle of the JVM. Making code execute in a way that is not according to the language spec can lead to bugs / behaviors that are surprising / hard to understand. Of course I'm all for optimizing more, but I think we should keep to the spec. I think there's a lot of room to do more even with this restriction.

It's true that we're not very consistent at the moment either: scala/scala-dev#112 is an example. But for example we also defer class loading (when inlining a static method, or eliminating unreachable code), which can cause observable differences (defer side-effects, eliminate deadlocks). I think we never really discussed and defined precisely where to draw the line.

@Ichoran
Copy link
Contributor Author

Ichoran commented Nov 10, 2016

The problem with preserving incidental semantics is that they're incidental. Very often you don't care whether a module is initialized or not. You just want to execute some code, and there is no way to run it without dragging in module initialization. If there is no way to avoid incidentals that you don't want, you're forced to do silly time-wasting stuff like inline code by hand, a purely mechanical process that you're going through because your language won't do it for you.

I agree that it's tricky to know what is incidental and what is intended, and I also agree that it can be very surprising when behavior changes based upon inlining or not (because of incidental behavior). But it's also important, as this slowdown has illustrated.

@retronym
Copy link
Member

I'm happy that the followup ideas to improve the optimizer are covered by those tickets @lrytz linked to, so I'm going to merge this one. Thanks for your detailed review, @paplorinc!

@retronym retronym merged commit e6cb4d9 into scala:2.12.x Nov 11, 2016
@SethTisue SethTisue added the release-notes worth highlighting in next release notes label Nov 11, 2016
@Ichoran
Copy link
Contributor Author

Ichoran commented Nov 11, 2016

@retronym - I hadn't had a chance to catch a Platform that somehow I missed in ResizableArray, but I guess that's not actually worth waiting for. Not sure that copy method is ever used, and if it is, it's not necessarily going to be next to an allocation. (I should have just used sed rather than an interactive editor....)

@l0rinc
Copy link
Contributor

l0rinc commented Nov 11, 2016

I've checked the rest of the affected methods (i.e. not just slice) and most seem to be very close to their previous performance now! Congrats @Ichoran!
The only one in Vector that I found to be measurably slower is groupBy, by 80-90%, but that might still be acceptable.

Using the Javaslang collection benchmarks with
2.11.8

Benchmark                                (CONTAINER_SIZE) Mode   Score      Error     Units
VectorBenchmark.GroupBy.scala_persistent 100              thrpt  215444.420 ±9500.270 ops/s

and
2.12.1-SNAPSHOT

Benchmark                                (CONTAINER_SIZE) Mode   Score      Error     Units
VectorBenchmark.GroupBy.scala_persistent 100              thrpt  173771.549 ±2979.146 ops/s

Running the rest of the benchmarks, I noticed that MutableList.update got considerably slower also, i.e.:
2.11.8

Benchmark                          (CONTAINER_SIZE)  Mode   Score    Error Units
ListBenchmark.Update.scala_mutable            1000  thrpt 979.478 ±9.109 ops/s

and
2.12.1-SNAPSHOT

Benchmark                          (CONTAINER_SIZE)  Mode   Score  Error Units
ListBenchmark.Update.scala_mutable            1000  thrpt 520.756 ±7.598 ops/s

I will look into these in the weekend :)

@Ichoran
Copy link
Contributor Author

Ichoran commented Nov 12, 2016

@paplorinc @lrytz - There are definitely other problems out there. GroupBy is slow in general. I haven't yet tracked down whether it's just builders or whether it's mutable map getOrElseUpdate also, but there's definitely a builder component because the slowdown is a function of the collection type even when there aren't very many distinct keys (so it's mostly key lookup and adding to builders). I'll write a ticket when I have time, and try to fix it this weekend.

If it turns out to be more optimizer stuff, I worry that we're going to be putting out small fires all over the place until we have a more aggressive inlining strategy.

@Ichoran
Copy link
Contributor Author

Ichoran commented Nov 13, 2016

@paplorinc - I have the groupBy reported at https://issues.scala-lang.org/browse/SI-10049

Please comment there if you have details to add

@l0rinc
Copy link
Contributor

l0rinc commented Nov 13, 2016

Thanks, will sign up and comment.

Until now I've found that HashMap.getOrElseUpdate is a lot slower now (e.g. 25 vs 36 million ops/s), but even HashMap.get is a lot slower (e.g. 100 vs 160 million ops/s), which may be the source of the error - and may have more serious consequences.

Adding a LinuxPerfAsmProfiler (or JProfiler also) seems to indicate that HashTable.HashUtils#improve may be the cause, but I have no clue why (may be a trait inlining issue?).

@Ichoran
Copy link
Contributor Author

Ichoran commented Nov 13, 2016

@paplorinc - Let's continue on the ticket. I think there must be some details that differ, as I don't see a slowdown in get. But maybe it is at fault in part, but the optimizer can handle it in my case but not in yours (based on what should be unimportant differences in details).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes worth highlighting in next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants