Vector got slower in 2.12? #260
Comments
Wild guess: is there any change in the inheritance hierarchy / interfaces which would change the complexity of method dispatch?
Potentially yes, but I haven't actually seen it appear as additional function calls in the C2-compiled assembly. I'll work on it more tonight. I haven't looked through it comprehensively, just at my top guesses (which were not clearly wrong, but also weren't precisely the same).
@viktorklang - It's more related to inlining depth, I think. The biggest culprit, responsible for 85% of the slowdown, is a failure to convert the copyOf call in VectorPointer to fully inlined array operations. The call is routed through scala.compat.Platform, and by the time it comes out into the assembly it's no longer fully optimized (there is at least one residual load of a MODULE$). Still working out the best way to fix the issue.
Looking at the bytecode for
The body of But, 2.12.x also includes a null check for the module. This could only be true in initialization cycles like:
Hotspot can usually optimize away a never-taken branch. BTW, I forget the rationale for indirecting
See also #107 ("Use x.getClass as null check?")
Most likely. FTR Scala.js doesn't care; it supports
There's no null check in
0: aload_1
1: arraylength
2: anewarray #5 // class java/lang/Object
5: astore_2
6: getstatic #129 // Field scala/compat/Platform$.MODULE$:Lscala/compat/Platform$;
9: aload_1
10: iconst_0
11: aload_2
12: iconst_0
13: aload_1
14: arraylength
15: invokevirtual #133 // Method scala/compat/Platform$.arraycopy:(Ljava/lang/Object;ILjava/lang/Object;II)V
18: aload_2
19: areturn
as opposed to 2.11's
public static final java.lang.Object[] copyOf(scala.collection.immutable.VectorPointer, java.lang.Object[]);
Code:
0: aload_1
1: arraylength
2: anewarray #4 // class java/lang/Object
5: astore 4
7: getstatic #102 // Field scala/compat/Platform$.MODULE$:Lscala/compat/Platform$;
10: aload_1
11: arraylength
12: istore_3
13: astore_2
14: aload_1
15: iconst_0
16: aload 4
18: iconst_0
19: iload_3
20: invokestatic #108 // Method java/lang/System.arraycopy:(Ljava/lang/Object;ILjava/lang/Object;II)V
23: aload 4
25: areturn
where the method has been inlined. The C2 compiler inlines both even further when it generates assembly, but the amount of extra crud elided is different in the two cases. In 2.12:
whereas in 2.11 the roughly comparable assembly is only
But I'm not sure how much of the issue this is because I can't swap assembly around.
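To make the two bytecode shapes above easier to compare at the source level, here is a minimal sketch. `PlatformLike` and `CopyOfSketch` are hypothetical names invented for illustration; `PlatformLike` only stands in for a singleton that forwards to `System.arraycopy`, and this is not the actual library source.

```scala
// Hypothetical stand-in for a module that forwards to System.arraycopy.
// Calling it from another class means loading the singleton (MODULE$) first.
object PlatformLike {
  @inline def arraycopy(src: AnyRef, srcPos: Int, dest: AnyRef, destPos: Int, length: Int): Unit =
    System.arraycopy(src, srcPos, dest, destPos, length)
}

object CopyOfSketch {
  // Shape seen in the 2.12 bytecode: getstatic MODULE$, then invokevirtual on the module.
  def copyOfViaModule(a: Array[AnyRef]): Array[AnyRef] = {
    val b = new Array[AnyRef](a.length)
    PlatformLike.arraycopy(a, 0, b, 0, a.length)
    b
  }

  // Shape seen in the 2.11 bytecode after inlining: invokestatic System.arraycopy.
  def copyOfDirect(a: Array[AnyRef]): Array[AnyRef] = {
    val b = new Array[AnyRef](a.length)
    System.arraycopy(a, 0, b, 0, a.length)
    b
  }
}
```

Both methods return the same result; the only difference is whether the copy goes through a module or straight to the static method, which is what the JIT then has to see through.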
Looks like it always needs to go through the module in 2.12?
The deeper issue seems to be that the new trait encoding just isn't optimized as effectively. I can restore the original performance by changing the platform call to
Fixed performance issues, I think, in the PR at scala/scala#5516
The (not very pretty) test I ran was:
import ichi.bench._
object Speed {
  def main(args: Array[String]): Unit = {
    val th = Thyme warmed 0.03
    val v1k = Vector.tabulate(1024)(i => i)
    // Append: build a 128k-element vector one element at a time.
    val fapp = th.Warm {
      var v = Vector.empty[Int]
      var i = 0
      while (i < 1024*128) {
        v = v :+ i
        i += 1
      }
      v(util.Random.nextInt(1024*128))
    }
    // Tail: repeatedly drop the first element.
    val ftail = th.Warm {
      var v = v1k
      var i = v1k.length - util.Random.nextInt(10) - 1
      while (i > 0) {
        v = v.tail
        i -= 1
      }
      v.head
    }
    // Init: repeatedly drop the last element.
    val finit = th.Warm {
      var v = v1k
      var i = v1k.length - util.Random.nextInt(10) - 1
      while (i > 0) {
        v = v.init
        i -= 1
      }
      v.head
    }
    // Slice: shrink from both ends until only a few elements remain.
    val fslice = th.Warm {
      var v = v1k
      val n = util.Random.nextInt(10) + 3
      while (v.length > n) {
        v = v.slice(1, v.length-2)
      }
      v.last
    }
    // Update: overwrite every element in turn.
    val fupd = th.Warm {
      var v = v1k
      val add = util.Random.nextInt(1024)
      var i = 0
      while (i < v.length) {
        v = v.updated(i, i+add)
        i += 1
      }
    }
    for (n <- 1 to 5) {
      th pbenchWarm fapp
      th pbenchWarm ftail
      th pbenchWarm finit
      th pbenchWarm fslice
      th pbenchWarm fupd
    }
  }
}
The build file was
resolvers += "Sonatype OSS Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots"
lazy val root = (project in file(".")).settings(
  name := "thyme-test",
  version := "0.1.0",
  scalaVersion := "2.12.1-SNAPSHOT",
  libraryDependencies += "com.github.ichoran" % "thyme_2.12" % "0.1.2-SNAPSHOT"
)
for 2.12 (and a version with a jar for Thyme in
@Ichoran I just checked
We are more conservative than 2.11 with null checks on modules, as Jason mentioned. We also don't have constant propagation, while the optimizer in 2.11 did.
Link: scala/scala#5516
lots of discussion already at vavr-io/vavr#1658
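For anyone who wants to try the append case from the test above without Thyme on the classpath, a crude sketch follows. `AppendTiming`, `appendAll`, and `timeNanos` are hypothetical names made up for this example; timing with `System.nanoTime` has no warmup control or statistics, so treat the numbers as a rough indication only and compare several runs.

```scala
object AppendTiming {
  // Build a Vector of n Ints by repeated append, like the append case in the test above.
  def appendAll(n: Int): Vector[Int] = {
    var v = Vector.empty[Int]
    var i = 0
    while (i < n) {
      v = v :+ i
      i += 1
    }
    v
  }

  // Crude wall-clock timing: returns the result together with elapsed nanoseconds.
  def timeNanos[A](body: => A): (A, Long) = {
    val t0 = System.nanoTime()
    val result = body
    (result, System.nanoTime() - t0)
  }

  def main(args: Array[String]): Unit = {
    // Repeat a few times so later iterations run JIT-compiled code.
    for (_ <- 1 to 5) {
      val (v, ns) = timeNanos(appendAll(1024 * 128))
      println(s"size=${v.length} elapsed=${ns / 1e6} ms")
    }
  }
}
```

Running the same file against a 2.11 and a 2.12 build should show whether the gap is visible on a given machine, though a proper harness is still the better tool for real measurements.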