Vector got slower in 2.12? #260

SethTisue · 2016-11-08T20:28:37Z

lots of discussion already at vavr-io/vavr#1658

viktorklang · 2016-11-08T20:41:57Z

Wild guess: is there any change in the inheritance hierarchy / interfaces which would change complexity of method dispatch?

Ichoran · 2016-11-08T20:46:06Z

Potentially yes, but I haven't actually seen it appear as additional function calls in the C2 compiled assembly. I'll work on it more tonight. I haven't looked through comprehensively, just at my top guesses (which were not clearly wrong, but also weren't precisely the same).

Ichoran · 2016-11-09T04:59:57Z

@viktorklang - It's more related to inlining depth, I think. The biggest culprit, responsible for 85% of the slowdown, is a failure to convert the copyOf call in VectorPointer to fully inlined array operations. It directs through scala.compat.Platform and by the time it comes out into the assembly, it's not fully optimized any more (and there is at least one residual load of a MODULE$).

Still working out the best way to fix the issue.

retronym · 2016-11-09T05:20:51Z

Looking at the bytecode for copyLeft (another caller of Platform.arrayCopy, just like VectorPointer.copyOf):

  // 2.11.8
  private java.lang.Object[] copyLeft(java.lang.Object[], int);
    Code:
       0: aload_1
       1: arraylength
       2: anewarray     #328                // class java/lang/Object
       5: astore        4
       7: getstatic     #574                // Field scala/compat/Platform$.MODULE$:Lscala/compat/Platform$;
      10: astore_3
      11: aload_1
      12: iconst_0
      13: aload         4
      15: iconst_0
      16: iload_2
      17: invokestatic  #580                // Method java/lang/System.arraycopy:(Ljava/lang/Object;ILjava/lang/Object;II)V
      20: aload         4
      22: areturn

 // 2.12.0
  private java.lang.Object[] copyLeft(java.lang.Object[], int);
    Code:
       0: aload_1
       1: arraylength
       2: anewarray     #104                // class java/lang/Object
       5: astore_3
       6: getstatic     #607                // Field scala/compat/Platform$.MODULE$:Lscala/compat/Platform$;
       9: iconst_0
      10: iconst_0
      11: istore        5
      13: istore        4
      15: ifnonnull     20
      18: aconst_null
      19: athrow
      20: aload_1
      21: iload         4
      23: aload_3
      24: iload         5
      26: iload_2
      27: invokestatic  #613                // Method java/lang/System.arraycopy:(Ljava/lang/Object;ILjava/lang/Object;II)V
      30: aload_3
      31: areturn

The body of Platform.arrayCopy is inlined in both cases. Both also load the module field (I think we leave this in to ensure that the module initialization is not elided after inlining).

But, 2.12.x also includes a null check for the module. This could only be true in initialization cycles like:

scala> class A { B.b }; object B extends A { def b = 42 } ; B
java.lang.NullPointerException
  ... 35 elided

Hotspot can usually optimize away a never-taken branch.

BTW, I forget the rationale for indirecting arrayCopy through Platform. Is it just a legacy of Scala.NET?

retronym · 2016-11-09T05:24:18Z

See also #107 ("Use x.getClass as null check?")

sjrd · 2016-11-09T07:21:05Z

BTW, I forget the rationale for indirecting arrayCopy through Platform. Is it just a legacy of Scala.NET?

Most likely. FTR Scala.js doesn't care; it supports System.arrayCopy.

Ichoran · 2016-11-09T07:46:14Z

There's no null check in copyOf; it just tries to copy:

       0: aload_1
       1: arraylength
       2: anewarray     #5                  // class java/lang/Object
       5: astore_2
       6: getstatic     #129                // Field scala/compat/Platform$.MODULE$:Lscala/compat/Platform$;
       9: aload_1
      10: iconst_0
      11: aload_2
      12: iconst_0
      13: aload_1
      14: arraylength
      15: invokevirtual #133                // Method scala/compat/Platform$.arraycopy:(Ljava/lang/Object;ILjava/lang/Object;II)V
      18: aload_2
      19: areturn

as opposed to 2.11's

   public static final java.lang.Object[] copyOf(scala.collection.immutable.VectorPointer, java.lang.Object[]);
    Code:
       0: aload_1
       1: arraylength
       2: anewarray     #4                  // class java/lang/Object
       5: astore        4
       7: getstatic     #102                // Field scala/compat/Platform$.MODULE$:Lscala/compat/Platform$;
      10: aload_1
      11: arraylength
      12: istore_3
      13: astore_2
      14: aload_1
      15: iconst_0
      16: aload         4
      18: iconst_0
      19: iload_3
      20: invokestatic  #108                // Method java/lang/System.arraycopy:(Ljava/lang/Object;ILjava/lang/Object;II)V
      23: aload         4
      25: areturn

where the method has been inlined. The C2 compiler inlines both even further when it generates assembly, but the amount of extra crud elided is different in the two cases. In 2.12:

0x00007fea2528b218: rep stosb (%rdi)          ;*anewarray
                                                ; - scala.collection.immutable.VectorPointer::copyOf@2 (line 957)
                                                ; - scala.collection.immutable.VectorPointer::copyOf$@2 (line 956)
                                                ; - scala.collection.immutable.Vector::copyOf@2 (line 65)
                                                ; - scala.collection.immutable.VectorPointer::gotoPosWritable1@14 (line 1056)

  0x00007fea2528b21b: movabs  $0x719ac9560,%r8  ;   {oop(a 'java/lang/Class' = 'scala/compat/Platform$')}
  0x00007fea2528b225: mov     0x68(%r8),%ebp    ;*getstatic MODULE$
                                                ; - scala.collection.immutable.VectorPointer::copyOf@6 (line 958)
                                                ; - scala.collection.immutable.VectorPointer::copyOf$@2 (line 956)
                                                ; - scala.collection.immutable.Vector::copyOf@2 (line 65)
                                                ; - scala.collection.immutable.VectorPointer::gotoPosWritable1@14 (line 1056)

  0x00007fea2528b229: test    %ebp,%ebp
  0x00007fea2528b22b: je      0x7fea2528b2fd    ;*ifnonnull
                                                ; - scala.collection.immutable.VectorPointer::copyOf@18 (line 958)
                                                ; - scala.collection.immutable.VectorPointer::copyOf$@2 (line 956)
                                                ; - scala.collection.immutable.Vector::copyOf@2 (line 65)
                                                ; - scala.collection.immutable.VectorPointer::gotoPosWritable1@14 (line 1056)

  0x00007fea2528b231: test    %r10d,%r10d
  0x00007fea2528b234: jle     0x7fea2528b24f
  0x00007fea2528b236: mov     %rbx,%rsi
  0x00007fea2528b239: add     $0x10,%rsi
  0x00007fea2528b23d: lea     0x10(%r12,%r11,8),%rdi
  0x00007fea2528b242: movabs  $0x7fea250525e0,%r10
  0x00007fea2528b24c: callq   %r10              ;*invokestatic arraycopy

whereas in 2.11 the roughly comparable assembly is only

  0x00007fbc95258ebc: prefetchnta 0x180(%r11)   ;*anewarray
                                                ; - scala.collection.immutable.VectorPointer$class::copyOf@2 (line 954)
                                                ; - scala.collection.immutable.Vector::copyOf@2 (line 62)
                                                ; - scala.collection.immutable.VectorPointer$class::gotoPosWritable1@14 (line 1053)

  0x00007fbc95258ec4: test    %ecx,%ecx
  0x00007fbc95258ec6: jle     0x7fbc95258ee1
  0x00007fbc95258ec8: mov     %r13,%rsi
  0x00007fbc95258ecb: add     $0x10,%rsi
  0x00007fbc95258ecf: lea     0x10(%r12,%r8,8),%rdi
  0x00007fbc95258ed4: movabs  $0x7fbc95052380,%r10
  0x00007fbc95258ede: callq   %r10              ;*invokestatic arraycopy

But I'm not sure how much of the issue this is because I can't swap assembly around.

viktorklang · 2016-11-09T07:56:42Z

Looks like it always need to go through the module in 2.12?

Cheers,
√

On Nov 9, 2016 08:46, "Ichoran" [email protected] wrote:

There's no null check in copyOf; it just tries to copy:
   0: aload_1
   1: arraylength
   2: anewarray     #5                  // class java/lang/Object
   5: astore_2
   6: getstatic     #129                // Field scala/compat/Platform$.MODULE$:Lscala/compat/Platform$;
   9: aload_1
  10: iconst_0
  11: aload_2
  12: iconst_0
  13: aload_1
  14: arraylength
  15: invokevirtual #133                // Method scala/compat/Platform$.arraycopy:(Ljava/lang/Object;ILjava/lang/Object;II)V
  18: aload_2
  19: areturn
as opposed to 2.11's

public static final java.lang.Object[] copyOf(scala.collection.immutable.VectorPointer, java.lang.Object[]);
Code:
0: aload_1
1: arraylength
2: anewarray #4 // class java/lang/Object
5: astore 4
7: getstatic #102 // Field scala/compat/Platform$.MODULE$:Lscala/compat/Platform$;
10: aload_1
11: arraylength
12: istore_3
13: astore_2
14: aload_1
15: iconst_0
16: aload 4
18: iconst_0
19: iload_3
20: invokestatic #108 // Method java/lang/System.arraycopy:(Ljava/lang/Object;ILjava/lang/Object;II)V
23: aload 4
25: areturn

where the method has been inlined. The C2 compiler inlines both even
further when it generates assembly, but the amount of extra crud elided is
different in the two cases. In 2.12:

0x00007fea2528b218: rep stosb (%rdi) ;*anewarray
; - scala.collection.immutable.VectorPointer::copyOf@2 (line 957)
; - scala.collection.immutable.VectorPointer::copyOf$@2 (line 956)
; - scala.collection.immutable.Vector::copyOf@2 (line 65)
; - scala.collection.immutable.VectorPointer::gotoPosWritable1@14 (line 1056)

0x00007fea2528b21b: movabs $0x719ac9560,%r8 ; {oop(a 'java/lang/Class' = 'scala/compat/Platform$')}
0x00007fea2528b225: mov 0x68(%r8),%ebp ;*getstatic MODULE$
; - scala.collection.immutable.VectorPointer::copyOf@6 (line 958)
; - scala.collection.immutable.VectorPointer::copyOf$@2 (line 956)
; - scala.collection.immutable.Vector::copyOf@2 (line 65)
; - scala.collection.immutable.VectorPointer::gotoPosWritable1@14 (line 1056)

0x00007fea2528b229: test %ebp,%ebp
0x00007fea2528b22b: je 0x7fea2528b2fd ;*ifnonnull
; - scala.collection.immutable.VectorPointer::copyOf@18 (line 958)
; - scala.collection.immutable.VectorPointer::copyOf$@2 (line 956)
; - scala.collection.immutable.Vector::copyOf@2 (line 65)
; - scala.collection.immutable.VectorPointer::gotoPosWritable1@14 (line 1056)

0x00007fea2528b231: test %r10d,%r10d
0x00007fea2528b234: jle 0x7fea2528b24f
0x00007fea2528b236: mov %rbx,%rsi
0x00007fea2528b239: add $0x10,%rsi
0x00007fea2528b23d: lea 0x10(%r12,%r11,8),%rdi
0x00007fea2528b242: movabs $0x7fea250525e0,%r10
0x00007fea2528b24c: callq %r10 ;*invokestatic arraycopy

whereas in 2.11 the roughly comparable assembly is only
; - scala.collection.immutable.VectorPointer$class::copyOf@2 (line 954)
; - scala.collection.immutable.Vector::copyOf@2 (line 62)
; - scala.collection.immutable.VectorPointer$class::gotoPosWritable1@14
(line 1053)

0x00007fbc95258ec4: test %ecx,%ecx
0x00007fbc95258ec6: jle 0x7fbc95258ee1
0x00007fbc95258ec8: mov %r13,%rsi
0x00007fbc95258ecb: add $0x10,%rsi
0x00007fbc95258ecf: lea 0x10(%r12,%r8,8),%rdi
0x00007fbc95258ed4: movabs $0x7fbc95052380,%r10
0x00007fbc95258ede: callq %r10 ;*invokestatic arraycopy

But I'm not sure how much of the issue this is because I can't swap assembly around.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/scala/scala-dev/issues/260#issuecomment-259352186,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAAqd2iItqqzNK88tB_Tm33kO8CXsmlWks5q8XpIgaJpZM4Ks33s
.

Ichoran · 2016-11-09T07:58:49Z

The deeper issue seems to be that the new trait encoding just isn't optimized as effectively. I can restore the original performance by changing the platform call to java.util.Arrays.copyOf, which is less in need of optimization anyway. I am not sure it's worth the time right now to figure out whether that would have sped up 2.11.8 yet further.

Ichoran · 2016-11-09T08:45:11Z

Fixed performance issues, I think, in the PR at scala/scala#5516

The (not very pretty) test I ran was:

import ichi.bench._

object Speed {
  def main(args: Array[String]) {
    val th = Thyme warmed 0.03
    val v1k = Vector.tabulate(1024)(i => i)
    val fapp = th.Warm{
        var v = Vector.empty[Int]
        var i = 0
        while (i < 1024*128) {
          v = v :+ i
          i += 1
        }
        v(util.Random.nextInt(1024*128))
    }
    val ftail = th.Warm {
      var v = v1k
      var i = v1k.length - util.Random.nextInt(10) - 1
      while (i > 0) {
        v = v.tail
        i -= 1
      }
      v.head
    }
    val finit = th.Warm {
      var v = v1k
      var i = v1k.length - util.Random.nextInt(10) - 1
      while (i > 0) {
        v = v.init
        i -= 1
      }
      v.head
    }
    val fslice = th.Warm {
      var v = v1k
      val n = util.Random.nextInt(10) + 3
      while (v.length > n) {
        v = v.slice(1, v.length-2)
      }
      v.last
    }
    val fupd = th.Warm{
      var v = v1k
      val add = util.Random.nextInt(1024)
      var i = 0
      while (i < v.length) {
        v = v.updated(i, i+add)
        i += 1
      }
    }
    for (n <- 1 to 5) {
      th pbenchWarm fapp
      th pbenchWarm ftail
      th pbenchWarm finit
      th pbenchWarm fslice
      th pbenchWarm fupd
    }
  }
}

The build file was

resolvers += "Sonatype OSS Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots"

lazy val root = (project in file(".")).settings(
  name := "thyme-test",
  version := "0.1.0",
  scalaVersion := "2.12.1-SNAPSHOT",
  libraryDependencies += "com.github.ichoran" % "thyme_2.12" % "0.1.2-SNAPSHOT"
)

for 2.12 (and a version with a jar for Thyme in /lib for 2.11). The tests were all performed with sbt assembly followed by -cp target/scala-2.12/thyme-test-assembly-0.1.0.jar Speed as a base.

lrytz · 2016-11-09T09:38:57Z

There's no null check in copyOf; it just tries to copy:

@Ichoran I just checked scala/collection/immutable/VectorPointer.class in scala-library.jar of 2.12.0, and there, Platform.arraycopy is inlined into copyOf:

  public default copyOf([Ljava/lang/Object;)[Ljava/lang/Object;
    // parameter final  a
   L0
    LINENUMBER 957 L0
    ALOAD 1
    ARRAYLENGTH
    ANEWARRAY java/lang/Object
   L1
    ASTORE 2
   L2
    LINENUMBER 958 L2
    GETSTATIC scala/compat/Platform$.MODULE$ : Lscala/compat/Platform$;
    ICONST_0
    ICONST_0
    ALOAD 1
    ARRAYLENGTH
    ISTORE 5
    ISTORE 4
    ISTORE 3
    IFNONNULL L3
    ACONST_NULL
    ATHROW
   L3
   FRAME FULL [scala/collection/immutable/VectorPointer [Ljava/lang/Object; [Ljava/lang/Object; I I I] []
    ALOAD 1
    ILOAD 3
    ALOAD 2
    ILOAD 4
    ILOAD 5
    INVOKESTATIC java/lang/System.arraycopy (Ljava/lang/Object;ILjava/lang/Object;II)V
   L4
    LINENUMBER 959 L4
    ALOAD 2
   L5
    ARETURN

We are more conservative than 2.11 with null-checks on modules, as Jason mentioned. We also don't have constant propagation, while the optimizer in 2.11 did.

retronym · 2016-11-10T03:42:18Z

Link: scala/scala#5516

SethTisue mentioned this issue Nov 8, 2016

Updated JMH, Scalaz, Clojure, and ECollections benchmark versions vavr-io/vavr#1658

Merged

SethTisue assigned Ichoran Nov 8, 2016

retronym added this to the 2.12.1 milestone Nov 10, 2016

retronym added the t:performance label Nov 10, 2016

retronym closed this as completed Nov 11, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Vector got slower in 2.12? #260

Vector got slower in 2.12? #260

SethTisue commented Nov 8, 2016

viktorklang commented Nov 8, 2016

Ichoran commented Nov 8, 2016

Ichoran commented Nov 9, 2016

retronym commented Nov 9, 2016

retronym commented Nov 9, 2016

sjrd commented Nov 9, 2016

Ichoran commented Nov 9, 2016 •

edited

Loading

viktorklang commented Nov 9, 2016

Ichoran commented Nov 9, 2016

Ichoran commented Nov 9, 2016

lrytz commented Nov 9, 2016

retronym commented Nov 10, 2016

Vector got slower in 2.12? #260

Vector got slower in 2.12? #260

Comments

SethTisue commented Nov 8, 2016

viktorklang commented Nov 8, 2016

Ichoran commented Nov 8, 2016

Ichoran commented Nov 9, 2016

retronym commented Nov 9, 2016

retronym commented Nov 9, 2016

sjrd commented Nov 9, 2016

Ichoran commented Nov 9, 2016 • edited Loading

viktorklang commented Nov 9, 2016

Ichoran commented Nov 9, 2016

Ichoran commented Nov 9, 2016

lrytz commented Nov 9, 2016

retronym commented Nov 10, 2016

Ichoran commented Nov 9, 2016 •

edited

Loading