drewhamilton/Poko

Do an optimization pass on platform-specific output

Opened this issue · 6 comments

We want to optimize for a sweet spot between performance and code size. This is mostly a placeholder issue until #142 is sufficiently resolved that this becomes a priority and the research is done to see where any problems may lie.

Here's a quick one.

The toString() function is generated like this (in JS):

  protoOf(Simple_0).toString = function () {
    return 'Simple(int=' + this.int_1 + ', requiredString=' + this.requiredString_1 + ', optionalString=' + this.optionalString_1 + ')';
  };

Perfectly reasonable for JS.

On the JVM and Android, however, strings are used both for string literals but also for the names of types, methods, and fields. This means that the backing fields for the int, requiredString, and optionalString cause strings with those contents to be present in the classfile. By using a string like ", requiredString=" we are duplicating 100% of the UTF-8 bytes of "requiredString". This is doubly true for "Simple(int=".

$ javap -v poko-tests/build/classes/kotlin/jvm/main/poko/Simple.class
Classfile /Volumes/dev/other/Poko/poko-tests/build/classes/kotlin/jvm/main/poko/Simple.class
  ⋮
    #9 = Utf8               requiredString
   #10 = String             #9            // requiredString
  ⋮
   #69 = Utf8               , requiredString=
   #70 = String             #69           // , requiredString=

Despite being counter-intuitive, it's actually more efficient to generate code which looks like this

return "Simple" + '(' + "int" + '=' + int + ", " + "requiredString" + '=' + ...

This will allow sharing the string bytes with the type and field names, and it means the ", " string will be de-duplicated as well. We pay for this in a bytecode count increase, but it should still result in an overall reduction of binary size. There's also a very, very tiny decrease in performance (more method calls), but it should be almost impossible to notice without precise JMH benchmarks.

Generating this is tricky, because you have to avoid any steps that join constants together. I have no idea if we can do it in IR, but I would suspect it's possible. It may also only be beneficial to do this for JVM targets. I suspect a good JS minifier will combine the constant strings back together (which is good, size is more important), but I have no idea what the correct thing to do for native targets is.

I previously did this optimization on View Binding's generated code: https://jakewharton.com/the-economics-of-generated-code/#string-duplication. It'd be another selling point of Poko over data classes.

Separately, seeing if StringConcatFactory can be used on Java 9+ or ObjectMethods can be used on Java 16+ is also worth a look.

Another one that jumps out in JS:

    if (!(other instanceof Complex_0))
      return false;
    var tmp0_other_with_cast = other instanceof Complex_0 ? other : THROW_CCE();

There's no need to do a double instanceof. This is the result of IR which does a type check separately from doing a cast. On some platforms like the JVM this is required (instanceof vs. checkcast bytecodes). In JS, however, it very clearly is a wasteful operation. Perhaps this is simply a Kotlin bug?

Anyway, I've seen this before in the JS and have never been able to eliminate it when compiling from Kotlin source. We are in IR, however, and can possibly eliminate it.

We appear to be invoking Int.hashCode() when creating hash codes.

The Java bytecode has this:

         0: aload_0
         1: getfield      #23                 // Field int:I
         4: invokestatic  #50                 // Method java/lang/Integer.hashCode:(I)I

Integer.hashCode is an identity function–it simply returns its input. There's no need to invoke it at all, just use the integer value directly. Save a few bytes!

Equality checking has suboptimal control flow.

The Java bytecode:

        21: aload_0
        22: getfield      #23                 // Field int:I
        25: aload_2
        26: getfield      #23                 // Field int:I

        29: if_icmpeq     34
        32: iconst_0
        33: ireturn

        34: aload_0
        35: getfield      #26                 // Field requiredString:Ljava/lang/String;
        38: aload_2
        39: getfield      #26                 // Field requiredString:Ljava/lang/String;

        42: invokestatic  #42                 // Method kotlin/jvm/internal/Intrinsics.areEqual:(Ljava/lang/Object;Ljava/lang/Object;)Z
        45: ifne          50
        48: iconst_0
        49: ireturn

        50: aload_0
        51: getfield      #29                 // Field optionalString:Ljava/lang/String;
        54: aload_2
        55: getfield      #29                 // Field optionalString:Ljava/lang/String;

        58: invokestatic  #42                 // Method kotlin/jvm/internal/Intrinsics.areEqual:(Ljava/lang/Object;Ljava/lang/Object;)Z
        61: ifne          66
        64: iconst_0
        65: ireturn

        66: iconst_1
        67: ireturn

I have manually separated these into what I hope is somewhat clear operations. It's successive sets of loading two corresponding values, doing a comparison, and then maybe returning false. Finally, we return true.

The control flow graph looks somewhat like this:

    21-26 aload/getfields

    29 if_icmpeq 34 ─┐
    32 iconst_0      │
◄── 33 ireturn       │
 ┌───────────────────┘
 └► 34-39 aloads/getfields

    42 invokestatic
    45 ifne 50 ─┐
    48 iconst_0 │
◄── 49 ireturn  │
 ┌──────────────┘
 └► 50-55 aloads/getfields

    58 invokestatic
    61 ifne 66 ─┐
    64 iconst_0 │
◄── 65 ireturn  │
 ┌──────────────┘
 └► 66 iconst_1
◄── 67 ireturn

I see two things wrong with this:

  1. The "return false" branches are duplicated creating duplicate bytecodes.
  2. There is no real happy path which creates a lot of jumps.

Now, I'm not sure which is the happy path: equals or non-equals, but it doesn't really matter. We should pick one and optimize for it.

I am going to pick the equals path as being the happy path under the assumption that we want to optimize for hash-based collection usage. These will use the hash code to find the bucket and then do an equals to ensure it found a match or a collision requiring another check. An ideal hash map has no hash collisions, so this equals check should always succeed.

I also think equals is worth optimizing for simply because it will produce more compact bytecode, which should look something like this:

    21-26 aload/getfields

    29 if_icmpne 66 ───────┐
                           │
    32-37 aloads/getfields │
                           │
    40 invokestatic        │
    43 ifeq 66 ────────────┤
                           │
    45-50 aloads/getfields │
                           │
    53 invokestatic        │
    56 ifeq 66 ────────────┤
                           │
    58 iconst_1            │
◄── 59 ireturn             │
 ┌─────────────────────────┘
 └► 66 iconst_0
◄── 61 ireturn

Now the equals case flows linearly from top to return true–no jumps! In the case of a mismatch, all cases jump to a single return false. This should be faster for just about every case (except a first field mismatch which was faster in the old setup).

Content-based array IR uses inefficient property access causing a performance problem.

Consider just hashCode in JS:

  protoOf(GenericArrayHolder).hashCode = function () {
    var tmp;
    if (this.generic_1 == null) {
      tmp = 0;
    } else {
      var tmp_0;
      var tmp_1 = this.generic_1;
      if (!(tmp_1 == null) ? isArray(tmp_1) : false) {
        tmp_0 = contentDeepHashCode(this.generic_1);
      } else {
        var tmp_2 = this.generic_1;
        if (!(tmp_2 == null) ? isBooleanArray(tmp_2) : false) {
          tmp_0 = contentHashCode_6(this.generic_1);
        } else {
          var tmp_3 = this.generic_1;
          if (!(tmp_3 == null) ? isCharArray(tmp_3) : false) {
            tmp_0 = contentHashCode_5(this.generic_1);
          } else {
            var tmp_4 = this.generic_1;
            if (!(tmp_4 == null) ? isByteArray(tmp_4) : false) {
              tmp_0 = contentHashCode_4(this.generic_1);
            } else {
              var tmp_5 = this.generic_1;
              if (!(tmp_5 == null) ? isShortArray(tmp_5) : false) {
                tmp_0 = contentHashCode_3(this.generic_1);
              } else {
                var tmp_6 = this.generic_1;
                if (!(tmp_6 == null) ? isIntArray(tmp_6) : false) {
                  tmp_0 = contentHashCode_2(this.generic_1);
                } else {
                  var tmp_7 = this.generic_1;
                  if (!(tmp_7 == null) ? isFloatArray(tmp_7) : false) {
                    tmp_0 = contentHashCode_1(this.generic_1);
                  } else {
                    var tmp_8 = this.generic_1;
                    if (!(tmp_8 == null) ? isLongArray(tmp_8) : false) {
                      tmp_0 = contentHashCode_0(this.generic_1);
                    } else {
                      var tmp_9 = this.generic_1;
                      if (!(tmp_9 == null) ? isDoubleArray(tmp_9) : false) {
                        tmp_0 = contentHashCode(this.generic_1);
                      } else {
                        tmp_0 = this.generic_1 == null ? 0 : hashCode(this.generic_1);
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
      tmp = tmp_0;
    }
    return tmp;
  };

Every reference to this.generic_1 gets its own temporary local. Instead, all references should share a local resulting in only a single call to this.generic_1. This would also allow the control flow to change from nested ifs to a series of if/else ifs. Should result in better performance and a JS source size reduction.

This is also evident in the Java bytecode:

  public int hashCode();
    descriptor: ()I
    flags: (0x0001) ACC_PUBLIC
    Code:
      stack=1, locals=2, args_size=1
         0: aload_0
         1: getfield      #15                 // Field generic:Ljava/lang/Object;
         4: ifnonnull     11
         7: iconst_0
         8: goto          238
        11: aload_0
        12: getfield      #15                 // Field generic:Ljava/lang/Object;
        15: instanceof    #26                 // class "[Ljava/lang/Object;"
        18: ifeq          36
        21: aload_0
        22: getfield      #15                 // Field generic:Ljava/lang/Object;
        25: checkcast     #26                 // class "[Ljava/lang/Object;"
        28: astore_1
        29: aload_1
        30: invokestatic  #87                 // Method kotlin/collections/ArraysKt.contentDeepHashCode:([Ljava/lang/Object;)I
        33: goto          238
        36: aload_0
        37: getfield      #15                 // Field generic:Ljava/lang/Object;
        40: instanceof    #34                 // class "[Z"
        43: ifeq          59
        46: aload_0
        47: getfield      #15                 // Field generic:Ljava/lang/Object;
        50: checkcast     #34                 // class "[Z"
        53: invokestatic  #90                 // Method java/util/Arrays.hashCode:([Z)I
        56: goto          238
        59: aload_0
        60: getfield      #15                 // Field generic:Ljava/lang/Object;
        63: instanceof    #41                 // class "[C"
        66: ifeq          82
        69: aload_0
        70: getfield      #15                 // Field generic:Ljava/lang/Object;
        73: checkcast     #41                 // class "[C"
        76: invokestatic  #93                 // Method java/util/Arrays.hashCode:([C)I
        79: goto          238

There are way too many getfield instructions. There should only be a single getfield which is stored in a register and reused. Again, better performance and a bytecode size reduction.

I assume the same is true for native...

And this also applies to equals and toString which seem to reuse the same mechanism by which it selects the appropriate branch to do an operation.

From content-based array toString() in Java bytecode:

       221: aload_0
       222: getfield      #15                 // Field generic:Ljava/lang/Object;
       225: checkcast     #66                 // class "[J"
       228: invokestatic  #156                // Method java/util/Arrays.toString:([J)Ljava/lang/String;
       231: dup
       232: ldc           #134                // String toString(this)
       234: invokestatic  #138                // Method kotlin/jvm/internal/Intrinsics.checkNotNullExpressionValue:(Ljava/lang/Object;Ljava/lang/String;)V
       237: goto          276

We are calling a JDK function (good!) but Kotlin is mistrusting its return value and adding a non-null check (bad!). Worse, this non-null check comes with a string representation of the expression which is just wasted bytes.

We should somehow convince the IR to trust this method call's platform return type as a non-null return type. This would eliminate the method call and eliminate the useless string.

It should be possible given the presence of functions like unsafeCast in JS which arbitrarily allow you to pretend one type is another with no IR overhead.