Do an optimization pass on platform-specific output
Opened this issue · 6 comments
We want to optimize for a sweet spot between performance and code size. This is mostly a placeholder issue until #142 is sufficiently resolved that this becomes a priority and the research is done to see where any problems may lie.
-
toString()
string de-duplication with types/fields - JS double
instanceof
- Pointless
Int.hashCode()
call - Pointless
UInt.hashCode()
call. Same as above, should betoInt()
call which is no-op. - Optimize control flow of
equals()
- Cache property lookup for content-based array ops
- Eliminate
checkNonNull
instrinsic
Here's a quick one.
The toString()
function is generated like this (in JS):
protoOf(Simple_0).toString = function () {
return 'Simple(int=' + this.int_1 + ', requiredString=' + this.requiredString_1 + ', optionalString=' + this.optionalString_1 + ')';
};
Perfectly reasonable for JS.
On the JVM and Android, however, strings are used both for string literals but also for the names of types, methods, and fields. This means that the backing fields for the int
, requiredString
, and optionalString
cause strings with those contents to be present in the classfile. By using a string like ", requiredString="
we are duplicating 100% of the UTF-8 bytes of "requiredString". This is doubly true for "Simple(int="
.
$ javap -v poko-tests/build/classes/kotlin/jvm/main/poko/Simple.class
Classfile /Volumes/dev/other/Poko/poko-tests/build/classes/kotlin/jvm/main/poko/Simple.class
⋮
#9 = Utf8 requiredString
#10 = String #9 // requiredString
⋮
#69 = Utf8 , requiredString=
#70 = String #69 // , requiredString=
Despite being counter-intuitive, it's actually more efficient to generate code which looks like this
return "Simple" + '(' + "int" + '=' + int + ", " + "requiredString" + '=' + ...
This will allow sharing the string bytes with the type and field names, and it means the ", "
string will be de-duplicated as well. We pay for this in a bytecode count increase, but it should still result in an overall reduction of binary size. There's also a very, very tiny decrease in performance (more method calls), but it should be almost impossible to notice without precise JMH benchmarks.
Generating this is tricky, because you have to avoid any steps that join constants together. I have no idea if we can do it in IR, but I would suspect it's possible. It may also only be beneficial to do this for JVM targets. I suspect a good JS minifier will combine the constant strings back together (which is good, size is more important), but I have no idea what the correct thing to do for native targets is.
I previously did this optimization on View Binding's generated code: https://jakewharton.com/the-economics-of-generated-code/#string-duplication. It'd be another selling point of Poko over data classes.
Separately, seeing if StringConcatFactory
can be used on Java 9+ or ObjectMethods
can be used on Java 16+ is also worth a look.
Another one that jumps out in JS:
if (!(other instanceof Complex_0))
return false;
var tmp0_other_with_cast = other instanceof Complex_0 ? other : THROW_CCE();
There's no need to do a double instanceof
. This is the result of IR which does a type check separately from doing a cast. On some platforms like the JVM this is required (instanceof
vs. checkcast
bytecodes). In JS, however, it very clearly is a wasteful operation. Perhaps this is simply a Kotlin bug?
Anyway, I've seen this before in the JS and have never been able to eliminate it when compiling from Kotlin source. We are in IR, however, and can possibly eliminate it.
We appear to be invoking Int.hashCode()
when creating hash codes.
The Java bytecode has this:
0: aload_0
1: getfield #23 // Field int:I
4: invokestatic #50 // Method java/lang/Integer.hashCode:(I)I
Integer.hashCode
is an identity function–it simply returns its input. There's no need to invoke it at all, just use the integer value directly. Save a few bytes!
Equality checking has suboptimal control flow.
The Java bytecode:
21: aload_0
22: getfield #23 // Field int:I
25: aload_2
26: getfield #23 // Field int:I
29: if_icmpeq 34
32: iconst_0
33: ireturn
34: aload_0
35: getfield #26 // Field requiredString:Ljava/lang/String;
38: aload_2
39: getfield #26 // Field requiredString:Ljava/lang/String;
42: invokestatic #42 // Method kotlin/jvm/internal/Intrinsics.areEqual:(Ljava/lang/Object;Ljava/lang/Object;)Z
45: ifne 50
48: iconst_0
49: ireturn
50: aload_0
51: getfield #29 // Field optionalString:Ljava/lang/String;
54: aload_2
55: getfield #29 // Field optionalString:Ljava/lang/String;
58: invokestatic #42 // Method kotlin/jvm/internal/Intrinsics.areEqual:(Ljava/lang/Object;Ljava/lang/Object;)Z
61: ifne 66
64: iconst_0
65: ireturn
66: iconst_1
67: ireturn
I have manually separated these into what I hope is somewhat clear operations. It's successive sets of loading two corresponding values, doing a comparison, and then maybe returning false. Finally, we return true.
The control flow graph looks somewhat like this:
21-26 aload/getfields
29 if_icmpeq 34 ─┐
32 iconst_0 │
◄── 33 ireturn │
┌───────────────────┘
└► 34-39 aloads/getfields
42 invokestatic
45 ifne 50 ─┐
48 iconst_0 │
◄── 49 ireturn │
┌──────────────┘
└► 50-55 aloads/getfields
58 invokestatic
61 ifne 66 ─┐
64 iconst_0 │
◄── 65 ireturn │
┌──────────────┘
└► 66 iconst_1
◄── 67 ireturn
I see two things wrong with this:
- The "return false" branches are duplicated creating duplicate bytecodes.
- There is no real happy path which creates a lot of jumps.
Now, I'm not sure which is the happy path: equals or non-equals, but it doesn't really matter. We should pick one and optimize for it.
I am going to pick the equals path as being the happy path under the assumption that we want to optimize for hash-based collection usage. These will use the hash code to find the bucket and then do an equals to ensure it found a match or a collision requiring another check. An ideal hash map has no hash collisions, so this equals check should always succeed.
I also think equals is worth optimizing for simply because it will produce more compact bytecode, which should look something like this:
21-26 aload/getfields
29 if_icmpne 66 ───────┐
│
32-37 aloads/getfields │
│
40 invokestatic │
43 ifeq 66 ────────────┤
│
45-50 aloads/getfields │
│
53 invokestatic │
56 ifeq 66 ────────────┤
│
58 iconst_1 │
◄── 59 ireturn │
┌─────────────────────────┘
└► 66 iconst_0
◄── 61 ireturn
Now the equals case flows linearly from top to return true
–no jumps! In the case of a mismatch, all cases jump to a single return false
. This should be faster for just about every case (except a first field mismatch which was faster in the old setup).
Content-based array IR uses inefficient property access causing a performance problem.
Consider just hashCode
in JS:
protoOf(GenericArrayHolder).hashCode = function () {
var tmp;
if (this.generic_1 == null) {
tmp = 0;
} else {
var tmp_0;
var tmp_1 = this.generic_1;
if (!(tmp_1 == null) ? isArray(tmp_1) : false) {
tmp_0 = contentDeepHashCode(this.generic_1);
} else {
var tmp_2 = this.generic_1;
if (!(tmp_2 == null) ? isBooleanArray(tmp_2) : false) {
tmp_0 = contentHashCode_6(this.generic_1);
} else {
var tmp_3 = this.generic_1;
if (!(tmp_3 == null) ? isCharArray(tmp_3) : false) {
tmp_0 = contentHashCode_5(this.generic_1);
} else {
var tmp_4 = this.generic_1;
if (!(tmp_4 == null) ? isByteArray(tmp_4) : false) {
tmp_0 = contentHashCode_4(this.generic_1);
} else {
var tmp_5 = this.generic_1;
if (!(tmp_5 == null) ? isShortArray(tmp_5) : false) {
tmp_0 = contentHashCode_3(this.generic_1);
} else {
var tmp_6 = this.generic_1;
if (!(tmp_6 == null) ? isIntArray(tmp_6) : false) {
tmp_0 = contentHashCode_2(this.generic_1);
} else {
var tmp_7 = this.generic_1;
if (!(tmp_7 == null) ? isFloatArray(tmp_7) : false) {
tmp_0 = contentHashCode_1(this.generic_1);
} else {
var tmp_8 = this.generic_1;
if (!(tmp_8 == null) ? isLongArray(tmp_8) : false) {
tmp_0 = contentHashCode_0(this.generic_1);
} else {
var tmp_9 = this.generic_1;
if (!(tmp_9 == null) ? isDoubleArray(tmp_9) : false) {
tmp_0 = contentHashCode(this.generic_1);
} else {
tmp_0 = this.generic_1 == null ? 0 : hashCode(this.generic_1);
}
}
}
}
}
}
}
}
}
tmp = tmp_0;
}
return tmp;
};
Every reference to this.generic_1
gets its own temporary local. Instead, all references should share a local resulting in only a single call to this.generic_1
. This would also allow the control flow to change from nested if
s to a series of if
/else if
s. Should result in better performance and a JS source size reduction.
This is also evident in the Java bytecode:
public int hashCode();
descriptor: ()I
flags: (0x0001) ACC_PUBLIC
Code:
stack=1, locals=2, args_size=1
0: aload_0
1: getfield #15 // Field generic:Ljava/lang/Object;
4: ifnonnull 11
7: iconst_0
8: goto 238
11: aload_0
12: getfield #15 // Field generic:Ljava/lang/Object;
15: instanceof #26 // class "[Ljava/lang/Object;"
18: ifeq 36
21: aload_0
22: getfield #15 // Field generic:Ljava/lang/Object;
25: checkcast #26 // class "[Ljava/lang/Object;"
28: astore_1
29: aload_1
30: invokestatic #87 // Method kotlin/collections/ArraysKt.contentDeepHashCode:([Ljava/lang/Object;)I
33: goto 238
36: aload_0
37: getfield #15 // Field generic:Ljava/lang/Object;
40: instanceof #34 // class "[Z"
43: ifeq 59
46: aload_0
47: getfield #15 // Field generic:Ljava/lang/Object;
50: checkcast #34 // class "[Z"
53: invokestatic #90 // Method java/util/Arrays.hashCode:([Z)I
56: goto 238
59: aload_0
60: getfield #15 // Field generic:Ljava/lang/Object;
63: instanceof #41 // class "[C"
66: ifeq 82
69: aload_0
70: getfield #15 // Field generic:Ljava/lang/Object;
73: checkcast #41 // class "[C"
76: invokestatic #93 // Method java/util/Arrays.hashCode:([C)I
79: goto 238
There are way too many getfield
instructions. There should only be a single getfield
which is stored in a register and reused. Again, better performance and a bytecode size reduction.
I assume the same is true for native...
And this also applies to equals
and toString
which seem to reuse the same mechanism by which it selects the appropriate branch to do an operation.
From content-based array toString()
in Java bytecode:
221: aload_0
222: getfield #15 // Field generic:Ljava/lang/Object;
225: checkcast #66 // class "[J"
228: invokestatic #156 // Method java/util/Arrays.toString:([J)Ljava/lang/String;
231: dup
232: ldc #134 // String toString(this)
234: invokestatic #138 // Method kotlin/jvm/internal/Intrinsics.checkNotNullExpressionValue:(Ljava/lang/Object;Ljava/lang/String;)V
237: goto 276
We are calling a JDK function (good!) but Kotlin is mistrusting its return value and adding a non-null check (bad!). Worse, this non-null check comes with a string representation of the expression which is just wasted bytes.
We should somehow convince the IR to trust this method call's platform return type as a non-null return type. This would eliminate the method call and eliminate the useless string.
It should be possible given the presence of functions like unsafeCast
in JS which arbitrarily allow you to pretend one type is another with no IR overhead.