microsoft/thrifty

Very large enums can generate Kotlin code that fails to compile

shashachu opened this issue · 4 comments

When you have thrift enums with lots of elements (say in the 3k-4k range), the Kotlin compiler will fail with an internal error:

Caused by: org.jetbrains.org.objectweb.asm.MethodTooLargeException: Method too large: com/foo/bar/BigEnum.<clinit> ()V
        at org.jetbrains.org.objectweb.asm.MethodWriter.computeMethodInfoSize(MethodWriter.java:2089)
        at org.jetbrains.org.objectweb.asm.ClassWriter.toByteArray(ClassWriter.java:458)
        at org.jetbrains.kotlin.codegen.ClassBuilderFactories$2.asBytes(ClassBuilderFactories.java:118)
        at org.jetbrains.kotlin.codegen.DelegatingClassBuilderFactory.asBytes(DelegatingClassBuilderFactory.kt:36)
        at org.jetbrains.kotlin.codegen.DelegatingClassBuilderFactory.asBytes(DelegatingClassBuilderFactory.kt:36)
        at org.jetbrains.kotlin.codegen.DelegatingClassBuilderFactory.asBytes(DelegatingClassBuilderFactory.kt:36)
        at org.jetbrains.kotlin.codegen.ClassFileFactory$ClassBuilderAndSourceFileList.asBytes(ClassFileFactory.java:348)
        at org.jetbrains.kotlin.codegen.ClassFileFactory$OutputClassFile.asByteArray(ClassFileFactory.java:313)

We found that removing the Int constructor param shrinks the clinit function sufficiently for the compile to succeed, i.e.

enum class BigEnum(val value: Int) {
    ONE(1),
    TWO(2),
    THREE(3),
    ...
}

will fail to compile, but

enum class BigEnum {
    ONE,
    TWO,
    THREE,
    ...
}

will succeed.

In order to make this work, we would also need to generate a fun findByValue(value: BigEnum): Int? function for serialization, and you'd lose the ability to easily convert to an Int.

Thoughts? Perhaps this could be a CLI option.

Wow! Can't say that I've ever butted up against method body size limits before in this project. I'm not surprised to hear that a 4k-member enum would do it. Can you elaborate on how your enum could get so large?

In any event, something you might try is to do a separate codegen, and make this a Java-language enum instead of Kotlin. I've seen claims that javac will be smarter about enum initialization, so maybe this is a kotlinc limitation that you can sidestep here. No guarantees from me though, this is just conjecture. If that works, it may be worthwhile to file a Kotlin bug.

If that doesn't work... it's not totally clear how this proposed solution would work wrt serialization; the wire format assumes equivalence with ints, as you note, and we'd need to preserve the mapping some other way. I can't say that I have high hopes for this feature.

The reason I say that is that this fix would only delay the inevitable, and probably not for very long. It's entirely possible that the proposed findByValue(value: BigEnum) by itself would already hit the same method-length limitation - and if not now, then as your enum grows it probably will soon enough. You definitely know your own situation better than I do, but I just can't see how a 4k+ member enum is sustainable in a JVM platform.

Hmm, maybe I was a bit too pessimistic. It seems likely that we could make some tradeoffs here using companion-object and extension methods:

enum class BigEnum {
  ONE,
  TWO,
  THREE;

  companion object {
    @JvmStatic fun findByValue(value: Int): BigEnum? = when(value) {
      1 -> ONE
      2 -> TWO
      3 -> THREE
     else -> null
    }
  }
}

val BigEnum.value: Int
  get() = when (this) {
      BigEnum.ONE -> 1
      BigEnum.TWO -> 2
      BigEnum.THREE -> 3
  }

It seems pretty viable, but probably has implications that haven't occurred to me yet. We'd certainly end up with larger code sizes. Overall I still think that generating Java for this class would be your best bet.

Wow! Can't say that I've ever butted up against method body size limits before in this project. I'm not surprised to hear that a 4k-member enum would do it. Can you elaborate on how your enum could get so large?

In any event, something you might try is to do a separate codegen, and make this a Java-language enum instead of Kotlin. I've seen claims that javac will be smarter about enum initialization, so maybe this is a kotlinc limitation that you can sidestep here. No guarantees from me though, this is just conjecture. If that works, it may be worthwhile to file a Kotlin bug.

If that doesn't work... it's not totally clear how this proposed solution would work wrt serialization; the wire format assumes equivalence with ints, as you note, and we'd need to preserve the mapping some other way. I can't say that I have high hopes for this feature.

The reason I say that is that this fix would only delay the inevitable, and probably not for very long. It's entirely possible that the proposed findByValue(value: BigEnum) by itself would already hit the same method-length limitation - and if not now, then as your enum grows it probably will soon enough. You definitely know your own situation better than I do, but I just can't see how a 4k+ member enum is sustainable in a JVM platform.

The real solution is certainly 'don't make enums that large'; I won't go into the various reasons why it's not feasible to go that route right now, but suffice it to say we are paying for some decisions made by our past selves. 😅

We've actually found that straight Java runs into this issue sooner than Kotlin; I don't recall the specifics of how exactly it fails, but it's a similar code size issue. When our Java consumers were failing, the Android app was still building due to the enums being generated in Kotlin.

I do think it's feasible with a similar pattern to your second comment; I totally understand if this isn't something you'd want in the mainline, but thought I'd bring it up anyway.

Fixed in #421, will ship with the 3.0.0 release