scala/bug

2.12.9 in phase jvm takes ~200 secs, up from ~1.4 sec

yoohaemin opened this issue ยท 22 comments

We have a sbt build with 10 subprojects, and one of them started taking a lot more time to compile after bumping to 2.12.9. We do not have a small reproducible build, but only a thread dump of scalac and output from -Ystatistics at this point (of 2.12.9 and 2.12.8 for comparison). You can find them here.

In the file, you can notice that phase jvm took 1411.476ms (3.1%) with 2.12.8, but 197014.957ms (83.9%) with 2.12.9. Using a profiler shows that scalac is stuck in scala.tools.asm.tree.analysis.Frame.merge during the whole time.

Tried on GraalVM EE (macos) and openjdk11 with jvmci on (nixos), with consistent results.

@lrytz any idea what the next investigation steps here ought to be?

lrytz commented

@yoohaemin thanks for the report. Did GC show up in the profiler / could you give it a try with a larger -Xmx?

lrytz commented

Looking at the git history of the backend, I don't see an obvious candidate that could be the cause of the regression. Maybe scala/scala#7780.

I just ran with -Xmx4g, but heap size doesn't seem to go over 1.5G at all, and GC kicks in during compilation multiple times but it is really short and quick.

@yoohaemin have you tried to attach async-profiler and plot flame graphs? possible it will help to spot some new details...

lrytz commented

OK, thanks. Do you have the optimizer enabled (with -opt)? If not, the only optimization that runs is dead code elimination. You could disable that with -opt:l:none.

I just realized that DCE doesn't bail out on large methods (https://github.com/scala/scala/blob/v2.12.8/src/compiler/scala/tools/nsc/backend/jvm/opt/LocalOpt.scala#L294), it should check sizeOKForBasicValue. But that was the same in 2.12.8, so the regression must be caused by something else.

I'm not sure how to diagnose this from the outside. I could most likely identify the issue quickly if I had access to the soruce code. @yoohaemin would it be possible for you to share it (privately)?

(I'm working with @yoohaemin)

@lrytz We didn't have the optimizer enabled. I just tried with -opt:l:none and it's better but still far from 2.12.8 numbers:

  • 2.12.8 jvm phase: 1594.854ms
  • 2.12.8 jvm phase with -opt:l:none: 1606.082ms
  • 2.12.9 jvm phase: 162922.174ms
  • 2.12.9 jvm phase with -opt:l:none: 35246.268ms

Unfortunately we can't share our code, and this project is pretty big so it's not easy to isolate the problem into a smaller build.

We haven't tried the async profiler yet, just VisualVM sampling that shows it is stuck on scala.tools.asm.tree.analysis.BasicInterpreter.merge(BasicInterpreter.java:50):

 at scala.tools.asm.tree.analysis.BasicInterpreter.merge(BasicInterpreter.java:50)
        at scala.tools.asm.tree.analysis.Frame.merge(Frame.java:682)
        at scala.tools.asm.tree.analysis.Analyzer.merge(Analyzer.java:541)
        at scala.tools.asm.tree.analysis.Analyzer.analyze(Analyzer.java:194)
        at scala.tools.nsc.backend.jvm.analysis.BackendUtils$AsmAnalyzer.<init>(BackendUtils.scala:99)
        at scala.tools.nsc.backend.jvm.opt.LocalOpt.removeUnreachableCodeImpl(LocalOpt.scala:509)
        at scala.tools.nsc.backend.jvm.opt.LocalOpt.removalRound$2(LocalOpt.scala:294)
        at scala.tools.nsc.backend.jvm.opt.LocalOpt.methodOptimizations(LocalOpt.scala:371)
        at scala.tools.nsc.backend.jvm.opt.LocalOpt.$anonfun$methodOptimizations$1(LocalOpt.scala:209)
        at scala.tools.nsc.backend.jvm.opt.LocalOpt.$anonfun$methodOptimizations$1$adapted(LocalOpt.scala:208)
        at scala.tools.nsc.backend.jvm.opt.LocalOpt$$Lambda$16801/324387393.apply(Unknown Source)
        at scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:160)
        at scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:160)
        at scala.collection.TraversableOnce$$Lambda$13966/448664299.apply(Unknown Source)
        at scala.collection.Iterator.foreach(Iterator.scala:941)
        at scala.collection.Iterator.foreach$(Iterator.scala:941)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
        at scala.collection.IterableLike.foreach(IterableLike.scala:74)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
        at scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:160)
        at scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:158)
        at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:108)
        at scala.tools.nsc.backend.jvm.opt.LocalOpt.methodOptimizations(LocalOpt.scala:208)
        at scala.tools.nsc.backend.jvm.PostProcessor.$anonfun$localOptimizations$1(PostProcessor.scala:130)
        at scala.tools.nsc.backend.jvm.PostProcessor.localOptimizations(PostProcessor.scala:130)
        at scala.tools.nsc.backend.jvm.PostProcessor.sendToDisk(PostProcessor.scala:64)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler.$anonfun$postProcessUnit$3(GeneratedClassHandler.scala:123)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler.$anonfun$postProcessUnit$2(GeneratedClassHandler.scala:123)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler$$Lambda$16088/261666137.apply$mcV$sp(Unknown Source)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at scala.tools.nsc.backend.jvm.PostProcessorFrontendAccess$PostProcessorFrontendAccessImpl.withThreadLocalReporter(PostProcessorFrontendAccess.scala:225)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler.$anonfun$postProcessUnit$1(GeneratedClassHandler.scala:122)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler$$Lambda$16084/1543680797.apply$mcV$sp(Unknown Source)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler$$Lambda$16085/620050906.apply(Unknown Source)
        at scala.util.Success.$anonfun$map$1(Try.scala:255)
        at scala.util.Success.map(Try.scala:213)
        at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
        at scala.concurrent.Future$$Lambda$16086/289061864.apply(Unknown Source)
        at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
        at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
        at scala.concurrent.impl.Promise$$Lambda$16087/1619313601.apply(Unknown Source)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$SyncWritingClassHandler$$anonfun$$lessinit$greater$1.execute(GeneratedClassHandler.scala:177)
        at scala.concurrent.impl.ExecutionContextImpl.execute(ExecutionContextImpl.scala:24)
        at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
        at scala.concurrent.impl.Promise$KeptPromise$Kept.onComplete(Promise.scala:372)
        at scala.concurrent.impl.Promise$KeptPromise$Kept.onComplete$(Promise.scala:371)
        at scala.concurrent.impl.Promise$KeptPromise$Successful.onComplete(Promise.scala:379)
        at scala.concurrent.impl.Promise.transform(Promise.scala:33)
        at scala.concurrent.impl.Promise.transform$(Promise.scala:31)
        at scala.concurrent.impl.Promise$KeptPromise$Successful.transform(Promise.scala:379)
        at scala.concurrent.Future.map(Future.scala:292)
        at scala.concurrent.Future.map$(Future.scala:292)
        at scala.concurrent.impl.Promise$KeptPromise$Successful.map(Promise.scala:379)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler.postProcessUnit(GeneratedClassHandler.scala:118)
        at scala.tools.nsc.backend.jvm.GeneratedClassHandler$WritingClassHandler.process(GeneratedClassHandler.scala:111)
        at scala.tools.nsc.backend.jvm.CodeGen.genUnit(CodeGen.scala:73)
        at scala.tools.nsc.backend.jvm.GenBCode$BCodePhase.apply(GenBCode.scala:74)
        at scala.tools.nsc.Global$GlobalPhase.applyPhase(Global.scala:453)
        at scala.tools.nsc.Global$GlobalPhase.run(Global.scala:399)
        at scala.tools.nsc.backend.jvm.GenBCode$BCodePhase.super$run(GenBCode.scala:80)
        at scala.tools.nsc.backend.jvm.GenBCode$BCodePhase.$anonfun$run$1(GenBCode.scala:80)
        at scala.tools.nsc.backend.jvm.GenBCode$BCodePhase.run(GenBCode.scala:78)
        at scala.tools.nsc.Global$Run.compileUnitsInternal(Global.scala:1503)
        at scala.tools.nsc.Global$Run.compileUnits(Global.scala:1487)
        at scala.tools.nsc.Global$Run.compileSources(Global.scala:1480)
        at scala.tools.nsc.Global$Run.compile(Global.scala:1606)

With -opt:l:none it's still spending time in scala.tools.asm.Frame.merge but with a different call stack:

at scala.tools.asm.Frame.merge(Frame.java:1175)
        at scala.tools.asm.MethodWriter.computeAllFrames(MethodWriter.java:1617)
        at scala.tools.asm.MethodWriter.visitMaxs(MethodWriter.java:1553)
        at scala.tools.asm.tree.MethodNode.accept(MethodNode.java:810)
        at scala.tools.asm.tree.MethodNode.accept(MethodNode.java:690)
        at scala.tools.asm.tree.ClassNode.accept(ClassNode.java:426)
        at scala.tools.nsc.backend.jvm.PostProcessor.serializeClass(PostProcessor.scala:140)
        at scala.tools.nsc.backend.jvm.PostProcessor.sendToDisk(PostProcessor.scala:72)
lrytz commented

With -opt:l:none it's still spending time in scala.tools.asm.Frame.merge but with a different call stack.

That's interesting. It really could be due to the ASM upgrade.

I suspect that there is a specific bytecode shape that trips up the ASM analyzer.

You could add https://scala-ci.typesafe.com/artifactory/scala-integration as a resolver and test with Scala versions 2.12.9-bin-b63ccf2 and 2.12.9-bin-ce9a857. That's before and after scala/scala#7780.

You also could try to:

  • git clone git@github.com:scala/scala.git
  • git checkout v2.12.9
  • sbt intellij
  • open src/intellij/scala.ipr in IntelliJ
  • run your sbt build with sbt -J-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
  • Connect the IntelliJ debugger
  • Try to suspend at a breakpoint once the compiler is stuck in Frame.merge
  • Go up the stack to a place where the MethodNode is in scope
  • Use "evaluate expression" and call AsmUtils.traceMethod(methodNode). This would print the ASM representation of the bytecode to the console

Interesting, it is fast with both snapshots. We'll try to find the commit that introduces the problem.

lrytz commented

Here's a git bisect script that you could adapt and use: https://gist.github.com/lrytz/f92e5db08c6bad14aaf0b890ec5d3b4d

You'd have to measure the time somehow since the build doesn't fail, just takes a long time.

It seems to be caused by scala/scala#8061
2.12.9-bin-d386f96 is fast, 2.12.9-bin-197ff13 is slow.

I brought a minimal reproduction after we investigated this issue following advice from @lrytz. Thanks everyone for the help.

https://github.com/yoohaemin/scala-bug-11667-reproduction

believed fixed by scala/scala#8327

@yoohaemin @ghostdogpr would appreciate if you could test 2.12.10-bin-de413b9 and verify

I tried it with https://github.com/yoohaemin/scala-bug-11667-reproduction just now and this looks good:

[info]   jvm                         : 1 spans, ()2205.651ms (8.2%)

@SethTisue we will be able to test on our project on Monday. But based on the repro project it looks good!

@ghostdogpr it looks that in runtime your http-handler burns a lot of CPU cycles in the routing match expression too... will you consider to use some kind of prefix tree which supports wildcard for path elements instead?

Indeed... the generated code is likely too large to be JIT compiled by the JVM, so you have the overhead of interpretation on top of any issues with cascading if/else to get down the matching handler.

Splitting the routes was indeed on our todo list (for both compile-time and runtime performance), but was a low priority since performance was not exactly our biggest concern at the moment. In any case, that big pattern match did help spot a compiler regression ;)

@SethTisue 2.12.10-bin-de413b9 works well ๐Ÿ‘

cool, thanks for confirming