corretto/corretto-11

Performance issues with version 11.0.19

Mateusz-Krzyszpien opened this issue · 8 comments

Describe the bug

After upgrading from version 11.0.13 to 11.0.19, we observed that our production WildFly servers started to use up to 100% CPU. This level of utilization made the application servers unresponsive, which in turn made the application unavailable.

After rolling back from 11.0.19 to 11.0.13, CPU utilization returned to the normal 20-30%.

To Reproduce

Unfortunately, we were not able to reproduce this issue in lower environments.

Expected behavior

I would like you to check if there are any recent changes to Java code that could create performance issues.

Platform information

OS: Windows Server 2022 Datacenter
Version: java -version

openjdk version "11.0.19" 2023-04-18 LTS
OpenJDK Runtime Environment Corretto-11.0.19.7.1 (build 11.0.19+7-LTS)
OpenJDK 64-Bit Server VM Corretto-11.0.19.7.1 (build 11.0.19+7-LTS, mixed mode)

I would like you to check if there are any recent changes to Java code that could create performance issues.

There are 1.5 years between the releases of 11.0.13 and 11.0.19. I'm not aware of what might cause such a significant increase in CPU utilization.


I can suggest some options that may help us narrow down the issue:

  1. Check top -H to see which threads have high usage.
  2. Upgrade to an intermediate Corretto 11 release. First try upgrading 11.0.13 -> 11.0.16 and monitor for increased CPU utilization. If we can find the exact Corretto 11 version that introduces the increased CPU utilization, it will be easier to find the patch that is causing the issue.
  3. Use async profiler and look for differences in data between 11.0.13 and 11.0.19.
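
As a rough complement to step 1 on hosts where top -H is not available (the reporter is on Windows Server), per-thread CPU time can also be sampled from inside the JVM with the standard ThreadMXBean. This is only a sketch: the class name HotThreads and the method sample are made up for illustration, the code has to run inside the affected JVM (for example from a small diagnostic endpoint), and it only sees Java threads, so GC or JIT compiler threads will not show up here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Corretto or WildFly.
public class HotThreads {

    /** Returns "name cpuMillis ms" lines for the busiest Java threads over the window. */
    static List<String> sample(long windowMillis, int topN) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isThreadCpuTimeSupported()) {
            return new ArrayList<>(); // nothing to report on this JVM
        }
        if (!threads.isThreadCpuTimeEnabled()) {
            threads.setThreadCpuTimeEnabled(true);
        }

        // First snapshot: CPU time in nanoseconds per thread id (-1 if unavailable).
        Map<Long, Long> before = new HashMap<>();
        for (long id : threads.getAllThreadIds()) {
            before.put(id, threads.getThreadCpuTime(id));
        }
        Thread.sleep(windowMillis);

        // Second snapshot: keep only threads present in both snapshots.
        List<long[]> deltas = new ArrayList<>();
        for (long id : threads.getAllThreadIds()) {
            long now = threads.getThreadCpuTime(id);
            Long then = before.get(id);
            if (now < 0 || then == null || then < 0) {
                continue;
            }
            deltas.add(new long[] {id, now - then});
        }
        deltas.sort((a, b) -> Long.compare(b[1], a[1])); // busiest first

        List<String> report = new ArrayList<>();
        for (long[] d : deltas.subList(0, Math.min(topN, deltas.size()))) {
            ThreadInfo info = threads.getThreadInfo(d[0]);
            if (info == null) {
                continue; // thread exited between snapshot and lookup
            }
            report.add(info.getThreadName() + " " + (d[1] / 1_000_000) + " ms");
        }
        return report;
    }

    public static void main(String[] args) throws InterruptedException {
        sample(1000, 10).forEach(System.out::println);
    }
}
```

Running the same sampling on 11.0.13 and on 11.0.19 under comparable load and comparing the thread names at the top of each list may show whether the extra CPU goes to application threads or to something this view cannot see.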

@Mateusz-Krzyszpien I bet you run in a container?

11.0.17 added this change

CPU Shares Ignored When Computing Active Processor Count (JDK-8281181) Previous JDK releases used an incorrect interpretation of the Linux cgroups parameter "cpu.shares". This might cause the JVM to use fewer CPUs than available, leading to an under utilization of CPU resources when the JVM is used inside a container.
Starting from this JDK release, by default, the JVM no longer considers "cpu.shares" when deciding the number of threads to be used by the various thread pools. The -XX:+UseContainerCpuShares command-line option can be used to revert to the previous behavior. This option is deprecated and may be removed in a future JDK release.
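
One way to check whether this change is relevant to a given deployment is to print the processor count the JVM actually reports, once on the old version and once on the new one, in the same environment. The sketch below uses only standard library calls; the class name CpuCountReport is made up for illustration. If both versions print the same values, the cpu.shares change is probably not what changed the thread pool sizing.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical helper for comparing two JDK versions in the same environment.
public class CpuCountReport {

    static String describe() {
        // Processor count as seen by the JVM (container-aware on Linux).
        int cpus = Runtime.getRuntime().availableProcessors();
        // The common pool is one example of a pool sized from that count by default.
        int commonPool = ForkJoinPool.commonPool().getParallelism();
        return "java.version=" + System.getProperty("java.version")
                + " availableProcessors=" + cpus
                + " commonPoolParallelism=" + commonPool;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```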

Since there's been no response here, assuming you were able to resolve your regression. Please feel free to re-open or cut a new issue if you need any additional assistance.

@benty-amzn
We see the same issue and ended up reverting to 11.0.16. We can try using -XX:+UseContainerCpuShares, but it doesn't feel right to depend on a deprecated option to tune this. Can you please re-open this?

Sure, happy to reopen.

  • Are you seeing the same behavior where CPU usage grows to 100% on the newer version?
  • Have you confirmed that 11.0.16 is the latest version which does not trigger the behavior?
  • Can you confirm whether enabling the -XX:+UseContainerCpuShares restores the expected behavior on newer versions?
  • Another option is to set the processor count explicitly with -XX:ActiveProcessorCount=<n>. This bypasses the automatic selection.
  • The OpenJDK tickets go into detail on why this change was made, though I found that it did indeed cause issues.
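
When running the tests from the list above, it can help to confirm from inside the JVM which of these options actually took effect. The explicit processor count option the last bullets appear to refer to is -XX:ActiveProcessorCount. The sketch below reads flag values through the HotSpot diagnostic MXBean; the class name FlagCheck is made up for illustration, and it assumes a HotSpot-based JVM such as Corretto. A flag that does not exist in a given build (for example UseContainerCpuShares on a release that predates it) is reported as unavailable rather than causing a failure.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper; assumes a HotSpot JVM where this MXBean is present.
public class FlagCheck {

    static String flagValue(String name) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            VMOption option = hotspot.getVMOption(name);
            // Origin tells whether the value is a default or was set on the command line.
            return option.getValue() + " (origin: " + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            // Thrown when the running JVM has no option with this name.
            return "not available in this JVM";
        }
    }

    public static void main(String[] args) {
        System.out.println("ActiveProcessorCount=" + flagValue("ActiveProcessorCount"));
        System.out.println("UseContainerCpuShares=" + flagValue("UseContainerCpuShares"));
        System.out.println("availableProcessors=" + Runtime.getRuntime().availableProcessors());
    }
}
```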

We will run these tests and get back to you.