clj-easy/graal-build-time

Consider using a Java agent to detect transitively initialized classes


See https://github.com/luontola/clojure-native-image-agent for an approach.

Benefits of the Java agent approach:

  • Single-segment namespaces would be supported (they are an anti-pattern, but they still occur).
  • Java classes instantiated at the top level (e.g. in a top-level def) would also be included in the config. One should always try to avoid this, so perhaps we should emit a warning when it happens, but supporting it improves usability.
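To make the first benefit concrete, here is a minimal sketch of the recording side, assuming a plain java.lang.instrument agent. The class LoadedClassRecorder and its output format are hypothetical and not taken from clojure-native-image-agent. A ClassFileTransformer is notified about every class as it is defined, whatever its package, so the root-package foo__init class that AOT compilation produces for a single-segment namespace foo would be recorded like any other class.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical agent class: package it in a jar whose manifest contains
// "Premain-Class: LoadedClassRecorder" and start the JVM with -javaagent:<jar>.
class LoadedClassRecorder implements ClassFileTransformer {

    // Binary names (dots instead of slashes) of classes defined after the agent started.
    private final Set<String> loaded = new ConcurrentSkipListSet<>();

    public static void premain(String agentArgs, Instrumentation inst) {
        LoadedClassRecorder recorder = new LoadedClassRecorder();
        // canRetransform = false: this transformer only observes, it never changes bytecode.
        inst.addTransformer(recorder, false);
        // Print the recorded names when the JVM exits, e.g. after the app's namespaces are loaded.
        Runtime.getRuntime().addShutdownHook(new Thread(() ->
                recorder.loadedClasses().forEach(System.out::println)));
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        // className is in internal form ("clojure/core__init"); guard against null defensively,
        // and ignore redefinitions of classes that were already recorded.
        if (className != null && classBeingRedefined == null) {
            loaded.add(className.replace('/', '.'));
        }
        return null; // null means "leave the class bytes unchanged"
    }

    // Sorted snapshot of everything recorded so far.
    Set<String> loadedClasses() {
        return new TreeSet<>(loaded);
    }
}
```

Note that a transformer only learns that a class was loaded, not that its static initializer ran, which is the limitation described in point (1) further down.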

Downsides:

  • The Java agent must be run after AOT-compiling the Clojure code (against the uberjar or the classpath), which is an extra step compared to the current approach.

Questions:

  • Can we combine both approaches by running the Java agent during the native-image build, shelling out to a separate JVM?
  • Is there another way to detect transitively instantiated classes, e.g. by inspecting bytecode?
  • Why doesn't GraalVM support this out of the box?
  • In practice the current, basic approach is usually enough, so is the extra complexity worth it? Perhaps we could add an expert mode, enabled by a Java system property, that invokes the agent.

@luontola Would you be willing to team up?

Context: oracle/graal#3476

Some experiences from my Java agent approach:

(1) Some of the classes that the agent detects as loaded are not actually initialized. Maybe the Clojure compiler uses the Class.forName(name, initialize=false, loader) variant of Class.forName, which loads a class without initializing it. For example, the agent detected the com.amazonaws.client.builder.AwsClientBuilder class as a dependency, but not what its static initializer pulls in: it contains a static field holding a com.amazonaws.regions.DefaultAwsRegionProviderChain instance (which in turn loads a couple more AWS classes and org.apache.commons.logging).
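The difference between loading and initializing can be shown in isolation. In this sketch the hypothetical class ExpensiveStatics stands in for something like AwsClientBuilder: Class.forName(name, false, loader) returns its Class object without running its static initializer, so nothing that initializer would pull in gets loaded. Only the initialize=true variant runs it.

```java
import java.util.ArrayList;
import java.util.List;

// Records which static initializers have run.
class StaticInitProbe {
    static final List<String> EVENTS = new ArrayList<>();
}

// Hypothetical stand-in for a class whose static initializer loads more classes.
class ExpensiveStatics {
    static {
        StaticInitProbe.EVENTS.add("ExpensiveStatics initialized");
    }
}

class LoadVersusInit {
    // initialize=false: the class is loaded, but its static initializer does not run.
    static Class<?> loadOnly(String name) throws ClassNotFoundException {
        return Class.forName(name, false, LoadVersusInit.class.getClassLoader());
    }

    // initialize=true: the static initializer runs (at most once per class).
    static Class<?> loadAndInit(String name) throws ClassNotFoundException {
        return Class.forName(name, true, LoadVersusInit.class.getClassLoader());
    }

    public static void main(String[] args) throws Exception {
        String name = ExpensiveStatics.class.getName(); // a class literal does not initialize
        loadOnly(name);
        System.out.println(StaticInitProbe.EVENTS); // []
        loadAndInit(name);
        System.out.println(StaticInitProbe.EVENTS); // [ExpensiveStatics initialized]
    }
}
```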

A workaround would be either to force-initialize all classes that were loaded, or to detect the classes that are loaded but not initialized and exclude them from the BTI (build-time initialization) configuration. The former would be an easy loop calling Class.forName for each class. The latter would likely require a bytecode transformation that inserts a tracing method call into each class's static initializer to report when the class is initialized. If java.lang.Class had an "isInitialized" method, that would make things easier, but I couldn't find anything like it. Maybe GraalVM has access to lower-level information that would help here.
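The first workaround, force-initializing every loaded class, might look like the following sketch. ForceInit and initializeAll are hypothetical names. The loop collects failures instead of aborting, on the assumption that some classes may fail to initialize outside their intended environment.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: force-initializes every class name the agent reported as loaded.
class ForceInit {
    // Returns the names that could not be initialized, mapped to the error, in input order.
    static Map<String, Throwable> initializeAll(List<String> classNames, ClassLoader loader) {
        Map<String, Throwable> failures = new LinkedHashMap<>();
        for (String name : classNames) {
            try {
                // initialize=true runs the static initializer unless it has already run.
                Class.forName(name, true, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // ExceptionInInitializerError and NoClassDefFoundError are both LinkageErrors.
                failures.put(name, e);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        Map<String, Throwable> failures = initializeAll(
                List.of("java.util.ArrayList", "no.such.Clazz"), ClassLoader.getSystemClassLoader());
        failures.forEach((name, error) -> System.out.println(name + " -> " + error));
    }
}
```

This would have to run inside the agent's JVM so that the classes pulled in by the static initializers get recorded too. Since those newly loaded classes may themselves be loaded but not initialized, the loop would probably need to be repeated until no new classes appear.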

(2) The agent also detects every JDK class that is loaded. Writing BTI configuration for them conflicts with GraalVM's own configuration, so they must be excluded. It would be useful to have access to the data in com.oracle.svm.hosted.jdk.JDKInitializationFeature and com.oracle.svm.hosted.classinitialization.ClassInitializationFeature#initializeNativeImagePackagesAtBuildTime, to avoid duplicating the knowledge of which packages GraalVM already handles out of the box.
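Until GraalVM's own data is accessible, JDK classes could be excluded with a heuristic. This sketch assumes that any class defined by the bootstrap or platform class loader belongs to the JDK (Java 9+). JdkClassFilter is a hypothetical name, the input could come from Instrumentation.getAllLoadedClasses(), and the heuristic does not replace the package lists in the GraalVM classes mentioned above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical filter that drops JDK classes before writing BTI configuration.
class JdkClassFilter {
    private static final ClassLoader PLATFORM = ClassLoader.getPlatformClassLoader();

    // Assumption: bootstrap-loaded (null loader) or platform-loaded classes are JDK classes.
    static boolean isJdkClass(Class<?> c) {
        ClassLoader loader = c.getClassLoader();
        return loader == null || loader == PLATFORM;
    }

    // Sorted names of non-JDK, non-array classes, e.g. from Instrumentation.getAllLoadedClasses().
    static List<String> nonJdkClassNames(Class<?>[] loaded) {
        return Arrays.stream(loaded)
                .filter(c -> !c.isArray())      // array classes never need their own entry
                .filter(c -> !isJdkClass(c))
                .map(Class::getName)
                .sorted()
                .collect(Collectors.toList());
    }
}
```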

I just tried native-image -J-javaagent:... and it is possible to start a Java agent inside the native-image process. Hopefully we won't need it, but it's the ultimate hook for changing how GraalVM does things. Note, however, that the application classes are not on the classpath of the native-image process.