riemann/riemann

Receiving exception on reload: java.lang.IllegalStateException: Could not find a suitable classloader to modify from clojure.lang.LazySeq@aa452d23

Closed this issue · 8 comments

198d commented

Recently upgraded to v0.3.3 and receive the following exception when trying to reload our config. We are using pomegranate/add-dependencies in the config. Riemann starts fine, but pukes on the add-dependencies call on reload. Have seen related issues #914 and #928, but don't see a clear solution. Attempted to use config/depends instead of add-dependencies directly which got rid of the errors on reload, but then I couldn't start Riemann. Appears config/depends is meant for "plugins" and complained about not finding some metadata file, if I recall.

INFO [2019-07-19 17:42:30,104] SIGHUP handler - riemann.bin - Caught SIGHUP, reloading
INFO [2019-07-19 17:42:30,107] SIGHUP handler - riemann.pubsub - Sweeping transient subscriptions.
ERROR [2019-07-19 17:42:30,264] SIGHUP handler - riemann.bin - Couldn't reload:
clojure.lang.Compiler$CompilerException: java.lang.IllegalStateException: Could not find a suitable classloader to modify from clojure.lang.LazySeq@aa452d23, compiling:(/etc/riemann/riemann.config:37:1)
        at clojure.lang.Compiler.load(Compiler.java:7526)
        at clojure.lang.Compiler.loadFile(Compiler.java:7452)
        at clojure.lang.RT$3.invoke(RT.java:325)
        at riemann.config$include.invokeStatic(config.clj:450)
        at riemann.config$include.invoke(config.clj:427)
        at riemann.bin$reload_BANG_.invokeStatic(bin.clj:46)
        at riemann.bin$reload_BANG_.invoke(bin.clj:36)
        at riemann.bin$handle_signals$fn__14141.invoke(bin.clj:62)
        at riemann.bin.proxy$java.lang.Object$SignalHandler$d8c00ec7.handle(Unknown Source)
        at sun.misc.Signal$1.run(Signal.java:212)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Could not find a suitable classloader to modify from clojure.lang.LazySeq@aa452d23
        at cemerick.pomegranate$add_classpath.invokeStatic(pomegranate.clj:54)
        at cemerick.pomegranate$add_classpath.invoke(pomegranate.clj:42)
        at clojure.lang.AFn.applyToHelper(AFn.java:154)
        at clojure.lang.AFn.applyTo(AFn.java:144)
        at clojure.core$apply.invokeStatic(core.clj:659)
        at clojure.core$apply.invoke(core.clj:652)
        at cemerick.pomegranate$add_dependencies.invokeStatic(pomegranate.clj:85)
        at cemerick.pomegranate$add_dependencies.doInvoke(pomegranate.clj:57)
        at clojure.lang.RestFn.invoke(RestFn.java:512)
        at riemann.config$eval198.invokeStatic(riemann.config:37)
        at riemann.config$eval198.invoke(riemann.config:37)
        at clojure.lang.Compiler.eval(Compiler.java:7062)
        at clojure.lang.Compiler.load(Compiler.java:7514)
        ... 10 common frames omitted

Can you share your config?

Also - ping @pyr as this is more in his wheelhouse.

198d commented
; -*- mode: clojure; -*-
; vim: filetype=clojure

; Log output to a file.
(logging/init {:file "/var/log/riemann/riemann.log"})
(logging/set-level "riemann.transport.tcp" ch.qos.logback.classic.Level/ERROR)
(logging/set-level "riemann.transport.udp" ch.qos.logback.classic.Level/ERROR)
;(logging/set-level "org.apache.http.wire" ch.qos.logback.classic.Level/TRACE)

; Accept metrics via UDP.
(udp-server :host "0.0.0.0"
            :port 5555
            :max-size 65535)

; Accept metrics via TCP.
(tcp-server :host "0.0.0.0"
            :port 5555)

; Set up a TLS-protected port for secure metrics. The key can be generated with
; the following command:
;     openssl pkcs8 -topk8 -nocrypt -in $tls_key -out $pkcs8_key
#_ ; TODO: Not enabled yet
(tcp-server :host "0.0.0.0"
            :port 5554
            :tls? true
            :key  "..."
            :cert "..."
            :ca-cert "...")

; Serve websocket for riemann-dash subscriptions.
(ws-server :host "0.0.0.0"
           :port 5556)

; Sweep the index for expired events every 10 seconds.
(periodically-expire 10 {:keep-keys [:host :service :sys :stack :tags]})

(add-dependencies
  :coordinates '[[amperity/vault-clj "0.7.0"]
                 [amperity/vault-clj-aws "0.0.1"]]
  :repositories  {"clojars" "https://clojars.org/repo"}
  :exit-on-failure true)

(require
  '[amperity.riemann.core :as amp]
  :reload-all)

; Pass events into Amperity rules.
(let [record-index (default {:ttl 300} (index))]
  (streams (amp/rules record-index)))

The working theory is around riemann.bin/ensure-dynamic-classloader setting up the class loader on the current thread at startup, however, the call to riemann.bin/reload! happens in a separate thread when handling the HUP signal. The assumption is that the dynamic classloader isn't available in that thread. Out of curiosity, I threw a call to riemann.bin/ensure-dynamic-classloader into the config before the add-dependencies call and Riemann starts with that config and also survives subsequent reloads under Java 8. Under Java 9 and 10, Riemann starts and survives the first reload, but throws a pile of weird exceptions on subsequent reloads:

INFO [2019-07-20 19:55:07,142] SIGHUP handler - riemann.bin - Caught SIGHUP, reloading
INFO [2019-07-20 19:55:07,160] SIGHUP handler - riemann.pubsub - Sweeping transient subscriptions.
INFO [2019-07-20 19:55:07,369] SIGHUP handler - riemann.core - Hyperspace core online
INFO [2019-07-20 19:55:09,517] SIGHUP handler - riemann.bin - Caught SIGHUP, reloading
INFO [2019-07-20 19:55:09,518] SIGHUP handler - riemann.pubsub - Sweeping transient subscriptions.
ERROR [2019-07-20 19:55:09,523] SIGHUP handler - riemann.bin - Couldn't reload:
clojure.lang.Compiler$CompilerException: java.lang.NoClassDefFoundError: clojure/lang/AFunction, compiling:(/app/etc/riemann.config:3:1)                                                                                                      
        at clojure.lang.Compiler.analyzeSeq(Compiler.java:7010)
        at clojure.lang.Compiler.analyze(Compiler.java:6773)
        at clojure.lang.Compiler.eval(Compiler.java:7059)
        at clojure.lang.Compiler.load(Compiler.java:7514)
        at clojure.lang.Compiler.loadFile(Compiler.java:7452)
        at clojure.lang.RT$3.invoke(RT.java:325)
        at riemann.config$include.invokeStatic(config.clj:450)
        at riemann.config$include.invoke(config.clj:427)
        at riemann.bin$reload_BANG_.invokeStatic(bin.clj:46)
        at riemann.bin$reload_BANG_.invoke(bin.clj:36)
        at riemann.bin$handle_signals$fn__14141.invoke(bin.clj:62)
        at riemann.bin.proxy$java.lang.Object$SignalHandler$d8c00ec7.handle(Unknown Source)
        at jdk.unsupported/sun.misc.Signal$InternalMiscHandler.handle(Signal.java:198)
        at java.base/jdk.internal.misc.Signal$1.run(Signal.java:220)
        at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: java.lang.NoClassDefFoundError: clojure/lang/AFunction
        at java.base/java.lang.ClassLoader.defineClass1(Native Method)
        at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1007)
        at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:868)
        at clojure.lang.DynamicClassLoader.defineClass(DynamicClassLoader.java:46)
        at clojure.lang.Compiler$ObjExpr.getCompiledClass(Compiler.java:4979)
        at clojure.lang.Compiler$FnExpr.parse(Compiler.java:4109)
        at clojure.lang.Compiler.analyzeSeq(Compiler.java:7001)
        ... 14 common frames omitted
Caused by: java.lang.ClassNotFoundException: clojure.lang.AFunction
        at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:466)
        at clojure.lang.DynamicClassLoader.findClass(DynamicClassLoader.java:69)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:563)
        at clojure.lang.DynamicClassLoader.loadClass(DynamicClassLoader.java:77)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:496)
        ... 21 common frames omitted
Exception in thread "Thread-6" java.lang.ExceptionInInitializerError
        at riemann.streams$default$stream__8817.invoke(streams.clj:1411)
        at riemann.core$stream_BANG_$fn__9755.invoke(core.clj:20)
        at riemann.core$stream_BANG_.invokeStatic(core.clj:19)
        at riemann.core$stream_BANG_.invoke(core.clj:15)
        at riemann.core$instrumentation_service$measure__9764.invoke(core.clj:59)
        at riemann.service.ThreadService$thread_service_runner__6512$fn__6513.invoke(service.clj:101)
        at riemann.service.ThreadService$thread_service_runner__6512.invoke(service.clj:100)
        at clojure.lang.AFn.run(AFn.java:22)
        at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: java.lang.ClassNotFoundException: clojure/lang/PersistentArrayMap
        at java.base/java.lang.Class.forName0(Native Method)
        at java.base/java.lang.Class.forName(Class.java:375)
        at clojure.lang.RT.classForName(RT.java:2204)
        at clojure.lang.RT.classForName(RT.java:2213)
        at clojure.lang.Reflector.invokeStaticMethod(Reflector.java:199)
        at clojure.lang.LispReader$EvalReader.invoke(LispReader.java:1308)
        at clojure.lang.LispReader$DispatchReader.invoke(LispReader.java:843)
        at clojure.lang.LispReader.read(LispReader.java:275)
        at clojure.lang.LispReader.read(LispReader.java:210)
        at clojure.lang.LispReader.access$200(LispReader.java:41)
        at clojure.lang.LispReader$MetaReader.invoke(LispReader.java:944)
        at clojure.lang.LispReader.read(LispReader.java:275)
        at clojure.lang.LispReader.readDelimitedList(LispReader.java:1384)
        at clojure.lang.LispReader$ListReader.invoke(LispReader.java:1232)
        at clojure.lang.LispReader.read(LispReader.java:275)
        at clojure.lang.LispReader.read(LispReader.java:210)
        at clojure.lang.LispReader.access$200(LispReader.java:41)
        at clojure.lang.LispReader$EvalReader.invoke(LispReader.java:1287)
        at clojure.lang.LispReader$DispatchReader.invoke(LispReader.java:843)
        at clojure.lang.LispReader.read(LispReader.java:275)
        at clojure.lang.LispReader.read(LispReader.java:206)
        at clojure.lang.LispReader.read(LispReader.java:195)
        at clojure.lang.RT.readString(RT.java:1871)
        at clojure.lang.RT.readString(RT.java:1866)
        at riemann.streams$default$stream__8817$fn__8838.<clinit>(streams.clj:1411)
        ... 9 more

Let me know if you need any further information. The above was info was gleaned from a similar config running Riemann v0.3.3 in a Docker container using the openjdk:{8,9,10} images from Docker hub.

198d commented

Having been distracted by other work, I have come back to this. Some more details:

While I've mostly worked around the issues with reloads by making calls to riemann.bin/ensure-dynamic-classloader in the config itself, the main issue I have now is that add-dependencies doesn't seem to be modifying the classpath/classloader in a way that allows me to require namespaces that should be available after retrieving the declared dependencies. Will be digging into pomegranate and potentially dynapath to understand all that is going here.

198d commented

Hmmm...inspecting the URLs associated with the class loader before and after a call to add-dependencies does show the expected results.

198d commented

So, I think I've largely tracked down where things are diverging. I believe this is likely something that only happens with the AOT compiled uberjar. Here's a rough rundown of the issue:

  1. JVM starts, pomegranate/classloader-hierarchy shows App and System/Platform class loaders. Passing (clojure.lang.RT/baseLoader) or @clojure.lang.Compiler/LOADER to pomegranate/classloader-hierarchy shows the same hierarchy.
  2. riemann.bin/ensure-dynamic-classloader injects a clojure.lang.DynamicClassLoader at the base of this hierarchy, however, the hierarchy remains unchanged when inspecting (clojure.lang.RT/baseLoader) and @clojure.lang.Compiler/LOADER.
  3. riemann.bin/set-config-file! calls pomegranate/add-classpath to inject the base dir of the given config file into the classpath, modifying the class loader created by riemann.bin/ensure-dynamic-classloader.
  4. riemann.conf/apply-config! loads the Riemann config.
    a. clojure.core/load is called to evaluate the config. The class hierarchies seen by inspecting (clojure.lang.RT/baseLoader) or @clojure.lang.Compiler/LOADER now include 2 layers of clojure.lang.DynamicClassLoader objects. Subsequent calls to inspect those hierarchies returns a different clojure.lang.DynamicClassloader at the base of the hierarchy, but a consistent parent to that loader. Neither of these clojure.lang.DynamicClassLoader instances are the same as the one created by riemann.bin/ensure-dynamic-classloader.
    b. In our config, we eventually call pomegranate/add-dependencies which succeeds and modifies the class loader configured by riemann.bin/ensure-dynamic-classloader because it is attached to the current thread and that is Pomegranate's default behavior.
    c. Calls to require namespaces that live in the config directory fail and calls to require namespaces fetched with pomegranate/add-dependencies fail because neither of those paths are on the class path that the clojure runtime/compiler see.

Phew...I hope that makes sense. I have confirmed this behavior under OpenJDK 8, 9 and 10, but I'm still a little hazy about why everything described in 4a happens. Will try and put together a patch in the next day or two that hopefully resolves all of this for good.

Thanks for this @198d!

Addressed in 0.3.4