puniverse/pulsar

supervisor does not appear to handle gen-server crashing

sandhu opened this issue · 9 comments

I'm trying to get a handle on the supervisor in pulsar and am running into an issue when using it to manage gen-servers.

The supervisor does not appear to do anything if the gen-server throws an exception in the init.

The gen-server code is as follows:

(defn test-gen-server
  [name]
  (gen-server
   (reify Server
     (init [_]
       (println "Starting Server...")
       ;; Intentionally crash the gen-server
       (throw (Exception. "Blah"))
       (register! name @self)
       (println "Started Server."))
     (terminate [_ cause]
       (println "Stopping Server...")
       (unregister! @self)
       (println "Stopped Server."))
     (handle-call [_ from id [command param]]
       ))))

And I'm launching it as follows:

(defn run-server-via-supervisor
  []
  (spawn
   (supervisor "entry-point" :one-for-one
               (fn []
                 [["test/test-server" :permanent 20 5 :sec 100
                   (test-gen-server "test/test-server")]]))))

Running it produces:

> (run-server-via-supervisor)
#object[co.paralleluniverse.actors.behaviors.Supervisor 0x41f14811 "Supervisor{ActorRef@41f14811{SupervisorActor@entry-point[owner: entry-point]}}"]
Starting Server...
Stopping Server...
Stopped Server.

It does not appear that the supervisor is attempting the 20 restarts as indicated in the spec.

Additionally, the output is identical to that of simply spawning the gen-server directly, without a supervisor:

> (spawn (test-gen-server "test/test-server"))
#object[co.paralleluniverse.actors.behaviors.Server 0x129ff808 "Server{ActorRef@129ff808{ServerActor@6becac41[owner: fiber-10000007]}}"]
Starting Server... 
Stopping Server...
Stopped Server.

I've attached a minimum test project for reference — pulsar-test.zip

Please let me know if there is any additional information I can provide to help debug this, or if my understanding of the supervisor is incorrect.

I'll add that things work as expected when the child is a plain actor function instead of a gen-server:

pulsar-test.core> (spawn
                   (supervisor "entry-point" :one-for-one
                               (fn []
                                 [["test/test-server" :permanent 20 5 :sec 100
                                   (fn []
                                     (println "Starting actor")
                                     (throw (Exception. "from actor")))]])))
#object[co.paralleluniverse.actors.behaviors.Supervisor 0x2153fbbc "Supervisor{ActorRef@2153fbbc{SupervisorActor@entry-point[owner: entry-point]}}"]
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor

I get a similar result when throwing from handle-timeout; the gen-server stops and is not restarted:

(defn test-gen-server
  [name]
  (gen-server :timeout 2000
              (reify Server
                (init [_]
                  (println "Starting Server...")
                  (register! name @self)
                  (println "Started Server."))
                (terminate [_ cause]
                  (println "Stopping Server..." cause)
                  (unregister! @self)
                  (println "Stopped Server."))
                (handle-call [_ from id [command param]]
                  )
                (handle-timeout [_]
                  (println "Throwing from handle-timeout")
                  (throw (Exception. "Blah timeout"))))))

Running it through the same run-server-via-supervisor as above produces:

pulsar-test.core> (def s (run-server-via-supervisor))
#'pulsar-test.core/s
Starting Server...
Started Server.
Throwing from handle-timeout
Stopping Server... #error {
 :cause Blah timeout
 :via
 [{:type java.lang.Exception
   :message Blah timeout
   :at [pulsar_test.core$test_gen_server$reify__25316 handle_timeout form-init1441529214695591630.clj 27]}]
 :trace
 [[pulsar_test.core$test_gen_server$reify__25316 handle_timeout form-init1441529214695591630.clj 27]
  [co.paralleluniverse.pulsar.actors$Server$reify__25164 handleTimeout actors.clj 665]
  [co.paralleluniverse.actors.behaviors.ServerActor handleTimeout ServerActor.java 360]
  [co.paralleluniverse.actors.behaviors.ServerActor behavior ServerActor.java 199]
  [co.paralleluniverse.actors.behaviors.BehaviorActor doRun BehaviorActor.java 293]
  [co.paralleluniverse.actors.behaviors.BehaviorActor doRun BehaviorActor.java 36]
  [co.paralleluniverse.actors.Actor run0 Actor.java 691]
  [co.paralleluniverse.actors.ActorRunner run ActorRunner.java 51]
  [co.paralleluniverse.fibers.Fiber run Fiber.java 1072]
  [co.paralleluniverse.fibers.Fiber run1 Fiber.java 1067]
  [co.paralleluniverse.fibers.Fiber exec Fiber.java 767]
  [co.paralleluniverse.fibers.FiberForkJoinScheduler$FiberForkJoinTask exec1 FiberForkJoinScheduler.java 266]
  [co.paralleluniverse.concurrent.forkjoin.ParkableForkJoinTask doExec ParkableForkJoinTask.java 117]
  [co.paralleluniverse.concurrent.forkjoin.ParkableForkJoinTask exec ParkableForkJoinTask.java 74]
  [jsr166e.ForkJoinTask doExec ForkJoinTask.java 261]
  [jsr166e.ForkJoinPool$WorkQueue runTask ForkJoinPool.java 988]
  [jsr166e.ForkJoinPool runWorker ForkJoinPool.java 1628]
  [jsr166e.ForkJoinWorkerThread run ForkJoinWorkerThread.java 107]]}
Stopped Server.
pulsar-test.core> 

@pron, @circlespainter — Apologies for tagging you guys directly, but this issue is blocking my application.

Could you please take a look? I may very well be using gen-server and/or supervisor incorrectly, but it would be helpful to know either way.

A minimal example that reproduces the issue is attached to the original post.

Thank you.

pron commented

I'll take a look shortly, and respond within the week.

Thank you @pron. Very much appreciated.

pron commented

Found the problem. I'll push a fix tomorrow.

That's great. Thank you.

pron commented

Alright, you'll need to use 0.7.7-SNAPSHOT.
In the supervisor's child spec, instead of (test-gen-server "test/test-server") write:

(actor-builder test-gen-server "test/test-server")

or

(actor-builder #(test-gen-server "test/test-server"))

to tell the supervisor that this is a function that (re)creates the actor.
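
For reference, here is a minimal sketch of the launcher from the original post with that change applied. It assumes 0.7.7-SNAPSHOT is on the classpath, that test-gen-server is defined as in the original post, and that actor-builder is referred into the namespace the same way spawn and supervisor are. Only the last element of the child spec differs from the original.

```clojure
;; Sketch only: the launcher from the original post, with the child wrapped
;; in actor-builder as suggested above. Assumes pulsar 0.7.7-SNAPSHOT and
;; that test-gen-server is already defined in this namespace.
(defn run-server-via-supervisor
  []
  (spawn
   (supervisor "entry-point" :one-for-one
               (fn []
                 ;; Same child spec as before (20 restarts requested).
                 ;; The last element is now a builder the supervisor can
                 ;; call again, rather than one already-constructed gen-server.
                 [["test/test-server" :permanent 20 5 :sec 100
                   (actor-builder test-gen-server "test/test-server")]]))))
```

The #(test-gen-server "test/test-server") form shown above should be interchangeable here; in both cases the spec carries a function that (re)creates the gen-server.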

Thanks @pron, it works as expected in my example code. Will try it in the full application later today and let you know if there are any issues.

Is there a timeframe for the 0.7.7 release?