supervisor does not appear to handle gen-server crashing
sandhu opened this issue · 9 comments
I'm trying to get a handle on the supervisor in Pulsar and am running into an issue when using it to manage gen-servers.
The supervisor does not appear to do anything if the gen-server throws an exception in init.
The gen-server code is as follows:
(defn test-gen-server
  [name]
  (gen-server
    (reify Server
      (init [_]
        (println "Starting Server...")
        ;; Intentionally crash the gen-server
        (throw (Exception. "Blah"))
        ;; Never reached because of the throw above
        (register! name @self)
        (println "Started Server."))
      (terminate [_ cause]
        (println "Stopping Server...")
        (unregister! @self)
        (println "Stopped Server."))
      (handle-call [_ from id [command param]]
        ;; No calls are handled in this example
        nil))))
And I'm launching it as follows:
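(defn run-server-via-supervisor
  []
  (spawn
    (supervisor "entry-point" :one-for-one
      (fn []
        ;; Child spec: id, restart mode, max restarts (20) within 5 :sec,
        ;; shutdown deadline (100 ms), then the child itself
        [["test/test-server" :permanent 20 5 :sec 100
          (test-gen-server "test/test-server")]]))))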
Running it produces:
> (run-server-via-supervisor)
#object[co.paralleluniverse.actors.behaviors.Supervisor 0x41f14811 "Supervisor{ActorRef@41f14811{SupervisorActor@entry-point[owner: entry-point]}}"]
Starting Server...
Stopping Server...
Stopped Server.
The supervisor does not appear to attempt the 20 restarts specified in the child spec.
Additionally, the output is identical to simply spawning the gen-server directly:
> (spawn (test-gen-server "test/test-server"))
#object[co.paralleluniverse.actors.behaviors.Server 0x129ff808 "Server{ActorRef@129ff808{ServerActor@6becac41[owner: fiber-10000007]}}"]
Starting Server...
Stopping Server...
Stopped Server.
I've attached a minimal test project for reference: pulsar-test.zip
Please let me know if there is any additional information I can provide to help debug this, or if my understanding of the supervisor is incorrect.
I'll add that things work as expected with a plain actor:
pulsar-test.core> (spawn
                    (supervisor "entry-point" :one-for-one
                      (fn []
                        [["test/test-server" :permanent 20 5 :sec 100
                          (fn []
                            (println "Starting actor")
                            (throw (Exception. "from actor")))]])))
#object[co.paralleluniverse.actors.behaviors.Supervisor 0x2153fbbc "Supervisor{ActorRef@2153fbbc{SupervisorActor@entry-point[owner: entry-point]}}"]
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
Starting actor
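The 21 "Starting actor" lines are consistent with one initial start plus the 20 restarts allowed by :permanent 20 5 :sec. The sketch below is a hypothetical model of that kind of restart-intensity limit (give up once more than max-restarts happen inside the time window), written only to make the expected count concrete; the function names are invented for illustration and this is not Pulsar's supervisor code.

```clojure
;; Hypothetical model of a restart-intensity limit such as
;; :permanent 20 5 :sec (at most 20 restarts within 5 seconds).
;; Illustration only, not Pulsar's implementation.

(defn restart-allowed?
  "True if another restart fits under max-restarts, counting only the
  restarts in restart-times (ms) that fall inside the last window-ms."
  [restart-times now max-restarts window-ms]
  (let [recent (filter #(> % (- now window-ms)) restart-times)]
    (< (count recent) max-restarts)))

(defn starts-before-giving-up
  "Number of times a child that crashes immediately on every start is
  started, assuming all crashes happen at the same instant (time 0)."
  [max-restarts window-ms]
  (loop [starts 1, restart-times []]
    (if (restart-allowed? restart-times 0 max-restarts window-ms)
      ;; restart: record it and start the child again
      (recur (inc starts) (conj restart-times 0))
      ;; limit reached: the supervisor gives up
      starts)))

(starts-before-giving-up 20 5000) ;; => 21
```

Under this model the gen-server runs above should also have printed "Starting Server..." 21 times, rather than once.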
The gen-server behaves the same way (no restarts) when the exception is thrown from handle-timeout instead of init:
(defn test-gen-server
  [name]
  (gen-server :timeout 2000
    (reify Server
      (init [_]
        (println "Starting Server...")
        (register! name @self)
        (println "Started Server."))
      (terminate [_ cause]
        (println "Stopping Server..." cause)
        (unregister! @self)
        (println "Stopped Server."))
      (handle-call [_ from id [command param]]
        ;; No calls are handled in this example
        nil)
      (handle-timeout [_]
        ;; Called when no message arrives within :timeout; crash on purpose
        (println "Throwing from handle-timeout")
        (throw (Exception. "Blah timeout"))))))
pulsar-test.core> (def s (run-server-via-supervisor))
#'pulsar-test.core/s
Starting Server...
Started Server.
Throwing from handle-timeout
Stopping Server... #error {
:cause Blah timeout
:via
[{:type java.lang.Exception
:message Blah timeout
:at [pulsar_test.core$test_gen_server$reify__25316 handle_timeout form-init1441529214695591630.clj 27]}]
:trace
[[pulsar_test.core$test_gen_server$reify__25316 handle_timeout form-init1441529214695591630.clj 27]
[co.paralleluniverse.pulsar.actors$Server$reify__25164 handleTimeout actors.clj 665]
[co.paralleluniverse.actors.behaviors.ServerActor handleTimeout ServerActor.java 360]
[co.paralleluniverse.actors.behaviors.ServerActor behavior ServerActor.java 199]
[co.paralleluniverse.actors.behaviors.BehaviorActor doRun BehaviorActor.java 293]
[co.paralleluniverse.actors.behaviors.BehaviorActor doRun BehaviorActor.java 36]
[co.paralleluniverse.actors.Actor run0 Actor.java 691]
[co.paralleluniverse.actors.ActorRunner run ActorRunner.java 51]
[co.paralleluniverse.fibers.Fiber run Fiber.java 1072]
[co.paralleluniverse.fibers.Fiber run1 Fiber.java 1067]
[co.paralleluniverse.fibers.Fiber exec Fiber.java 767]
[co.paralleluniverse.fibers.FiberForkJoinScheduler$FiberForkJoinTask exec1 FiberForkJoinScheduler.java 266]
[co.paralleluniverse.concurrent.forkjoin.ParkableForkJoinTask doExec ParkableForkJoinTask.java 117]
[co.paralleluniverse.concurrent.forkjoin.ParkableForkJoinTask exec ParkableForkJoinTask.java 74]
[jsr166e.ForkJoinTask doExec ForkJoinTask.java 261]
[jsr166e.ForkJoinPool$WorkQueue runTask ForkJoinPool.java 988]
[jsr166e.ForkJoinPool runWorker ForkJoinPool.java 1628]
[jsr166e.ForkJoinWorkerThread run ForkJoinWorkerThread.java 107]]}
Stopped Server.
pulsar-test.core>
@pron, @circlespainter: apologies for tagging you directly, but this issue is blocking my application.
Could you please take a look? I may very well be using gen-server and/or supervisor incorrectly, but it would be helpful to know.
A minimal example that reproduces the issue is attached to the original post.
Thank you.
I'll take a look shortly and respond within the week.
Found the problem. I'll push a fix tomorrow.
That's great. Thank you.
Alright, you'll need to use 0.7.7-SNAPSHOT.
Instead of (test-gen-server "test/test-server"), write:
(actor-builder test-gen-server "test/test-server")
or
(actor-builder #(test-gen-server "test/test-server"))
to tell the supervisor that this is a function that (re)creates the actor.
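A rough way to see why this matters: (test-gen-server "test/test-server") is evaluated once and hands the supervisor a single, already-built gen-server, so there is nothing it can call to build a replacement after that instance dies. The toy model below (plain Clojure; make-server and restart-child are hypothetical names, and a function simply stands in for what actor-builder marks) shows the difference between passing a built child and passing something that builds one.

```clojure
;; Hypothetical model, not Pulsar code: a supervisor can only restart a
;; child if it holds a function that creates a fresh child on each call.

(defn make-server
  "Stands in for test-gen-server: each call returns a brand-new child."
  [name]
  {:name name, :instance (gensym "server-")})

(defn restart-child
  "Returns a fresh child if child-spec is a builder function; returns nil
  for an already-built child, which cannot be rebuilt."
  [child-spec]
  (when (fn? child-spec)
    (child-spec)))

;; An already-built child: nothing to call again
(restart-child (make-server "test/test-server")) ;; => nil

;; Builders, analogous to the two actor-builder forms above:
(restart-child #(make-server "test/test-server"))
(restart-child (partial make-server "test/test-server"))
```

Each call through a builder yields a map with a different :instance, which is what a restart needs.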