sky-uk/kafka-message-scheduler

Ask time out on startup

michaelmcfadyensky opened this issue · 4 comments

We are seeing the following error after starting up KMS with a topic of 6.4 million messages. Shortly afterwards, the service is killed by the TerminatorActor.

alto-kms ERROR [kafka-message-scheduler-akka.actor.default-dispatcher-64] com.sky.kms.actors.SchedulingActor - Reader stream has died
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://kafka-message-scheduler/user/scheduling-actor#1845849227]] after [5000 ms]. Message of type [com.sky.kms.actors.SchedulingActor$Initialised$]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
	at akka.pattern.PromiseActorRef$.$anonfun$defaultOnTimeout$1(AskSupport.scala:635)
	at akka.pattern.PromiseActorRef$.$anonfun$apply$1(AskSupport.scala:650)
	at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205)
	at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:870)
	at scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:109)
	at scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103)
	at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:868)
	at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)
	at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:279)
	at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:283)
	at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)
	at java.lang.Thread.run(Thread.java:748)
[2019-12-16 16:28:23,095] alto-kms ERROR [kafka-message-scheduler-akka.actor.default-dispatcher-64] com.sky.kms.actors.TerminatorActor - Actor[akka://kafka-message-scheduler/user/scheduling-actor#1845849227] stopped. Shutting down

I noticed we have the following property scheduler.reader.timeouts.initialisation. Is this linked to the above exception?

Hi @michaelmcfadyensky, could you add some configuration details and the version number you are using, please?

I think we have seen problems before with that setting being too low, in environments where Kafka is quite slow or where we have a lot of schedules, and we increased it to 2 hours (probably a bit overkill). I think this timeout only applies while KMS is initialising at startup, so it should be OK to increase it quite significantly.
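
For reference, a rough sketch of what that override might look like. The key path is the property named earlier in this thread and the 2 hours value is the one mentioned above; that the setting is read from a HOCON config file, and how an override is supplied to the docker image, are assumptions to check against the version you are running.

```
# Assumed HOCON override; key path taken from the property quoted in this issue
scheduler.reader.timeouts {
  # Only expected to matter during startup initialisation, per the comment above
  initialisation = 2 hours
}
```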

I'm assuming it's an old version, unless they are explicitly configuring 5 seconds for that timeout (the default is longer than that).

We're currently using 0.18.0 of the docker image. We are trying to upgrade to the latest version to see if that resolves the issue.

Updating to the latest version (0.22.0) resolved this issue. Thanks for the comments and suggestions.