RBMHTechnology/eventuate

EventsourcedProcessor: write batch size exceeding partition size

Closed this issue · 9 comments

Hi,

I just ran into the following exception with one of my EventsourcedProcessors (a StatefulProcessor to be specific). During recovery the processor crashes with a write failure due to the write batch size exceeding the configured partition size (here 16000).

Caused by: java.lang.IllegalArgumentException: requirement failed: write batch size (28611) must not be greater than maximum partition size (16000)
    at scala.Predef$.require(Predef.scala:224)
    at com.rbmhtechnology.eventuate.log.EventLog$.com$rbmhtechnology$eventuate$log$EventLog$$adjustSequenceNr(EventLog.scala:675)
    at com.rbmhtechnology.eventuate.log.EventLog$$anonfun$writeBatches$1.apply(EventLog.scala:558)
    at com.rbmhtechnology.eventuate.log.EventLog$$anonfun$writeBatches$1.apply(EventLog.scala:558)
    at scala.util.Try$.apply(Try.scala:192)
    at com.rbmhtechnology.eventuate.log.EventLog.writeBatches(EventLog.scala:558)
    at com.rbmhtechnology.eventuate.log.EventLog.com$rbmhtechnology$eventuate$log$EventLog$$processReplicationWrites(EventLog.scala:528)
    at com.rbmhtechnology.eventuate.log.EventLog$$anonfun$com$rbmhtechnology$eventuate$log$EventLog$$initialized$1.applyOrElse(EventLog.scala:433)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:484)
    at com.rbmhtechnology.eventuate.log.EventLog.aroundReceive(EventLog.scala:247)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
    at akka.actor.ActorCell.invoke(ActorCell.scala:495)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
    at akka.dispatch.Mailbox.run(Mailbox.scala:224)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I think the EventsourcedProcessor's write() requests a ReplicationWrite without respecting any batch size limit at all. I'm not sure whether that's the real issue, though.

Cheers,
Gregor

Write batch sizes are limited by eventuate.log.replay-batch-size during recovery, but if you generate many processed events per replayed event you may still run into this (known) issue. An already recovered processor that is under high load may also run into it.
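To make the arithmetic concrete, here is a minimal sketch of how the aggregated write batch can outgrow the partition size. It assumes that all events processed from one replay batch are written in a single request. The replay batch size of 4096 is a made-up value for illustration, not taken from the report; only the partition size of 16000 comes from the exception above. `BatchSizeSketch` and its members are hypothetical names, not part of the eventuate API.

```scala
object BatchSizeSketch {
  // Hypothetical value for illustration only; not taken from the report.
  val replayBatchSize = 4096

  // Maximum partition size quoted in the exception message.
  val maxPartitionSize = 16000

  // Size of the write batch if every replayed event yields `eventsPerInput`
  // processed events and all of them are written in one request.
  def aggregatedBatchSize(eventsPerInput: Int): Int =
    replayBatchSize * eventsPerInput

  // True if such a batch would trip the partition size requirement.
  def exceedsPartition(eventsPerInput: Int): Boolean =
    aggregatedBatchSize(eventsPerInput) > maxPartitionSize
}
```

With these assumed numbers, three processed events per replayed event still fit (12288), while four already exceed the limit (16384).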

Processing via the Akka Streams API will be the recommended way to process events in the future. It is currently work in progress and addresses all these issues. Expect a merge to master within the next few weeks.

In any case, I'll add the bug label here as this should be fixed regardless of whether the Akka Streams based processor API is in place or not. Thanks for reporting!

@kongo2002 this is now fixed in #336. Please review and give it a try, thanks!

@krasserm great news - thanks a bunch!

@krasserm I just retried with the latest snapshot version and it works - great! You now have to look out for the correct write-batch-size of course.

Glad to hear that 😃 An alternative could be to allow a processor to generate more than write-batch-size events per input event, at the risk of causing the same error you reported. But in this case it is a decision taken by the processor. In all other cases, i.e. if a processor generates fewer than write-batch-size events per input event, the aggregated batch size will never exceed write-batch-size. If this is a better alternative for you, I'm fine with switching to it. WDYT?

Actually, an EventsourcedActor could also generate an event batch larger than write-batch-size if it persists more than write-batch-size events per command. I think we should either allow a processor to do that as well or restrict EventsourcedActor to persisting no more than write-batch-size events per command.

My current preference is to allow a processor to generate more than write-batch-size events per input event, as this still has a (good) chance to succeed. In any case, both EventsourcedProcessor and EventsourcedActor should have the same batch size limitation policy (which is currently not the case). Thoughts?
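A rough sketch of the policy described above, not eventuate's actual implementation: the processed events of consecutive input events are aggregated until adding the next group would exceed write-batch-size, but the events generated for a single input event are never split, so one input event may still produce an oversized batch. `WriteBatching`, `batches` and `writeBatchSize` are hypothetical names chosen for this example.

```scala
object WriteBatching {
  // Groups the processed events of consecutive input events into write batches.
  // `outputs` holds one inner sequence per input event. A batch is closed as soon
  // as adding the next group would push it past `writeBatchSize`. A single group
  // larger than `writeBatchSize` is kept whole and ends up in a batch of its own.
  def batches[A](outputs: Seq[Seq[A]], writeBatchSize: Int): Vector[Vector[A]] = {
    require(writeBatchSize > 0, "writeBatchSize must be positive")
    val start = (Vector.empty[Vector[A]], Vector.empty[A])
    val (closed, last) = outputs.foldLeft(start) {
      case ((flushed, open), group) =>
        if (open.nonEmpty && open.size + group.size > writeBatchSize)
          (flushed :+ open, group.toVector) // close the open batch, start a new one
        else
          (flushed, open ++ group)          // the group still fits into the open batch
    }
    // Emit whatever is left in the open batch.
    if (last.nonEmpty) closed :+ last else closed
  }
}
```

For example, with a write batch size of 3, the outputs `Seq(Seq(1, 2), Seq(3), Seq(4, 5, 6, 7, 8), Seq(9))` are grouped into `Vector(Vector(1, 2, 3), Vector(4, 5, 6, 7, 8), Vector(9))`: only the third input event, which alone generates five events, yields a batch above the limit.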

Yes, I think you are right. Both EventsourcedProcessor and EventsourcedActor should behave the same regarding batch size limitations.
Then I would suggest aligning the processor to the EventsourcedActor's behavior while logging a warning as soon as the write-batch-size is exceeded.

Ok, expect an update soon. Will also cover things discussed here and the following comments.

@kongo2002 #341 covers these changes. Should be in master tomorrow I think.