siddhi-io/distribution

Events out of order during distributed deployment recovery

suhothayan opened this issue · 2 comments

Description:
Events arrive out of order during recovery of a distributed deployment (while replaying data from the NATS Streaming Server).

It would be good to identify the root cause, and to fix it provided the fix does not introduce performance issues.

2020-01-03 15:30:29 INFO  LoggerService:42 - {event={name=Cake, amount=380.0}}
2020-01-03 15:30:30 INFO  LoggerService:42 - {event={name=Cake, amount=400.0}}
2020-01-03 15:30:31 INFO  LoggerService:42 - {event={name=Cake, amount=420.0}}
2020-01-03 15:30:31 INFO  LoggerService:42 - {event={name=Cake, amount=440.0}}
2020-01-03 15:30:45 INFO  LoggerService:42 - {event={name=Cake, amount=460.0}}
2020-01-03 15:30:46 INFO  LoggerService:42 - {event={name=Cake, amount=480.0}}
2020-01-03 15:30:48 INFO  LoggerService:42 - {event={name=Cake, amount=500.0}}
2020-01-03 15:30:55 INFO  LoggerService:42 - {event={name=Cake, amount=380.0}}
2020-01-03 15:30:55 INFO  LoggerService:42 - {event={name=Cake, amount=400.0}}
2020-01-03 15:30:55 INFO  LoggerService:42 - {event={name=Cake, amount=440.0}}
2020-01-03 15:30:55 INFO  LoggerService:42 - {event={name=Cake, amount=480.0}}
2020-01-03 15:30:55 INFO  LoggerService:42 - {event={name=Cake, amount=420.0}}
2020-01-03 15:30:55 INFO  LoggerService:42 - {event={name=Cake, amount=500.0}}
2020-01-03 15:30:55 INFO  LoggerService:42 - {event={name=Cake, amount=460.0}}
2020-01-03 15:31:51 INFO  LoggerService:42 - {event={name=Cake, amount=520.0}}
2020-01-03 15:32:09 INFO  LoggerService:42 - {event={name=Cake, amount=540.0}}
2020-01-03 15:32:10 INFO  LoggerService:42 - {event={name=Cake, amount=560.0}}
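
For context, a minimal Siddhi app of the kind that could produce the log above might look like the sketch below. The app name, stream names, and NATS connection parameters are assumptions for illustration, not the actual deployment configuration:

```sql
-- Minimal sketch; names and connection parameters are assumed, not the real deployment config
@App:name('SweetProductionApp')

@source(type = 'nats', destination = 'SweetProductionStream',
        bootstrap.servers = 'nats://localhost:4222',
        cluster.id = 'siddhi-nats-cluster',
        @map(type = 'json'))
define stream SweetProductionStream (name string, amount double);

-- The log sink prints events via LoggerService, as in the output above
@sink(type = 'log')
define stream ProductionLogStream (name string, amount double);

from SweetProductionStream
select name, amount
insert into ProductionLogStream;
```

During recovery, the events replayed from the NATS Streaming Server between 15:30:55 entries above arrive in a different order (380, 400, 440, 480, 420, 500, 460) than they were originally received.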

I tried to reproduce the scenario with a test case, as in siddhi-io/siddhi-io-nats#36, but the events were retrieved in the correct order at the source. While doing so, I did encounter a bug where an event was duplicated during persist and restore; that was fixed in the above PR itself.

I also tried the same by publishing through a NATS sink instead of the NatsClient, but could not reproduce the out-of-order scenario.

I will try this on a distributed deployment and update the thread.