zilverline/sequent

commands scheduled in after_commit workflow blocks are not executed

Closed this issue · 6 comments

TreyE commented

Here's my current project: https://github.com/TreyE/entity_journal

I'm having an issue with the execution of any commands scheduled in an after_commit callback in my workflows.

For a good example, see lines 5-11 of https://github.com/TreyE/entity_journal/blob/master/app/sequent/entity_update_notifications/workflows.rb.

If I remove the "after_commit" block, my secondary command is fired with no issue.
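
The shape of that block is roughly this (a sketch of the pattern, not the exact file contents):

class EntityUpdateNotificationWorkflow < Sequent::Workflow
  on EntityUpdateNotificationConsumed do |event|
    after_commit do
      # this command is never executed; removing the after_commit wrapper fixes it
      Sequent.command_service.execute_commands(
        CategorizeEntityUpdateNotification.new(aggregate_id: event.aggregate_id)
      )
    end
  end
end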

Some test code that triggers the exact code path, so you can see exactly what is happening:

Sequent.command_service.execute_commands ConsumeEntityUpdateNotification.new(aggregate_id: "abcde", event_body: "some xml")

If it also helps, I'm running the default Rails setup with Puma on Ruby 2.6.6.

Update: I'm running Postgres 12. Wondering if it's related to #216?

Update part 2 - it looks like it never enters https://github.com/zilverline/sequent/blob/master/lib/sequent/core/command_service.rb#L49 while running the after_commit commands.

Update part 3 - what it comes down to is that Sequent::Util.skip_if_already_processing is not re-entrant during recursive calls, which is exactly what calling execute_commands from an after_commit block does, because of the way the ActiveRecord transaction provider runs after_commit hooks.
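
To make the mechanism concrete, here is a rough illustration of the guard pattern (my own sketch, not the gem's actual source):

# Sketch of a non-re-entrant thread-local guard, similar in spirit to
# Sequent::Util.skip_if_already_processing (illustration only).
def skip_if_already_processing(key = :processing)
  return if Thread.current[key] # a nested call lands here and is silently skipped

  begin
    Thread.current[key] = true
    yield
  ensure
    Thread.current[key] = nil
  end
end

skip_if_already_processing do
  puts "outer commands run"
  # an after_commit hook firing on the same thread re-enters here:
  skip_if_already_processing do
    puts "never printed: the guard is still set"
  end
end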

I've got a simple solution which wraps it in a thread, dodging the thread-local lock, but I don't like it. I'm seeing if I can do something else.

lvonk commented

Hi, nice to see you are trying out Sequent! And yes, this does indeed seem to be a bug; good catch.

Looking at your code, I do wonder why you chose to use after_commit to schedule new commands, though. If you are going to do this synchronously you might as well leave the after_commit out. That way you don't have to deal with the possible weird state in which the real transaction commits but the after_commit fails.
The initial intent of after_commit was to be able to schedule background jobs in non-Postgres-backed background workers like Sidekiq.

TreyE commented

I can understand the intention, but if so, it's not well documented that this isn't supported.

In our case we are using Sequent in an event-sourcing system to process large amounts of highly unreliable data. It is entirely possible we will get one event in and then the transaction will fail. Under certain conditions, I want to keep the progress that was made successfully up to the point of failure.

While generally speaking we want completely atomic transactions (and I see your point and understand your thinking), our data is very bad and it's very difficult to anticipate all the ways it can be wrong (3rd-party vendor). As a result, I'd rather have a transaction go as far as it can and get 'stuck' in its best possible state, so at least we know what happened, than have a transaction that repeatedly fails without completing any steps. Generally speaking we will track this through our read model projections. My understanding is that read model projections are also subject to the same commit constraints as a failing transaction?

I saw executing the commands in an after_commit block as the best way to achieve this; can you recommend another (it might be something as simple as Thread.new)?

lvonk commented

I can understand the intention, but if so, it's not well documented that this isn't supported.

My apologies for the confusion: it should be supported, so this is indeed a bug. I was just wondering what the use case was.

My understanding is that read model projections are also subject to the same commit constraints as a failing transaction?

Yes: when your transaction fails, the read model is not updated.

I understand your use case; we have the same challenge with various workflows, especially when keeping external data in sync with our system. We use background jobs (e.g. Delayed Job) for this. So in your case:

class EntityUpdateNotificationWorkflow < Sequent::Workflow
  on EntityUpdateNotificationConsumed do |event|
    Delayed::Job.enqueue(MyJob.new(event))
  end
end

class MyJob
  def initialize(event)
    @event = event
  end

  def perform
    categorize_command = CategorizeEntityUpdateNotification.new(
      aggregate_id: @event.aggregate_id
    )
    Sequent.command_service.execute_commands(categorize_command)
  end
end

Instead of Delayed Job you could of course also use e.g. Que or Sidekiq for this. This does require some extra setup and you have to deal with eventual consistency, but you get automatic retries and guaranteed transactions in return (so you are sure the job is scheduled in the same transaction in which the EntityUpdateNotificationConsumed occurred). The failing jobs are also kept, so we can do some digging into why they failed and fix them if needed, or discard the jobs when we don't care.

So to summarize: this bug needs fixing :-). after_commit can be used for this, but perhaps a background processor like Delayed Job is better suited, for the reasons stated above.

Hope this helps.

TreyE commented

Thanks for the update.

I've put together a PR, but I'd like you to consider it a suggestion rather than a solution: I'm not entirely pleased with how I implemented the re-entrancy (I might go back and change it).

I'm also fairly sure my solution suffers from a fatal flaw.
While working on it, I didn't understand why you would be using a thread-local lock in the first place, and I'm 95% sure even restructuring my solution won't address that. I'm pretty sure that while this makes my use case pass, it leaves the code open to some pretty weird behaviour under multiply-recursive after_commit cases that you never intended.

Also I'm pretty sure I could have fixed the issue by wrapping my execution in a Thread.new.join without changing the gem.
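
Something along these lines (an untested sketch, reusing the categorize command from your example above):

# Untested sketch: run the nested dispatch on a fresh thread, whose
# thread-locals are empty, so the parent thread's guard is never seen.
Thread.new do
  Sequent.command_service.execute_commands(categorize_command)
end.join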

If it's not too much trouble (don't go crazy on this, I'm sure you are very busy), I would appreciate a brutally honest opinion on what I did and whether it creates major flaws.

lvonk commented

I've put together a PR,

Thanks, I have provided some feedback on it.

I'm also fairly sure my solution suffers from a fatal flaw.

I don't think so, but I think it can be simplified by resetting the current thread local when the command_queue is empty after the transaction commits. I don't see any reason why this wouldn't hold in multiply-recursive scenarios, but we can of course create a test case for this :-)
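
Roughly what I mean, as a self-contained sketch (the names are illustrative, not the gem's actual internals):

# Illustrative sketch, not Sequent's actual code: a command queue whose
# thread-local guard is only cleared once the queue has drained, so commands
# enqueued from after_commit hooks are still picked up by the outer loop.
class MiniCommandService
  def initialize
    @queue = Queue.new
  end

  def execute_commands(*commands)
    commands.each { |c| @queue << c }
    return if Thread.current[:processing] # re-entrant call: only enqueue

    begin
      Thread.current[:processing] = true
      @queue.pop.call until @queue.empty? # each command may enqueue more
    ensure
      Thread.current[:processing] = nil if @queue.empty?
    end
  end
end

service = MiniCommandService.new
service.execute_commands(-> {
  puts "first command"
  # simulates a command scheduled from an after_commit hook:
  service.execute_commands(-> { puts "second command" })
})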

Also I'm pretty sure I could have fixed the issue by wrapping my execution in a Thread.new.join without changing the gem.

Yes, I expect this to work since Sequent is thread-safe. However, this issue still needs fixing, since the documentation explicitly states that this is possible.

The only "flaw" I see in your approach is that when something fails in the after_commit it potentially lead to weird state and users might be confused that part of the transaction is done and some part not (since it is all happening in the foreground). But as you described in your application this is not a real problem, at least not for now.

lvonk commented

This issue is addressed in #249 @TreyE