zombocom/rack-timeout

AR transactions are not being rolled back when a timeout error is raised

ChrisCPO opened this issue · 5 comments

Hey, we are having an issue where large requests timeout, the database changes are handed inside of an ActiveRecord::Base.transaction, however the commits are still happening. There is no rollback. Which is causing issues because it seems like the request just hits a dead wall and commits then throws the timeout error. I have seen this in my development logs.

I have tried the following, rescue app wide rake timeous in the application controller. However this does not work. One odd thing is when this rescue code was not present and timeout time was set the timeout error did trigger a rollback for development, Normally this would be great but after testing on staging commits were still made, regardless. See Log 1 below.

#application controller
rescue_from Rack::Timeout::RequestTimeoutException do |timeout|
    if ActiveRecord::Base.connection.open_transactions > 0
      raise ActiveRecord::Rollback
    end
  end

Any error in raise should also cause a rollback. So why is a rollback not happening.
Is there a way to rollback the db in a transaction when a time error is raised?

Scenarios

I made a simple js endpoint for testing, which just updates a string attribute on a simple model we have.

  def update
    ActiveRecord::Base.transaction do
      @note = Note.find(note_id)
      @note.update(content: permitted_params[:content])

      sleep(permitted_params[:sleep].to_i)
    end

    render json: Note.find(note_id).attributes, status: 200
  end

Log 1 Development logs went like this, when the rescue in application controller was present.

This is currently how our staging and production behave regardless.

 Processing by LargeRequestTimeoutsController#update as */*
   Parameters: {"content"=>"using application transaction", "sleep"=>"35", "id"=>"133117"}
    (1.0ms)  BEGIN
source=rack-timeout id=c98a271671c194dfcd6dae6d453cab4a timeout=10000ms service=1002ms state=active
   Note Load (1.5ms)  SELECT  "notes".* FROM "notes" WHERE "notes"."id" = $1 LIMIT $2  [["id", 133117], ["LIMIT", 1]]
    (0.3ms)  BEGIN
   SQL (0.8ms)  UPDATE "notes" SET "content" = $1, "updated_at" = $2 WHERE "notes"."id" = $3  [["content", "using application transaction"], ["updated_at", "2018-08-22 21:00:29.923092"], ["id", 1111]]
    (3.2ms)  COMMIT
 timeout=10000ms service=2005ms state=active
 timeout=10000ms service=3004ms state=active
 timeout=10000ms service=4001ms state=active
 timeout=10000ms service=5001ms state=active
 timeout=10000ms service=6005ms state=active
 timeout=10000ms service=7001ms state=active
 timeout=10000ms service=8003ms state=active
 timeout=10000ms service=9006ms state=active
 timeout=10000ms service=10005ms state=active
 timeout=10000ms service=10006ms state=timed_out
    (0.7ms)  ROLLBACK
 Completed 500 Internal Server Error in 9065ms (ActiveRecord: 72.3ms)
Rack::Timeout::RequestTimeoutException (Request ran for longer than 10000ms):

Log 2 When the rescue in application controller was NOT present


 Processing by LargeRequestTimeoutsController#update as */*
   Parameters: {"content"=>"using application transaction", "sleep"=>"35", "id"=>"133117"}
    (1.0ms)  BEGIN
source=rack-timeout id=c98a271671c194dfcd6dae6d453cab4a timeout=10000ms service=1002ms state=active
   Note Load (1.5ms)  SELECT  "notes".* FROM "notes" WHERE "notes"."id" = $1 LIMIT $2  [["id", 133117], ["LIMIT", 1]]
    (0.3ms)  BEGIN
   SQL (0.8ms)  UPDATE "notes" SET "content" = $1, "updated_at" = $2 WHERE "notes"."id" = $3  [["content", "using application transaction"], ["updated_at", "2018-08-22 21:00:29.923092"], ["id", 1111]]
 timeout=10000ms service=2005ms state=active
 timeout=10000ms service=3004ms state=active
 timeout=10000ms service=4001ms state=active
 timeout=10000ms service=5001ms state=active
 timeout=10000ms service=6005ms state=active
 timeout=10000ms service=7001ms state=active
 timeout=10000ms service=8003ms state=active
 timeout=10000ms service=9006ms state=active
 timeout=10000ms service=10005ms state=active
 timeout=10000ms service=10006ms state=timed_out
    (0.7ms)  ROLLBACK
 Completed 500 Internal Server Error in 9065ms (ActiveRecord: 72.3ms)
Rack::Timeout::RequestTimeoutException (Request ran for longer than 10000ms):

My understanding so far is that ideally large requests should use polling/ background jobs etc., however sometimes things happen. So Rake-timeout should be a chance to handle a timeout error before Heroku's router 30 sec hard stop. Being set to timeout at ~28 sec.

The shortcomings of rack-timeout are well documented. Please see:
https://github.com/heroku/rack-timeout#further-documentation

Keep in mind it's possible to hit a timeout after the commit has finished. There's also the possibility of other weird threading behavior, roughly a race condition.

You should not need a rescue_from to rollback. This behavior happens automatically in Rails when an exception is raised during a transaction.

There is no way for rack-timeout to detect your app is in the middle of a transaction, and even if there was, I can't see any way to change or improve the behavior, since the goal of rack-timeout is to abort the transaction (both the web and database transaction) if the request is taking a long time.

@wuputah thanks for the quick response, so on our production server, we have the timeout set at 25 seconds.

You should not need a rescue_from to rollback. This behavior happens automatically in Rails when an exception is raised during a transaction.

This is what I thought to, but we are not seeing this. What we are seeing is all db changes up to the point when a Rack::Timeout::RequestTimeoutException is raised, are committed but no rollback is happening.

This issue is beyond my expertise and seems to be beyond the scope of rack-timeout, e.g. is either a Rails issue or something else in your app. I'll leave the issue open to see if other people might be able to offer advice, but unfortunately I don't think there's anything further I can advise.

Yea, taking a deeper dive, it is an issue with our App, our ActiveRecord is wrapping updates in their own transaction, so we have an issue with nested transactions, either through a gem or some obscure setting. This is not an issue with rake-timeout, thanks though.

Ah! That's good information for others, thanks for sharing!