zombocom/rack-timeout

Proposal to deprecate `term_on_timeout: false`

mtsmfm opened this issue · 7 comments

Hi, thank you for creating and maintaining useful gem.
I've read risks.md and timeout related PRs on Rails like rails/rails#17607 and understand timeout can break application state completely.
I've read Rails' codes and found critical case so I'd like to share and propose deprecating reusing timed out Ruby process.

Following example shows timeout can break even User.count

require "bundler/inline"

gemfile do
  source "https://rubygems.org"

  gem "rails", "6.0.3"
  gem "sqlite3"
  gem "rack-timeout", require: "rack/timeout/base"
end

require "rack/test"
require "active_record/railtie"
require "action_controller/railtie"

ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = Logger.new(File::NULL)
ActiveRecord::Base.logger.level = Logger::ERROR

ActiveRecord::Schema.define do
  create_table :users, force: true do |t|
    t.string :name
  end
end

class User < ActiveRecord::Base
  scope :sleep_a_while, -> { all.tap { sleep(rand / 1000000.0) } }
end

User.create!(name: "foo")
User.create!(name: "bar")

class TestApp < Rails::Application
  config.root = __dir__
  config.hosts << "example.org"
  config.session_store :cookie_store, key: "cookie_store_key"
  secrets.secret_key_base = "secret_key_base"

  config.logger = ActiveRecord::Base.logger
  Rails.logger  = config.logger

  config.middleware.insert_before Rack::Runtime, Rack::Timeout, service_timeout: 0.1

  config.exceptions_app = ->(*) { [500, {}, ['']] }

  routes.draw do
    get "/" => "test#index"
    get "/timeout" => "test#timeout"
  end
end

class TestController < ActionController::Base
  include Rails.application.routes.url_helpers

  def index
    render plain: User.count
  end

  def timeout
    loop do
      User.where(name: "foo").sleep_a_while
    end
  end
end

require "minitest/autorun"

class BugTest < Minitest::Test
  include Rack::Test::Methods

  def test_index
    get "/"

    # Make sure User.count is 2
    assert_equal "2", last_response.body

    until last_response.body == "1"
      get "/timeout"

      get "/"
    end

    get "/"
    # User.count must be always 2 but it's now 1
    assert_equal "2", last_response.body
  end

  private
    def app
      Rails.application
    end
end

I think in most cases we use this gem with Rails.
The above example shows ActiveRecord can be completely broken after timeout.
In addition, it doesn't throw any error so it's very difficult to be aware this problem.
We already have well written document risks.md though, I'd like to propose deprecating this.

What do you think?

Hello and thank you for your thoughts and your test app!

I believe the default behavior is what people expect from historical versions of Rack Timeout: just raise an exception but continue on. I believe the TERM behavior is something that people should opt-into as it's highly dependent on the nature of your production environment. Additionally, while the application may enter a bad state when using Rails + ActiveRecord, the middleware does not know what you are using -- your app might only be making network calls, for instance. If you find that you are aborting database calls with Rack Timeout that might be a good indication that you should try setting a database-level timeout, like statement_timeout in Postgres.

@wuputah Thank you for responding.
I understand it's difficult to change the default behavior.

What do you think about showing warn if term_on_timeout is false?
As you can see on above example, active record scope which used just before timeout is injected into all other scopes.
User.count will be User.where(name: "foo").count even it's written as User.count.
I think it's so terrible and really difficult to be aware.

@mtsmfm rails/rails#41356 was recently merged, does that change anything for you?

I'm not sure about warning on term_on_timeout: false, what should it say? That you should opt-in to the TERM behavior?

Why and how rack-timeout is used in an application, should be a well thought out decision by the application developers. Maybe rack-timeout should have a warning by default, "Are you really sure you need to use rack-timeout and do you understand the consequences?" Something similar to parallel --citation. Hehe.

@dentarg

rails/rails#41356 was recently merged, does that change anything for you?

No, the test I showed still fails.

# Running:

F

Failure:
BugTest#test_index [bar.rb:84]:
--- expected
+++ actual
@@ -1 +1,3 @@
-"2"
+# encoding: ASCII-8BIT
+#    valid: true
+"1"
Entire code (just changes gem "rails" line)
require "bundler/inline"

gemfile do
  source "https://rubygems.org"

  gem "rails", github: "rails/rails", ref: "main"
  gem "sqlite3"
  gem "rack-timeout", require: "rack/timeout/base"
end

require "rack/test"
require "active_record/railtie"
require "action_controller/railtie"

ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = Logger.new(File::NULL)
ActiveRecord::Base.logger.level = Logger::ERROR

ActiveRecord::Schema.define do
  create_table :users, force: true do |t|
    t.string :name
  end
end

class User < ActiveRecord::Base
  scope :sleep_a_while, -> { all.tap { sleep(rand / 1000000.0) } }
end

User.create!(name: "foo")
User.create!(name: "bar")

class TestApp < Rails::Application
  config.root = __dir__
  config.hosts << "example.org"
  config.session_store :cookie_store, key: "cookie_store_key"
  secrets.secret_key_base = "secret_key_base"

  config.logger = ActiveRecord::Base.logger
  Rails.logger  = config.logger

  config.middleware.insert_before Rack::Runtime, Rack::Timeout, service_timeout: 0.1

  config.exceptions_app = ->(*) { [500, {}, ['']] }

  routes.draw do
    get "/" => "test#index"
    get "/timeout" => "test#timeout"
  end
end

class TestController < ActionController::Base
  include Rails.application.routes.url_helpers

  def index
    render plain: User.count
  end

  def timeout
    loop do
      User.where(name: "foo").sleep_a_while
    end
  end
end

require "minitest/autorun"

class BugTest < Minitest::Test
  include Rack::Test::Methods

  def test_index
    get "/"

    # Make sure User.count is 2
    assert_equal "2", last_response.body

    until last_response.body == "1"
      get "/timeout"

      get "/"
    end

    get "/"
    # User.count must be always 2 but it's now 1
    assert_equal "2", last_response.body
  end

  private
    def app
      Rails.application
    end
end

Why and how rack-timeout is used in an application, should be a well thought out decision by the application developers.

Agree. But I hope the library uses safer option by default if it has.
In addition, if my understanding is correct:

I know application developers must investigate why a request timed out.
But we can't make sure every endpoint can respond fast enough before deploying.
rack-timeout can work as a safety valve I think.

rack-timeout can't cause any problem if we use application server with multi process mode and term_on_timeout: true.
I just hope this gem is safe by default or has a caution if it's not safe by default despite it has safe option.

rack-timeout can't cause any problem if we use application server with multi process mode and term_on_timeout: true

Agree, but this gem can't know if your app server is using multiple processes...

As an example, Puma is not doing that by default – so then it wouldn't be a safe default IMHO

Puma is not doing that by default

Yes, and it's also difficult to change I guess.
That's why I proposed to show warning on term_on_timeout: false.

I think injecting unexpected scope can be a possibility of vulnerability stuff.

# Might be BlockedUser.where(user_id: timed_out_request_user_id).where(user_id: current_user.id).count after time out
if BlockedUser.where(user_id: current_user.id).count == 0
  # Run something
end

Of course I respect your, rack-timeout and puma maintainer's opinion.
I just want you to know the possibility and I felt it's underestimated.