Proposal to deprecate `term_on_timeout: false`
mtsmfm opened this issue · 7 comments
Hi, thank you for creating and maintaining useful gem.
I've read risks.md and timeout related PRs on Rails like rails/rails#17607 and understand timeout can break application state completely.
I've read Rails' codes and found critical case so I'd like to share and propose deprecating reusing timed out Ruby process.
Following example shows timeout can break even User.count
require "bundler/inline"
gemfile do
source "https://rubygems.org"
gem "rails", "6.0.3"
gem "sqlite3"
gem "rack-timeout", require: "rack/timeout/base"
end
require "rack/test"
require "active_record/railtie"
require "action_controller/railtie"
ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = Logger.new(File::NULL)
ActiveRecord::Base.logger.level = Logger::ERROR
ActiveRecord::Schema.define do
create_table :users, force: true do |t|
t.string :name
end
end
class User < ActiveRecord::Base
scope :sleep_a_while, -> { all.tap { sleep(rand / 1000000.0) } }
end
User.create!(name: "foo")
User.create!(name: "bar")
class TestApp < Rails::Application
config.root = __dir__
config.hosts << "example.org"
config.session_store :cookie_store, key: "cookie_store_key"
secrets.secret_key_base = "secret_key_base"
config.logger = ActiveRecord::Base.logger
Rails.logger = config.logger
config.middleware.insert_before Rack::Runtime, Rack::Timeout, service_timeout: 0.1
config.exceptions_app = ->(*) { [500, {}, ['']] }
routes.draw do
get "/" => "test#index"
get "/timeout" => "test#timeout"
end
end
class TestController < ActionController::Base
include Rails.application.routes.url_helpers
def index
render plain: User.count
end
def timeout
loop do
User.where(name: "foo").sleep_a_while
end
end
end
require "minitest/autorun"
class BugTest < Minitest::Test
include Rack::Test::Methods
def test_index
get "/"
# Make sure User.count is 2
assert_equal "2", last_response.body
until last_response.body == "1"
get "/timeout"
get "/"
end
get "/"
# User.count must be always 2 but it's now 1
assert_equal "2", last_response.body
end
private
def app
Rails.application
end
end
I think in most cases we use this gem with Rails.
The above example shows ActiveRecord can be completely broken after timeout.
In addition, it doesn't throw any error so it's very difficult to be aware this problem.
We already have well written document risks.md though, I'd like to propose deprecating this.
What do you think?
Hello and thank you for your thoughts and your test app!
I believe the default behavior is what people expect from historical versions of Rack Timeout: just raise an exception but continue on. I believe the TERM
behavior is something that people should opt-into as it's highly dependent on the nature of your production environment. Additionally, while the application may enter a bad state when using Rails + ActiveRecord, the middleware does not know what you are using -- your app might only be making network calls, for instance. If you find that you are aborting database calls with Rack Timeout that might be a good indication that you should try setting a database-level timeout, like statement_timeout
in Postgres.
@wuputah Thank you for responding.
I understand it's difficult to change the default behavior.
What do you think about showing warn if term_on_timeout
is false?
As you can see on above example, active record scope which used just before timeout is injected into all other scopes.
User.count
will be User.where(name: "foo").count
even it's written as User.count
.
I think it's so terrible and really difficult to be aware.
@mtsmfm rails/rails#41356 was recently merged, does that change anything for you?
I'm not sure about warning on term_on_timeout: false
, what should it say? That you should opt-in to the TERM
behavior?
Why and how rack-timeout is used in an application, should be a well thought out decision by the application developers. Maybe rack-timeout should have a warning by default, "Are you really sure you need to use rack-timeout and do you understand the consequences?" Something similar to parallel --citation
. Hehe.
rails/rails#41356 was recently merged, does that change anything for you?
No, the test I showed still fails.
# Running:
F
Failure:
BugTest#test_index [bar.rb:84]:
--- expected
+++ actual
@@ -1 +1,3 @@
-"2"
+# encoding: ASCII-8BIT
+# valid: true
+"1"
Entire code (just changes gem "rails" line)
require "bundler/inline"
gemfile do
source "https://rubygems.org"
gem "rails", github: "rails/rails", ref: "main"
gem "sqlite3"
gem "rack-timeout", require: "rack/timeout/base"
end
require "rack/test"
require "active_record/railtie"
require "action_controller/railtie"
ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = Logger.new(File::NULL)
ActiveRecord::Base.logger.level = Logger::ERROR
ActiveRecord::Schema.define do
create_table :users, force: true do |t|
t.string :name
end
end
class User < ActiveRecord::Base
scope :sleep_a_while, -> { all.tap { sleep(rand / 1000000.0) } }
end
User.create!(name: "foo")
User.create!(name: "bar")
class TestApp < Rails::Application
config.root = __dir__
config.hosts << "example.org"
config.session_store :cookie_store, key: "cookie_store_key"
secrets.secret_key_base = "secret_key_base"
config.logger = ActiveRecord::Base.logger
Rails.logger = config.logger
config.middleware.insert_before Rack::Runtime, Rack::Timeout, service_timeout: 0.1
config.exceptions_app = ->(*) { [500, {}, ['']] }
routes.draw do
get "/" => "test#index"
get "/timeout" => "test#timeout"
end
end
class TestController < ActionController::Base
include Rails.application.routes.url_helpers
def index
render plain: User.count
end
def timeout
loop do
User.where(name: "foo").sleep_a_while
end
end
end
require "minitest/autorun"
class BugTest < Minitest::Test
include Rack::Test::Methods
def test_index
get "/"
# Make sure User.count is 2
assert_equal "2", last_response.body
until last_response.body == "1"
get "/timeout"
get "/"
end
get "/"
# User.count must be always 2 but it's now 1
assert_equal "2", last_response.body
end
private
def app
Rails.application
end
end
Why and how rack-timeout is used in an application, should be a well thought out decision by the application developers.
Agree. But I hope the library uses safer option by default if it has.
In addition, if my understanding is correct:
- Puma doesn't have request timeout mechanism so rack-timeout is de facto standard gem to stop not only DB query but also CPU heavy processing
- Heroku had recommended to use rack-timeout without notice
I know application developers must investigate why a request timed out.
But we can't make sure every endpoint can respond fast enough before deploying.
rack-timeout can work as a safety valve I think.
rack-timeout can't cause any problem if we use application server with multi process mode and term_on_timeout: true
.
I just hope this gem is safe by default or has a caution if it's not safe by default despite it has safe option.
rack-timeout can't cause any problem if we use application server with multi process mode and
term_on_timeout: true
Agree, but this gem can't know if your app server is using multiple processes...
As an example, Puma is not doing that by default – so then it wouldn't be a safe default IMHO
Puma is not doing that by default
Yes, and it's also difficult to change I guess.
That's why I proposed to show warning on term_on_timeout: false
.
I think injecting unexpected scope can be a possibility of vulnerability stuff.
# Might be BlockedUser.where(user_id: timed_out_request_user_id).where(user_id: current_user.id).count after time out
if BlockedUser.where(user_id: current_user.id).count == 0
# Run something
end
Of course I respect your, rack-timeout and puma maintainer's opinion.
I just want you to know the possibility and I felt it's underestimated.