Possible incorrect expiration of requests

Question

Possible incorrect expiration of requests

mikebaldry opened this issue 6 years ago · 11 comments

Our production system just had a spike in expirations, but I can't work out why it could happen. The logs from heroku router and the log from Rack::Timeout do not agree.

cut down logs for simplification:

2018-08-16T08:20:13.833953+00:00 heroku router - - at=info method=GET request_id=9ece97f8-d995-47b6-9ce5-fd9b0f0415c3 dyno=web.1 connect=0ms service=4240ms status=500
2018-08-16T08:20:13.832207+00:00 app web.1 - - source=rack-timeout id=9ece97f8-d995-47b6-9ce5-fd9b0f0415c3 wait=52294ms timeout=30000ms state=expired

Heroku router says that the connect time was 0ms, so there was no wait at all and indeed, the request was finished in 4.2s. Rack::Timeout thinks the request has been waiting 52s and kills it because it is older than 30s.

Can anyone shed some light on why this might happen?

Thanks

Answer 1 · 2018-08-16T17:11:37.000Z

I'm seeing the exact same thing. It actually seems like rack-timeout thinks it's killing every single request to my app, some as quick as 40ms (they still seem to return a response successfully)

Answer 2 · 2018-08-16T19:02:50.000Z

My two suspicions are something about where rack-timeout is being inserted into your middleware stack, or a thread-safety bug.

Could you please post what version of rack-timeout, Rails, Ruby, and web server (i.e. puma) you are using, your server's threading settings (if applicable), and ideally the output from rake middleware? Anything we can do to correlate similar reports would help.

Answer 3 · 2018-08-16T19:17:24.000Z

Rails 5.2.1
Ruby 2.5.1
Puma 3.12.0
rack-timeout 0.5.1

puma.rb excerpts:

threads 5, 5
workers 2
preload_app!

production.rb excerpt:

config.middleware.insert_before Rack::Runtime, Rack::Timeout, service_timeout: 29 # seconds

$ RAILS_ENV=production rake middleware 
use ActionDispatch::SSL
use Rack::Sendfile
use ActionDispatch::Executor
use Rack::Timeout
use ActiveSupport::Cache::Strategy::LocalCache::Middleware
use Rack::Runtime
use Rack::MethodOverride
use ActionDispatch::RequestId
use RequestStore::Middleware
use ActionDispatch::RemoteIp
use Rails::Rack::Logger
use ActionDispatch::ShowExceptions
use ActionDispatch::DebugExceptions
use Bugsnag::Rack
use ActionDispatch::Callbacks
use ActionDispatch::Cookies
use ActionDispatch::Session::CookieStore
use ActionDispatch::Flash
use ActionDispatch::ContentSecurityPolicy::Middleware
use Rack::Head
use Rack::ConditionalGet
use Rack::ETag
use Rack::TempfileReaper
use Warden::Manager
use OmniAuth::Strategies::Facebook
use OmniAuth::Strategies::StripeConnect
run Application.routes

Answer 4 · 2018-08-16T23:19:41.000Z

Thanks. I came up with one other explanation, which is a clock/time drift error, possibly something is going wrong with ntp on Heroku, causing the clock on the routers and the runtimes to be (significantly) different. However, it remains peculiar that the request is finishing and then reporting an error via rack-timeout.

Answer 5 · 2018-08-16T23:46:52.000Z

Here's a full log trace of an example from my server:

Aug 14 19:18:30 heroku/router:  at=info method=GET path="/" request_id=70901d76-33b0-4ab4-97e9-bd18dc6951f0 fwd="107.14.57.131,172.69.33.147" dyno=web.1 connect=1ms service=54ms status=200 bytes=16452 protocol=https 
Aug 14 19:18:30 app/web.1:  source=rack-timeout id=70901d76-33b0-4ab4-97e9-bd18dc6951f0 wait=2ms timeout=29000ms state=ready 
Aug 14 19:18:30 app/web.1:  source=rack-timeout id=70901d76-33b0-4ab4-97e9-bd18dc6951f0 wait=2ms timeout=29000ms service=0ms state=active 
Aug 14 19:18:30 app/web.1:  [70901d76-33b0-4ab4-97e9-bd18dc6951f0] method=GET path=/ format=html controller=ContentController action=home status=200 duration=47.03 view=46.53 db=0.00 
Aug 14 19:18:30 app/web.1:  source=rack-timeout id=70901d76-33b0-4ab4-97e9-bd18dc6951f0 wait=2ms timeout=29000ms service=51ms state=completed

Answer 6 · 2018-08-17T00:50:21.000Z

There's no timeout there from rack-timeout, it's just logging information. You can turn this off by changing the logging level to warn.

Answer 7 · 2018-08-17T00:58:40.000Z

Good to know! I guess I was completely confused here then. Sorry for hijacking your issue, @mikebaldry

Answer 8 · 2018-08-17T21:57:31.000Z

@mikebaldry if you could provide more debugging info similar to the above it would be most helpful. If you're seeing this frequently and can afford the extra logging data, you might consider increasing the logging level for rack-timeout.

Answer 9 · 2018-08-23T12:12:48.000Z

Sorry, I did not get any notifications from this issue! I just come back on the off chance it had been looked at and I hadn't been notified :)

use Rack::Cors
use Rack::Sendfile
use ActionDispatch::Static
use Rack::Lock
use Rack::Attack
use DisablePageCacheMiddleware
use TrackContentTypeMiddleware
use AcmeChallengeMiddleware
use ActiveMarketMiddleware
use RedirectMiddleware
use #<ActiveSupport::Cache::Strategy::LocalCache::Middleware:0x00007fe460f4f690>
use Rack::Runtime
use Rack::MethodOverride
use OutcomeLogger::Middleware
use ActionDispatch::RequestId
use RequestStore::Middleware
use Rack::Timeout
use Rails::Rack::Logger
use ActionDispatch::ShowExceptions
use ActionDispatch::DebugExceptions
use Rollbar::Middleware::Rails::RollbarMiddleware
use ActionDispatch::RemoteIp
use ActionDispatch::Callbacks
use ActiveRecord::ConnectionAdapters::ConnectionManagement
use ActiveRecord::QueryCache
use ActionDispatch::Cookies
use ActionDispatch::Session::ActiveRecordStore
use ActionDispatch::Flash
use ActionDispatch::ParamsParser
use Rack::Head
use Rack::ConditionalGet
use Rack::ETag
use Warden::Manager
use RequestIdMiddleware
use P3P::Middleware
use Rack::SslEnforcer
run Buyapowa::Application.routes

Ruby 2.5.1
Rails 4.2.10
Puma 3.12.0
Rack::Timeout 0.5.1

No puma config, just bundle exec puma -t 5:5 -p $PORT
No rack-timeout config, just pulled in by Gemfile

Many thanks

Answer 10 · 2018-08-23T22:53:03.000Z

I'm pretty stumped here. There's a fair bit of middleware here I'm unfamiliar with, so that could be a possible source of the issue, but by no means conclusive.

I'm assuming you're still seeing this issue -- can you tell me, how often does it occur? Do you also see "valid" normal timeouts as well as these erroneous ones?

Answer 11 · 2018-10-01T19:54:42.000Z

Pending further information, I don't suspect there's a bug in rack-timeout here.