Possible incorrect expiration of requests
mikebaldry opened this issue · 11 comments
Our production system just had a spike in expirations, but I can't work out why it could happen. The logs from heroku router and the log from Rack::Timeout
do not agree.
cut down logs for simplification:
2018-08-16T08:20:13.833953+00:00 heroku router - - at=info method=GET request_id=9ece97f8-d995-47b6-9ce5-fd9b0f0415c3 dyno=web.1 connect=0ms service=4240ms status=500
2018-08-16T08:20:13.832207+00:00 app web.1 - - source=rack-timeout id=9ece97f8-d995-47b6-9ce5-fd9b0f0415c3 wait=52294ms timeout=30000ms state=expired
Heroku router says that the connect time was 0ms, so there was no wait at all and indeed, the request was finished in 4.2s. Rack::Timeout
thinks the request has been waiting 52s and kills it because it is older than 30s.
Can anyone shed some light on why this might happen?
Thanks
I'm seeing the exact same thing. It actually seems like rack-timeout thinks it's killing every single request to my app, some as quick as 40ms (they still seem to return a response successfully)
My two suspicions are something about where rack-timeout is being inserted into your middleware stack, or a thread-safety bug.
Could you please post what version of rack-timeout, Rails, Ruby, and web server (i.e. puma) you are using, your server's threading settings (if applicable), and ideally the output from rake middleware
? Anything we can do to correlate similar reports would help.
Rails 5.2.1
Ruby 2.5.1
Puma 3.12.0
rack-timeout 0.5.1
puma.rb excerpts:
threads 5, 5
workers 2
preload_app!
production.rb excerpt:
config.middleware.insert_before Rack::Runtime, Rack::Timeout, service_timeout: 29 # seconds
$ RAILS_ENV=production rake middleware
use ActionDispatch::SSL
use Rack::Sendfile
use ActionDispatch::Executor
use Rack::Timeout
use ActiveSupport::Cache::Strategy::LocalCache::Middleware
use Rack::Runtime
use Rack::MethodOverride
use ActionDispatch::RequestId
use RequestStore::Middleware
use ActionDispatch::RemoteIp
use Rails::Rack::Logger
use ActionDispatch::ShowExceptions
use ActionDispatch::DebugExceptions
use Bugsnag::Rack
use ActionDispatch::Callbacks
use ActionDispatch::Cookies
use ActionDispatch::Session::CookieStore
use ActionDispatch::Flash
use ActionDispatch::ContentSecurityPolicy::Middleware
use Rack::Head
use Rack::ConditionalGet
use Rack::ETag
use Rack::TempfileReaper
use Warden::Manager
use OmniAuth::Strategies::Facebook
use OmniAuth::Strategies::StripeConnect
run Application.routes
Thanks. I came up with one other explanation, which is a clock/time drift error, possibly something is going wrong with ntp on Heroku, causing the clock on the routers and the runtimes to be (significantly) different. However, it remains peculiar that the request is finishing and then reporting an error via rack-timeout.
Here's a full log trace of an example from my server:
Aug 14 19:18:30 heroku/router: at=info method=GET path="/" request_id=70901d76-33b0-4ab4-97e9-bd18dc6951f0 fwd="107.14.57.131,172.69.33.147" dyno=web.1 connect=1ms service=54ms status=200 bytes=16452 protocol=https
Aug 14 19:18:30 app/web.1: source=rack-timeout id=70901d76-33b0-4ab4-97e9-bd18dc6951f0 wait=2ms timeout=29000ms state=ready
Aug 14 19:18:30 app/web.1: source=rack-timeout id=70901d76-33b0-4ab4-97e9-bd18dc6951f0 wait=2ms timeout=29000ms service=0ms state=active
Aug 14 19:18:30 app/web.1: [70901d76-33b0-4ab4-97e9-bd18dc6951f0] method=GET path=/ format=html controller=ContentController action=home status=200 duration=47.03 view=46.53 db=0.00
Aug 14 19:18:30 app/web.1: source=rack-timeout id=70901d76-33b0-4ab4-97e9-bd18dc6951f0 wait=2ms timeout=29000ms service=51ms state=completed
There's no timeout there from rack-timeout, it's just logging information. You can turn this off by changing the logging level to warn.
Good to know! I guess I was completely confused here then. Sorry for hijacking your issue, @mikebaldry
@mikebaldry if you could provide more debugging info similar to the above it would be most helpful. If you're seeing this frequently and can afford the extra logging data, you might consider increasing the logging level for rack-timeout.
Sorry, I did not get any notifications from this issue! I just come back on the off chance it had been looked at and I hadn't been notified :)
use Rack::Cors
use Rack::Sendfile
use ActionDispatch::Static
use Rack::Lock
use Rack::Attack
use DisablePageCacheMiddleware
use TrackContentTypeMiddleware
use AcmeChallengeMiddleware
use ActiveMarketMiddleware
use RedirectMiddleware
use #<ActiveSupport::Cache::Strategy::LocalCache::Middleware:0x00007fe460f4f690>
use Rack::Runtime
use Rack::MethodOverride
use OutcomeLogger::Middleware
use ActionDispatch::RequestId
use RequestStore::Middleware
use Rack::Timeout
use Rails::Rack::Logger
use ActionDispatch::ShowExceptions
use ActionDispatch::DebugExceptions
use Rollbar::Middleware::Rails::RollbarMiddleware
use ActionDispatch::RemoteIp
use ActionDispatch::Callbacks
use ActiveRecord::ConnectionAdapters::ConnectionManagement
use ActiveRecord::QueryCache
use ActionDispatch::Cookies
use ActionDispatch::Session::ActiveRecordStore
use ActionDispatch::Flash
use ActionDispatch::ParamsParser
use Rack::Head
use Rack::ConditionalGet
use Rack::ETag
use Warden::Manager
use RequestIdMiddleware
use P3P::Middleware
use Rack::SslEnforcer
run Buyapowa::Application.routes
Ruby 2.5.1
Rails 4.2.10
Puma 3.12.0
Rack::Timeout 0.5.1
No puma config, just bundle exec puma -t 5:5 -p $PORT
No rack-timeout config, just pulled in by Gemfile
Many thanks
I'm pretty stumped here. There's a fair bit of middleware here I'm unfamiliar with, so that could be a possible source of the issue, but by no means conclusive.
I'm assuming you're still seeing this issue -- can you tell me, how often does it occur? Do you also see "valid" normal timeouts as well as these erroneous ones?
Pending further information, I don't suspect there's a bug in rack-timeout here.