memory leak in 2.7.6 (and net-http-persistent 3.0.0)?
maia opened this issue · 8 comments
I'm experiencing a reproducible memory leak after updating mechanize from 2.7.5 to 2.7.6 and net-http-persistent from 2.9.4 to 3.0.0. Since updating these gems, my worker's memory usage climbs by about 20MB per hour from a baseline of ~250MB that had been stable for months:
$ bundle outdated --strict
...
Outdated gems included in the bundle:
* mechanize (newest 2.7.6, installed 2.7.5) in groups "default"
* net-http-persistent (newest 3.0.0, installed 2.9.4)
$ bundle update mechanize net-http-persistent
Here's how I use mechanize in my worker:
class UrlQuery
  USER_AGENT = 'Mac Safari'
  TIMEOUT = 5.0 # seconds
  EXCEPTIONS = [
    Errno::ECONNREFUSED, Errno::ECONNRESET, Errno::EHOSTUNREACH, Errno::EINVAL,
    Errno::ENETUNREACH, Errno::ETIMEDOUT, Mechanize::Error,
    Mechanize::RedirectLimitReachedError, Mechanize::ResponseCodeError,
    Mechanize::UnauthorizedError, Net::HTTP::Persistent::Error, Net::HTTPFatalError,
    Net::HTTPInternalServerError, Net::HTTPMethodNotAllowed, Net::HTTPServerException,
    Net::HTTPServiceUnavailable, Net::OpenTimeout, Net::ReadTimeout,
    OpenSSL::SSL::SSLError, SocketError, Timeout::Error, URI::InvalidURIError
  ].freeze

  def initialize(url)
    @url = url
  end

  def call
    page = agent.head(@url)
    uri = page.uri.to_s
    content_type = page.response['content-type']
    content_length = page.response['content-length'].to_i
    [uri, nil, content_type, content_length]
  rescue *EXCEPTIONS => e
    [nil, "#{e.class}: #{e.message}"]
  rescue StandardError => e
    Rollbar.error(e)
    [nil, "#{e.class}: #{e.message}"]
  end

  private

  def agent
    Mechanize.new do |agent|
      agent.user_agent_alias = USER_AGENT
      agent.open_timeout = TIMEOUT
      agent.read_timeout = TIMEOUT
      agent.verify_mode = OpenSSL::SSL::VERIFY_PEER
    end
  end
end
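One detail worth noting: `agent` above builds a fresh Mechanize instance on every `call` and never shuts it down. Whether that matters depends on how net-http-persistent releases pooled connections, but a variant that shuts the agent down explicitly would look roughly like this. This is a hedged sketch, not code from the app; the `Agent` class here is a hypothetical stand-in for Mechanize so the pattern runs without the gem or network access:

```ruby
# Hypothetical stand-in for Mechanize, so the shutdown pattern is runnable offline.
class Agent
  attr_reader :shut_down

  def initialize
    @shut_down = false
  end

  def head(url)
    "HEAD response for #{url}"
  end

  def shutdown
    @shut_down = true
  end
end

# Release the agent's connections even if the request raises.
def fetch_head(url)
  agent = Agent.new
  [agent.head(url), agent]
ensure
  agent&.shutdown
end
```

The `ensure` clause guarantees `shutdown` runs on both the success and the exception paths, which is the usual way to avoid orphaned connections when an agent is created per request.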
All dependencies are updated to the most recent version:
domain_name (0.5.20180417)
http-cookie (1.0.3)
mime-types (3.1)
net-http-digest_auth (1.4.1)
net-http-persistent (3.0.0)
nokogiri (1.8.4)
ntlm-http (0.1.1)
webrobots (0.1.2)
connection_pool (2.2.2)
The app uses rails 5.1.6 and ruby 2.5.0 and runs on a heroku dyno; the worker job is called every 10 minutes and parses a few dozen URLs each time.
I can confirm this: drbrain/net-http-persistent#96
Downgrading net-http-persistent to 2.9.4 fixed the issue. Maybe mechanize shouldn't allow 3.0 either.
As net-http-persistent 3.0.0 is about two years old and there are no open issues there regarding memory leaks, it seems mechanize is using 3.0.0 improperly. Maybe mechanize isn't calling http.shutdown as required?
@maia mechanize instances are being freed, so there shouldn't be any lingering references into net-http-persistent; but version 3.0 keeps a global connection pool, which is likely the cause of the leak.
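The global-pool point can be illustrated with a simplified, hypothetical sketch (this is not net-http-persistent's actual code): if a process-wide pool is keyed by a per-instance name, the pool entries outlive the client objects that created them, so memory grows even though each client is garbage-collected:

```ruby
# Process-global pool, keyed by client name (a deliberately simplified model).
GLOBAL_POOL = {}

class Client
  @@count = 0

  def initialize
    @@count += 1
    @name = "client-#{@@count}" # a unique per-instance pool key
  end

  def get(url)
    # The "connection" is stored under this client's unique name,
    # so it is never reused by other clients and never released.
    GLOBAL_POOL[@name] ||= "connection for #{url}"
  end
end

100.times { Client.new.get("https://example.com") }
puts GLOBAL_POOL.size # => 100, one retained entry per short-lived client
```

Freeing the `Client` instances does nothing here, because `GLOBAL_POOL` holds the entries by name; that is the shape of leak a global pool can produce when clients are created per request.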
I'm able to reproduce the memory problem with the following code:
require 'get_process_mem'
require 'mechanize'

puts "Running net-http-persistent: #{Net::HTTP::Persistent::VERSION}"

def run
  100.times do
    agent = Mechanize.new
    agent.get "https://google.com"
    agent.shutdown
  end
end

mem = GetProcessMem.new
puts "Memory before running: #{mem.mb} MB"
run
mem = GetProcessMem.new
puts "Memory after running: #{mem.mb} MB"
Results:
Running net-http-persistent: 2.9.4
Memory before running: 30.0234375 MB
Memory after running: 74.41796875 MB
Running net-http-persistent: 3.0.0
Memory before running: 31.80078125 MB
Memory after running: 138.34375 MB
I'm running mechanize 2.7.6 and ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin17] on macOS Mojave 10.14.2.
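For anyone reproducing this without the get_process_mem gem, a rough standard-library alternative (my suggestion, not what was used above) is to watch the `GC.stat` live-slot counter as a proxy for retained objects:

```ruby
# Rough heap-growth check using only the standard library.
GC.start # settle the heap before measuring
before = GC.stat(:heap_live_slots)

retained = Array.new(10_000) { "x" * 100 } # deliberately retained strings

after = GC.stat(:heap_live_slots)
puts "live slots grew by #{after - before}"
```

This only counts Ruby heap slots, not off-heap allocations such as socket buffers, so it understates the kind of growth reported here; RSS (what get_process_mem reads) remains the better end-to-end signal.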
I was able to bisect this to drbrain/net-http-persistent@5bae22e
I think this patch fixes it: drbrain/net-http-persistent#98
I shipped net-http-persistent version 3.0.1, and it should clear this up for the most part. There may still be a small leak of string keys in the main thread, but I'll need to do more investigation to fix that.