Missing timeout for the CAPABILITY command
Opened this issue · 3 comments
Version: net-imap (0.4.8)
Ruby: 3.2.2
Description
I'm using net-imap
for fetching emails in some background jobs. There are rare cases where the email provider Microsoft Exchange returns no or an unexpected answer for the CAPABILITY
command. This blocks my background jobs for a long time, sometimes more than 30 minutes. At some point of time a Errno::ECONNRESET (Connection reset by peer)
exception is thrown, I assume when Microsoft Exchange closes the connection.
Backtrace
/home/foo/.rbenv/versions/3.2.2/lib/ruby/3.2.0/monitor.rb:108:in `sleep'
/home/foo/.rbenv/versions/3.2.2/lib/ruby/3.2.0/monitor.rb:108:in `wait'
/home/foo/.rbenv/versions/3.2.2/lib/ruby/3.2.0/monitor.rb:108:in `wait_for_cond'
/home/foo/.rbenv/versions/3.2.2/lib/ruby/3.2.0/monitor.rb:108:in `wait'
/var/www/foo/shared/bundle/ruby/3.2.0/gems/net-imap-0.4.8/lib/net/imap.rb:2701:in `get_tagged_response'
/var/www/foo/shared/bundle/ruby/3.2.0/gems/net-imap-0.4.8/lib/net/imap.rb:2794:in `block in send_command'
/home/foo/.rbenv/versions/3.2.2/lib/ruby/3.2.0/monitor.rb:202:in `synchronize'
/home/foo/.rbenv/versions/3.2.2/lib/ruby/3.2.0/monitor.rb:202:in `mon_synchronize'
/var/www/foo/shared/bundle/ruby/3.2.0/gems/net-imap-0.4.8/lib/net/imap.rb:2776:in `send_command'
/var/www/foo/shared/bundle/ruby/3.2.0/gems/net-imap-0.4.8/lib/net/imap.rb:1077:in `block in capability'
/home/foo/.rbenv/versions/3.2.2/lib/ruby/3.2.0/monitor.rb:202:in `synchronize'
/home/foo/.rbenv/versions/3.2.2/lib/ruby/3.2.0/monitor.rb:202:in `mon_synchronize'
/var/www/foo/shared/bundle/ruby/3.2.0/gems/net-imap-0.4.8/lib/net/imap.rb:1076:in `capability'
/var/www/foo/shared/bundle/ruby/3.2.0/gems/net-imap-0.4.8/lib/net/imap.rb:987:in `capabilities'
/var/www/foo/shared/bundle/ruby/3.2.0/gems/net-imap-0.4.8/lib/net/imap.rb:972:in `capable?'
/var/www/foo/shared/bundle/ruby/3.2.0/gems/net-imap-0.4.8/lib/net/imap.rb:1310:in `authenticate'
Feature request
Does it make sense, that one can configure a global timeout or a timeout for individual actions (e.g. https://github.com/ruby/net-imap/blob/v0.4.8/lib/net/imap.rb#L2689) within this gem? Or do you recommend e.g. wrapping gem code into own Timeout blocks?
Example
require 'ostruct'
require 'net/imap'
require 'socket'
require 'active_support/all'
class MailServer
CRLF = "\r\n".freeze
def initialize(port)
@server = TCPServer.new(port)
puts "Mail server started on port #{port}"
end
def start
loop do
Thread.start(@server.accept) do |client|
puts "Connection established with #{client.peeraddr[2]}"
client.puts "* OK The Microsoft Exchange IMAP4 service is ready.#{CRLF}"
loop do
request = client.gets&.chomp
break if request.blank?
case request
when 'RUBY0001 CAPABILITY'
client.puts "* CAPABILITY IMAP4 IMAP4rev1 AUTH=PLAIN AUTH=XOAUTH2 SASL-IR UIDPLUS ID UNSELECT CHILDREN IDLE NAMESPACE LITERAL+#{CRLF}"
client.puts "RUBY0001 OK CAPABILITY completed.#{CRLF}"
when 'RUBY0002 AUTHENTICATE XOAUTH2 dXNlcj11c2VyAWF1dGg9QmVhcmVyIHRlc3QxMjMBAQ=='
client.puts "RUBY0002 OK AUTHENTICATE completed.#{CRLF}"
else
client.puts "* ERR Unknown command #{request}" + CRLF
end
end
client.close
puts "Connection closed with #{client.peeraddr[2]}"
end
end
end
def stop
@server&.close
puts 'Mail server stopped'
end
end
server = MailServer.new(2000)
Thread.new { server.start }
sleep 1
Net::IMAP.new('localhost', port: 2000).authenticate('XOAUTH2', 'user', 'test123')
Dropping the line client.puts "RUBY0001 OK CAPABILITY completed.#{CRLF}"
will block the script for unlimited time.
Not responding to CAPABILITY
, one of the simplest and most important of all IMAP commands, something that should require no more than scanning a configuration setting... Great job, Microsoft! I'd bet the server is misconfigured and/or needs an upgrade. Oh well, you'll see all sorts of weirdness when you work with enough different servers.
- Yes, it makes sense to add global read timeouts, but we can't just set a
read_timeout
, use that on every read, and call it a day like most other simple request/response gems. IMAP connections tend to be long lived, and it's common for some responses to take much longer than others and for some responses (IDLE
) to just sit on a connection, waiting to read data, for thirty minutes or so. I would like to introduce optional command timeouts, but we need to be careful about it. I doubt that it's safe to add a simple timeout toget_tagged_response
. - Until then, adding Timeout is a reasonable approach. I'd personally consider adding a longer timeout for the entire job (e.g. maybe 15 seconds or a few minutes, depending on the job) and a localized timeout for just for authentication. It generally doesn't take more than a couple of seconds to login, but Microsoft Exchange (and Microsoft 365) can behave pretty weirdly sometimes... still, if it takes more than 10 seconds or so, something's wrong.
- For this particular situation though, there is a simpler solution: you can send the
sasl_ir: false
keyword argument toNet::IMAP#authenticate
. SASL-IR is only used when the server explicitly reports the AUTH=mechanism capability, to avoid sending sensitive data that the server can't even use, and that's why we need to ask for the server capabilities. I don't think that's an explicit part of the IMAP SASL-IR spec, but IMO it's implied. And several other protocols do explicitly specify this behavior for their SASL implementation. Usingsasl_ir: false
will slightly slow down authentication to well-behaved servers because 1) they already sent theirCAPABILITY
list as part of the server greeting, and 2) it requires waiting theAUTHENTICATE
command to do an extra round-trip. But, compared to the slowdown caused by this issue, that's not so bad. And you could special case that just for servers you identify as misbehaving. - If milliseconds matter and you "need" the lowest latency, you could cheat and manually set the
@capabilities
ivar for Exchange server clients. That's a total hack though, and will probably bite you (or your successor) someday far in the future, after you've long forgotten why you thought it was a good idea. 😉 The capabilities cache ivar will be reset after a successful authentication (capabilities usually change after authentication).
I'm going to leave this ticket open for the first point (above), at least until we have another timeout-related issue or PR to replace it. I would accept a PR to add read timeouts, although I suspect we may need to iterate on the design before it's safe to merge. Since we'll want to iterate on it anyway, a simple naive version could be a good first draft. It might be something to add but leave turned off for 0.5.x and wait until 0.6.x before we enable read timeouts by default.