Flaky test with macOS platform
Opened this issue · 3 comments
I sometimes got the following failure with ruby/ruby
and make test-bundled-gems
.
Testing the net-imap gem
/Users/hsbt/Documents/github.com/ruby/ruby/exe/ruby -C /Users/hsbt/Documents/github.com/ruby/ruby/gems/src/net-imap /Users/hsbt/Documents/github.com/ruby/ruby/.bundle/bin/rake test
Loaded suite /Users/hsbt/Documents/github.com/ruby/ruby/.bundle/gems/rake-13.2.1/lib/rake/rake_test_loader
Started
/Users/hsbt/Documents/github.com/ruby/ruby/.bundle/gems/test-unit-3.6.2/lib/test/unit/fixture.rb:284: warning: the block passed to 'priority_setup' defined at /Users/hsbt/Documents/github.com/ruby/ruby/.bundle/gems/test-unit-3.6.2/lib/test/unit/priority.rb:183 may be ignored
F
====================================================================================================================================================================================
Failure: test_imaps_with_ca_file(IMAPTest):
Exception raised:
<#<OpenSSL::SSL::SSLError: SSL_read: tlsv1 alert unknown ca (SSL alert number 48)>>
Backtrace:
/Users/hsbt/Documents/github.com/ruby/ruby/.ext/common/openssl/buffering.rb:80:in 'OpenSSL::SSL::SSLSocket#sysread'
/Users/hsbt/Documents/github.com/ruby/ruby/.ext/common/openssl/buffering.rb:80:in 'OpenSSL::Buffering#fill_rbuff'
/Users/hsbt/Documents/github.com/ruby/ruby/.ext/common/openssl/buffering.rb:236:in 'OpenSSL::Buffering#gets'
/Users/hsbt/Documents/github.com/ruby/ruby/gems/src/net-imap/lib/net/imap.rb:2720:in 'Net::IMAP#get_response'
/Users/hsbt/Documents/github.com/ruby/ruby/gems/src/net-imap/lib/net/imap.rb:2629:in 'Net::IMAP#receive_responses'
/Users/hsbt/Documents/github.com/ruby/ruby/gems/src/net-imap/lib/net/imap.rb:2606:in 'block in Net::IMAP#start_receiver_thread'.
/Users/hsbt/Documents/github.com/ruby/ruby/.ext/common/openssl/buffering.rb:80:in 'OpenSSL::SSL::SSLSocket#sysread'
/Users/hsbt/Documents/github.com/ruby/ruby/.ext/common/openssl/buffering.rb:80:in 'OpenSSL::Buffering#fill_rbuff'
/Users/hsbt/Documents/github.com/ruby/ruby/.ext/common/openssl/buffering.rb:236:in 'OpenSSL::Buffering#gets'
/Users/hsbt/Documents/github.com/ruby/ruby/gems/src/net-imap/lib/net/imap.rb:2720:in 'Net::IMAP#get_response'
/Users/hsbt/Documents/github.com/ruby/ruby/gems/src/net-imap/lib/net/imap.rb:2629:in 'Net::IMAP#receive_responses'
/Users/hsbt/Documents/github.com/ruby/ruby/gems/src/net-imap/lib/net/imap.rb:2606:in 'block in Net::IMAP#start_receiver_thread'
====================================================================================================================================================================================
Finished in 8.730846 seconds.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
818 tests, 8689 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
99.8778% passed
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
93.69 tests/s, 995.21 assertions/s
rake aborted!
Command failed with status (1)
/Users/hsbt/Documents/github.com/ruby/ruby/.bundle/gems/rake-13.2.1/exe/rake:27:in '<top (required)>'
Tasks: TOP => test
(See full trace by running task with --trace)
Tests failed with exit code 1
My environment is macOS 14.5 with Apple Silicon and using OpenSSL 3 library provided from homebrew.
@hsbt That's strange... is it only that one test? How easy is it to reproduce? Is it failing ~5% of the time or 95% or? I haven't noticed it failing in the CI workflow. I'm only running on Linux, and haven't seen it myself. I wonder if it's unique to Apple Silicon or a specific version of OpenSSL or something else that's different from macos-latest
in GH actions.
I can see one thing that's unique about that test. For all of the tests that create an SSL connection:
- 2 use
starttls_test
- 4 use
imaps_test
- All of the rest have been converted to (or were originally written with)
run_fake_server_in_thread
(orwith_fake_server
, which delegates torun_fake_server_in_thread
)
This test is one of the four that uses imaps_test
. And of those four, this is one of only two that expects a successful connection. And this is the only one that verifies the server's cert. So, if this is the only test that's flaky, the problem is probably in imaps_test
.
At first, I couldn't see anything significant between how the server is configured for imaps_test
vs any of the others. We're using the exact same SSL fixture files (ca_path
, key
, and cert
) in all three places, and we seem to be creating the SSLContext identically. But, after looking more closely, I see that imaps_test
is using OpenSSL::SSL::SSLServer.new(tcp_server, ssl_ctx)
and the other two are using OpenSSL::SSL::SSLSocket.new(tcp_socket, ssl_ctx)
.
Looking at SSLServer
, it seems like there's no difference between what it's doing and what FakeServer is doing with SSLSocket
. The only difference I can discern is that SSLServer#initialize
contains the following:
unless ctx.session_id_context
# see #6137 - session id may not exceed 32 bytes
prng = ::Random.new($0.hash)
session_id = prng.bytes(16).unpack1('H*')
@ctx.session_id_context = session_id
end
I have no idea why session_id_context
would matter for this (it probably doesn't), but it's the only difference I could see.
We could experiment with simply converting imaps_test
to use SSLSocket
directly, or even convert those four tests over to run_fake_server_in_thread
. But I'm still baffled and curious about why this would make a difference.
Should we ask about this over at https://github.com/ruby/openssl/issues?
I ran make test-bundled-gems
on ruby/ruby ~10 times with no luck. I'm also running on Linux.
<#<OpenSSL::SSL::SSLError: SSL_read: tlsv1 alert unknown ca (SSL alert number 48)>>
This OpenSSL::SSL::SSLError
on the client side seems weird. The error message says that it received the "unknown_ca" alert. It is usually sent when the server doesn't trust the client certificate, but I don't see any test cases in net-imap that use a client certificate.
session_id_context
I believe it's irrelevant. The seesion ID context is only used for TLS session resumption, which net-imap doesn't use.
That's strange... is it only that one test? How easy is it to reproduce? Is it failing ~5% of the time or 95% or?
I see this fail once every day.
I haven't noticed it failing in the CI workflow.
Yes, I also didn't see this fail our GitHub Actions. It may be caused by my development environment.