elastic/logstash

Logstash 1.5.3 SSL problems

jordansissel opened this issue · 29 comments

Something is funky with Logstash 1.5.3's SSL. I'm not sure what, yet.

Tests -

With a self-signed CN=localhost certificate on java 1.7.0_79

  • ✅ LSF 0.4.0 to LS 1.5.3
  • ❌ LS 1.5.3 to LS 1.5.3 certificate verify failure
  • ❌ LS 1.5.3 to LS 1.5.2 certificate verify failure
  • ❌ LS 1.5.2 to LS 1.5.2 server key exchange invalid
  • ❌ LS 1.4.3 to LS 1.5.2 Socket closed
  • ❌ openssl s_client to LS 1.5.2: RSA_EAY_PUBLIC_DECRYPT:data too large for modulus / SSL routines:ssl3_get_key_exchange:bad signature
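For anyone reproducing these tests, a self-signed CN=localhost certificate can be generated with Ruby's OpenSSL bindings. This is a minimal sketch: the helper name `self_signed_cert`, the 2048-bit key size, the SHA256 digest and the one-year validity are assumptions for illustration, not the settings used for the certificate in this report.

```ruby
require "openssl"

# Hypothetical helper: builds a self-signed certificate for the given
# common name. Key size, digest and validity are illustrative assumptions.
def self_signed_cert(common_name, days: 365)
  key  = OpenSSL::PKey::RSA.new(2048)
  name = OpenSSL::X509::Name.parse("/CN=#{common_name}")

  cert = OpenSSL::X509::Certificate.new
  cert.version    = 2                # 2 means X.509 v3
  cert.serial     = 1
  cert.subject    = name
  cert.issuer     = name             # self-signed: issuer equals subject
  cert.public_key = key.public_key
  cert.not_before = Time.now
  cert.not_after  = Time.now + days * 24 * 60 * 60
  cert.sign(key, OpenSSL::Digest.new("SHA256"))

  [cert, key]
end

cert, key = self_signed_cert("localhost")
puts cert.subject.to_s
puts cert.to_pem
```

The PEM output of `cert.to_pem` and `key.to_pem` can be written to files and referenced from the Logstash and forwarder configurations.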

Related tickets

Getting all kinds of weird failures even with Logstash 1.5.2...

ph commented

Do you have the same issue with Java 1.8?

@ph it's on my todo list to try.

Same problem on Java 1.8.0_60-internal

@ph can you help me test these scenarios (LSF 0.4.0 to LS 1.5.3, LS 1.5.2 to LS 1.5.3, etc)? I am still trying to figure out what's wrong with my workstation.

I also tested in an Ubuntu 14.04 docker container. Same errors. This is bizarre. Will try on a different system tomorrow.

Here's my setup: java 1.8.0_45, LSF 0.4.0, ubuntu 14.04 server 64bit.
When I run LS 1.5.3 in debug mode I see these warnings:

Starting lumberjack input listener {:address=>"172.xx.xx.5:6xx2", :level=>:info, :file=>"logstash/inputs/lumberjack.rb", :line=>"50", :method=>"register"}
[2015-07-28 09:52:15.967] WARN -- Concurrent::Condition: [DEPRECATED] Will be replaced with Synchronization::Object in v1.0.
called on: /opt/logs/logstash-1.5.3/vendor/bundle/jruby/1.9/gems/logstash-input-lumberjack-1.0.2/lib/logstash/sized_queue_timeout.rb:15:in `initialize'
[2015-07-28 09:52:15.968] WARN -- Concurrent::Condition: [DEPRECATED] Will be replaced with Synchronization::Object in v1.0.
called on: /opt/logs/logstash-1.5.3/vendor/bundle/jruby/1.9/gems/logstash-input-lumberjack-1.0.2/lib/logstash/sized_queue_timeout.rb:16:in `initialize'

Later, nothing appears after "Logstash startup completed", but LSF keeps firing:

2015/07/28 09:52:16.243990 Connecting to [172.xx.xx.5]:6782 (172.xx.xx.5)
2015/07/28 09:52:16.694600 Failed to tls handshake with 172.xx.xx.5 EOF

With LS 1.5.2 there are no errors and logs are processed.

P.S. I use SAN IP addresses in the certificate, not CN=somename.

Here's the configuration I use in both tests.
LSF setup:

{
    "network": {
        "servers": [ "172.xx.xx.5:6xx2" ],
        "ssl certificate": "/etc/logstash-forwarder/ssl/ls.pem",
        "ssl key": "/etc/logstash-forwarder/ssl/ls.pem",
        "ssl ca": "/etc/logstash-forwarder/ssl/ca.crt"
    },
    "files": [
        {
            "paths": [ "/var/log/auth.log" ],
            "fields": { "type": "syslog" }
        }
    ]
}

LS config:

input {
    lumberjack {
        port => 6xx2
        host => "172.xx.5"
        ssl_certificate => "/etc/logstash/ssl/lsf.crt"
        ssl_key => "/etc/logstash/ssl/lsf.key"
    }
}

ph commented

MacOS X
java version "1.8.0_20"
Java(TM) SE Runtime Environment (build 1.8.0_20-b26)
Java HotSpot(TM) 64-Bit Server VM (build 25.20-b23, mixed mode)

FROM       TO        WORKING
LSF 0.4.0  LS 1.5.0  YES
LSF 0.4.0  LS 1.5.2  YES
LSF 0.4.0  LS 1.5.3  NO
LS 1.5.2   LS 1.5.2  YES
LS 1.5.2   LS 1.5.3  YES
LS 1.5.3   LS 1.5.3  NO
LS 1.5.3   LS 1.5.2  NO

ph commented

We can rule out the JRuby version: both 1.5.2 and 1.5.3 ship with the same version.

/t/t/logstash-1.5.3 ❯❯❯ ./vendor/jruby/bin/jruby -v
jruby 1.7.20 (1.9.3p551) 2015-05-04 3086e6a on Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26 +jit [darwin-x86_64]

/t/test-ssl ❯❯❯ cd logstash-1.5.2
/t/t/logstash-1.5.2 ❯❯❯ ./vendor/jruby/bin/jruby -v
jruby 1.7.20 (1.9.3p551) 2015-05-04 3086e6a on Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26 +jit [darwin-x86_64]

ph commented

Linux vagrant-ubuntu-trusty-32 3.13.0-55-generic #92-Ubuntu SMP Sun Jun 14 18:33:09 UTC 2015 i686 i686 i686 GNU/Linux
vagrant@vagrant-ubuntu-trusty-32:/tmp/packages$ java -version
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK Client VM (build 24.79-b02, mixed mode, sharing)

FROM       TO        WORKING  ERRORS
LSF 0.4.0  LS 1.5.0  OK
LSF 0.4.0  LS 1.5.2  OK
LSF 0.4.0  LS 1.5.3  NO
LS 1.5.2   LS 1.5.2  OK
LS 1.5.2   LS 1.5.3  OK
LS 1.5.3   LS 1.5.3  NO       OpenSSL::SSL::SSLError: certificate verify failed
LS 1.5.3   LS 1.5.2  NO       OpenSSL::SSL::SSLError: certificate verify failed

@ph those failures are ones I expected given all the user reports. whew

ph commented

@jordansissel the results are also consistent across JVMs.

ph commented

I have downgraded concurrent-ruby to 0.8.0 from 0.9.0 and it doesn't fix this issue.

ph commented

Disabling the monkey patch from #3579 makes LSF 0.4.0 correctly send events to LS 1.5.3. I'll look into it.

@ph and I paired up on Zoom to discuss this.

We narrowed it down to the verify_mode change. Commenting out our new verify_mode default seems to solve the problem, and it aligns with user reports (SSL servers rejecting clients due to verify failures).

Ruby 1.9.3 long ago changed their "default" settings for openssl in the set_params method, but since nothing actually invokes set_params the change is basically useless. Further, because our monkeypatch actually invokes set_params with this new verify_mode = VERIFY_PEER setting, it is our belief that this causes servers to attempt to verify client certificates by default. This new default causes servers to reject all clients, basically, and that aligns with the behavior folks are seeing with lumberjack and tcp input.
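The effect of calling set_params can be observed directly on an SSLContext. The sketch below is an illustration rather than the Logstash monkeypatch itself; the helper name `verify_mode_around_set_params` is hypothetical, and the value reported for a fresh context depends on the Ruby and OpenSSL binding version in use.

```ruby
require "openssl"

# Hypothetical helper: returns the context's verify_mode before and after
# set_params is invoked with no arguments.
def verify_mode_around_set_params
  ctx = OpenSSL::SSL::SSLContext.new
  before = ctx.verify_mode   # untouched context; set_params not yet called
  ctx.set_params             # merges SSLContext::DEFAULT_PARAMS into the context
  [before, ctx.verify_mode]
end

before, after = verify_mode_around_set_params
puts "fresh context verify_mode: #{before.inspect}"
puts "after set_params:          #{after.inspect}"
puts "VERIFY_PEER constant:      #{OpenSSL::SSL::VERIFY_PEER}"
```

After set_params the context carries VERIFY_PEER whether it is later used for a client or a server socket, which is consistent with servers starting to verify client certificates.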

So the fix will be to not force verify_mode. The default value is nil for this setting, and the default behavior seems to be for clients to be VERIFY_PEER and for servers to be VERIFY_NONE - though I haven't been able to confirm the setting itself because it remains nil throughout a context's lifetime.

We'll be releasing v1.5.4 as soon as we can after testing and confirming this fix.

ph commented

I still need to do more testing on my end; I will have the PR tomorrow.
We will have to release 1.4.5 with the changes.

May I know roughly when v1.5.4 will be out?

@foresightyj I don't have an estimate. If I gave one, it may be wrong, and it would set your expectations in a way that would cause you upset :)

If the bug you care about is already fixed on the 1.5 branch, you can build from that branch (rake artifact:tar) and deploy the result. If you're waiting on a bugfix that isn't done yet, it will land when we finish fixing it; I don't have a time estimate for that. Hope this helps!

Thanks for the candid answer. I didn't have much experience with ruby especially the tooling around it. I can wait.

@timukas I had a similar config to yours and was seeing the TLS handshake error. I was able to resolve the problem by removing ssl certificate and ssl key from the LSF config. You'd be left with:

{
    "network": {
        "servers": [ "172.xx.xx.5:6xx2" ],
        "ssl ca": "/etc/logstash-forwarder/ssl/ca.crt"
    },
    "files": [
        {
            "paths": [ "/var/log/auth.log" ],
            "fields": { "type": "syslog" }
        }
    ]
}

TinLe commented

Confirmed that using only "ssl ca" works. I just updated to the latest released LSF and LS v1.5.3 and am seeing handshake errors.

I switched to log-courier and still see the same errors. Removing "ssl certificate" and "ssl key" works.

What TinLe said above was a great help to me. All of my logstash-forwarders had stopped talking to my logstash-1.5.3 until I removed those configuration settings.

FWIW, when diagnosing this problem, I discovered that 'openssl s_client -connect 10.x.x.x:5043 -tls1' worked (i.e. specifying tls1 resulted in a successful connection).

Removing "ssl certificate" and "ssl key" from the forwarder config also solved my issue. Thanks!

Apologies for this. 1.5.3 had an issue where SSL servers within Logstash rejected connections from clients when they provided certificates. We fixed this in 1.5.4. Upgrading to LS 1.5.4 will fix this issue.
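The behavior described here can be reproduced in isolation with plain Ruby sockets. The following is a sketch under assumptions, not Logstash code: `throwaway_cert` and `server_accepts?` are hypothetical helpers, the certificates are self-signed throwaways, and the server has no CA configured, so any client certificate it is asked to verify cannot be trusted.

```ruby
require "openssl"
require "socket"

# Hypothetical helper: a throwaway self-signed certificate and its key.
def throwaway_cert(cn)
  key  = OpenSSL::PKey::RSA.new(2048)
  cert = OpenSSL::X509::Certificate.new
  cert.version    = 2
  cert.serial     = 1
  cert.subject    = cert.issuer = OpenSSL::X509::Name.parse("/CN=#{cn}")
  cert.public_key = key.public_key
  cert.not_before = Time.now - 60
  cert.not_after  = Time.now + 3600
  cert.sign(key, OpenSSL::Digest.new("SHA256"))
  [cert, key]
end

# Returns true if the server side completes the TLS handshake.
# A nil server_verify_mode leaves the context's verify_mode untouched.
def server_accepts?(server_verify_mode:, client_sends_cert:)
  server_ctx = OpenSSL::SSL::SSLContext.new
  server_ctx.cert, server_ctx.key = throwaway_cert("localhost")
  server_ctx.verify_mode = server_verify_mode unless server_verify_mode.nil?

  listener = TCPServer.new("127.0.0.1", 0)   # port 0: let the OS pick a free port
  port = listener.addr[1]

  server = Thread.new do
    conn = listener.accept
    begin
      OpenSSL::SSL::SSLSocket.new(conn, server_ctx).accept
      true
    rescue OpenSSL::SSL::SSLError, SystemCallError
      false
    ensure
      conn.close
    end
  end

  client_ctx = OpenSSL::SSL::SSLContext.new
  client_ctx.verify_mode = OpenSSL::SSL::VERIFY_NONE  # test client skips server checks
  client_ctx.cert, client_ctx.key = throwaway_cert("client") if client_sends_cert

  client_tcp = TCPSocket.new("127.0.0.1", port)
  begin
    OpenSSL::SSL::SSLSocket.new(client_tcp, client_ctx).connect
  rescue OpenSSL::SSL::SSLError, SystemCallError
    # The client may or may not observe the failure; the server's view decides.
  end
  server.value
ensure
  client_tcp&.close
  listener&.close
end

peer = OpenSSL::SSL::VERIFY_PEER
forced_with_cert    = server_accepts?(server_verify_mode: peer, client_sends_cert: true)
forced_without_cert = server_accepts?(server_verify_mode: peer, client_sends_cert: false)
default_with_cert   = server_accepts?(server_verify_mode: nil,  client_sends_cert: true)
puts "VERIFY_PEER server, client sends cert: #{forced_with_cert}"
puts "VERIFY_PEER server, client sends none: #{forced_without_cert}"
puts "default server, client sends cert:     #{default_with_cert}"
```

With a server forced to VERIFY_PEER, the handshake should fail only when the client presents a certificate, which matches the workaround of removing "ssl certificate" and "ssl key" from the forwarder config.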

I can also confirm that updating to 1.5.4 solved the issue without needing a workaround. Thanks!

I still see the "Read error looking for ack: EOF" error with Logstash 1.5.4 and logstash-forwarder 0.4.0.
Both Logstash and the forwarder run on Ubuntu 14.04.1.
Java version: "1.8.0_60"

Here is my logstash-forwarder config:
{
    "network": {
        "servers": [ "x.x.x.x:5000" ],
        "ssl ca": "/etc/pki/tls/certs/logstash-forwarder.crt",
        "timeout": 60
    },
    "files": [
        {
            "paths": [ "/var/log/auth.log" ],
            "fields": { "type": "syslog" }
        }
    ]
}

ERROR:
Setting trusted CA from file: /etc/pki/tls/certs/logstash-forwarder.crt
2015/09/11 15:48:32.426447 Connecting to [x.x.x.x]:5000 (x.x.x.x)
2015/09/11 15:49:02.580001 Connected to x.x.x.x
2015/09/11 15:49:12.553756 Read error looking for ack: EOF
2015/09/11 15:49:12.553875 Setting trusted CA from file: /etc/pki/tls/certs/logstash-forwarder.crt
2015/09/11 15:49:12.554232 Connecting to [x.x.x.x]:5000 (x.x.x.x)
2015/09/11 15:49:42.711990 Connected to x.x.x.x
2015/09/11 15:49:52.685686 Read error looking for ack: EOF
2015/09/11 15:49:52.685787 Setting trusted CA from file: /etc/pki/tls/certs/logstash-forwarder.crt
2015/09/11 15:49:52.686101 Connecting to [x.x.x.x]:5000 (x.x.x.x)

same error here