gravitational/teleport-plugins

Event Handler / Fluentd: Possible client verification issue in fluentd

nicofff opened this issue · 5 comments

Setting up as described here: https://goteleport.com/docs/management/guides/fluentd/

Tried in k8s with Helm, locally with Docker, and locally running natively (on macOS).

In all cases I get the same error on fluentd:
error_class=OpenSSL::SSL::SSLError error="SSL_accept returned=1 errno=0 state=error: certificate verify failed (self signed certificate)"

If I verify the certs with openssl, they look fine:

$ openssl verify -verbose -CAfile ca.crt server.crt
server.crt: OK

$ openssl verify -verbose -CAfile ca.crt client.crt
client.crt: OK

If I try s_server and s_client with the same certs, they verify each other fine:

openssl s_server -accept 8888 -CAfile ca.crt -cert server.crt -key server.key -Verify 10 -tls1_2 -state
openssl s_client -connect localhost:8888 -CAfile ca.crt -cert client.crt -key client.key -tls1_2 -state -quiet

If I run s_server and start the event handler plugin, the event data arrives on the openssl server.

If I start fluentd and connect with s_client, I get the same error.

Fluentd config:

<source>
    @type http
    port 8888

    <transport tls>
        client_cert_auth true
        ca_path "ca.crt"
        cert_path "server.crt"
        private_key_path "server.key"
        private_key_passphrase "xxxxxxxxxxxxxxxxxxx"
    </transport>

    <parse>
      @type json
      json_parser oj

      # This time format is used by Go marshaller
      time_type string
      time_format %Y-%m-%dT%H:%M:%S
    </parse>
</source>

<match test.log>
  @type stdout
</match>

<match session.*.log>
  @type stdout
</match>

Running locally with:
fluentd -c fluent.conf

If I trigger the bug with s_client, this is what I get:

SSL_connect:before/connect initialization
SSL_connect:SSLv3 write client hello A
SSL_connect:SSLv3 read server hello A
depth=1 C = US, CN = localhost
verify return:1
depth=0 C = US, CN = localhost
verify return:1
SSL_connect:SSLv3 read server certificate A
SSL_connect:SSLv3 read server key exchange A
SSL_connect:SSLv3 read server certificate request A
SSL_connect:SSLv3 read server done A
SSL_connect:SSLv3 write client certificate A
SSL_connect:SSLv3 write client key exchange A
SSL_connect:SSLv3 write certificate verify A
SSL_connect:SSLv3 write change cipher spec A
SSL_connect:SSLv3 write finished A
SSL_connect:SSLv3 flush data
SSL3 alert read:fatal:unknown CA
SSL_connect:failed in SSLv3 read server session ticket A
4331193900:error:14020418:SSL routines:CONNECT_CR_SESSION_TICKET:tlsv1 alert unknown ca:/AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/libressl/libressl-2.8/ssl/ssl_pkt.c:1200:SSL alert number 48
4331193900:error:140200E5:SSL routines:CONNECT_CR_SESSION_TICKET:ssl handshake failure:/AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/libressl/libressl-2.8/ssl/ssl_pkt.c:585:

It feels like a bug in fluentd not using the provided ca_path to verify the client cert. I tried following through the fluentd code but couldn't find anything obvious.

I'm having the same problem. I think it's related to this note from the MySQL TLS guide, quoted in a Stack Overflow answer: https://stackoverflow.com/a/19738223/11329621

Whatever method you use to generate the certificate and key files, the Common Name value used for the server and client certificates/keys must each differ from the Common Name value used for the CA certificate. Otherwise, the certificate and key files will not work for servers compiled using OpenSSL.

#640 landed in version 10.2.1 of the event-handler plugin and sets the Issuer field on the CA that's generated by the plugin. Unfortunately this seems to make OpenSSL-based servers unhappy.

I ran ./teleport-event-handler configure . teleport.example.com:443 against both versions 10.1.9 and 10.2.1, moved the generated files into $VERSION/keys and then tested with openssl verify:

ubuntu@ip-172-31-30-140:~/event-handler$ tree 10.1.9/keys
10.1.9/keys
├── ca.crt
├── ca.key
├── client.crt
├── client.key
├── server.crt
└── server.key

0 directories, 6 files
ubuntu@ip-172-31-30-140:~/event-handler$ tree 10.2.1/keys
10.2.1/keys
├── ca.crt
├── ca.key
├── client.crt
├── client.key
├── server.crt
└── server.key

0 directories, 6 files

ubuntu@ip-172-31-30-140:~/event-handler$ for VERSION in 10.1.9 10.2.1; do echo $VERSION; for TYPE in client server; do echo $TYPE; openssl verify -verbose -CAfile $VERSION/keys/ca.crt $VERSION/keys/$TYPE.crt; done; done
10.1.9
client
10.1.9/keys/client.crt: OK
server
10.1.9/keys/server.crt: OK
10.2.1
client
C = US, CN = localhost
error 18 at 0 depth lookup: self-signed certificate
error 10.2.1/keys/client.crt: verification failed
server
C = US, CN = localhost
error 18 at 0 depth lookup: self-signed certificate
error 10.2.1/keys/server.crt: verification failed

10.1.9 works, 10.2.1 fails.

If I run fluentd with the 10.1.9 certs:

event-handler logs:

INFO   Using batch size batch:20 event-handler/cli.go:236
INFO   Using namespace namespace:default event-handler/cli.go:237
INFO   Using type filter types:[] event-handler/cli.go:238
INFO   Skipping session events of type types:map[print:{}] event-handler/cli.go:239
INFO   Using start time value:<nil> event-handler/cli.go:240
INFO   Using timeout timeout:10s event-handler/cli.go:241
INFO   Using Fluentd url url:https://localhost:8888/test.log event-handler/cli.go:242
INFO   Using Fluentd session url url:https://localhost:8888/session event-handler/cli.go:243
INFO   Using Fluentd ca ca:/home/ubuntu/event-handler/10.1.9/keys/ca.crt event-handler/cli.go:244
INFO   Using Fluentd cert cert:/home/ubuntu/event-handler/10.1.9/keys/client.crt event-handler/cli.go:245
INFO   Using Fluentd key key:/home/ubuntu/event-handler/10.1.9/keys/client.key event-handler/cli.go:246
INFO   Using Teleport identity file file:/home/ubuntu/event-handler/identity event-handler/cli.go:249
INFO   Using existing storage directory dir:storage/example.teleportdemo.com_443 event-handler/state.go:114
INFO   Using initial cursor value cursor:4417e50b-439d-4c6d-be82-d02d9ba7dde9 event-handler/app.go:191
INFO   Using initial ID value id:4894832b-6166-479b-8dcb-5bb563c74881 event-handler/app.go:192
INFO   Using start time from state value:2022-09-12 14:02:43 +0000 UTC event-handler/app.go:193
<logs flow>
^CINFO   Attempting graceful shutdown... lib/signals.go:32
INFO   Successfully shut down event-handler/main.go:80

fluentd logs:

2023-04-05 20:44:42 +0000 [info]: starting fluentd-1.15.3 pid=6 ruby="3.1.3"
2023-04-05 20:44:42 +0000 [info]: spawn command to main:  cmdline=["/usr/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "--plugin", "/fluentd/plugins", "--under-supervisor"]
2023-04-05 20:44:42 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2023-04-05 20:44:43 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2023-04-05 20:44:43 +0000 [info]: adding match pattern="test.log" type="stdout"
2023-04-05 20:44:43 +0000 [info]: adding match pattern="session.*.log" type="stdout"
2023-04-05 20:44:43 +0000 [info]: adding source type="http"
2023-04-05 20:44:43 +0000 [info]: #0 starting fluentd worker pid=15 ppid=6 worker=0
2023-04-05 20:44:43 +0000 [info]: #0 fluentd worker is now running worker=0
2022-11-09 20:22:16.000000000 +0000 test.log: {"ei":0,"event":"user.update","uid":"f5af5ce1-88b4-44da-89a9-e8a23b30f608","code":"T1003I","cluster_name":"purple","user":"bot-gusbot","name":"bot-gusbot","expires":"0001-01-01T00:00:00Z","roles":["bot-gusbot"],"connector":"local"}
<more events flow>

But if I run fluentd with the 10.2.1 certs:

event-handler logs:

INFO   Using batch size batch:20 event-handler/cli.go:236
INFO   Using namespace namespace:default event-handler/cli.go:237
INFO   Using type filter types:[] event-handler/cli.go:238
INFO   Skipping session events of type types:map[print:{}] event-handler/cli.go:239
INFO   Using start time value:<nil> event-handler/cli.go:240
INFO   Using timeout timeout:10s event-handler/cli.go:241
INFO   Using Fluentd url url:https://localhost:8888/test.log event-handler/cli.go:242
INFO   Using Fluentd session url url:https://localhost:8888/session event-handler/cli.go:243
INFO   Using Fluentd ca ca:/home/ubuntu/event-handler/10.2.1/keys/ca.crt event-handler/cli.go:244
INFO   Using Fluentd cert cert:/home/ubuntu/event-handler/10.2.1/keys/client.crt event-handler/cli.go:245
INFO   Using Fluentd key key:/home/ubuntu/event-handler/10.2.1/keys/client.key event-handler/cli.go:246
INFO   Using Teleport identity file file:/home/ubuntu/event-handler/identity event-handler/cli.go:249
INFO   Using existing storage directory dir:storage/example.teleportdemo.com_443 event-handler/state.go:114
INFO   Using initial cursor value cursor:4894832b-6166-479b-8dcb-5bb563c74881 event-handler/app.go:191
INFO   Using initial ID value id:c241e53e-f21e-4f95-89fe-57d25e0d2125 event-handler/app.go:192
INFO   Using start time from state value:2022-09-12 14:02:43 +0000 UTC event-handler/app.go:193
ERRO   Error sending event to Teleport: Post "https://localhost:8888/test.log": remote error: tls: unknown certificate authority event-handler/app.go:118

fluentd logs:

2023-04-05 20:45:20 +0000 [info]: starting fluentd-1.15.3 pid=7 ruby="3.1.3"
2023-04-05 20:45:20 +0000 [info]: spawn command to main:  cmdline=["/usr/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "--plugin", "/fluentd/plugins", "--under-supervisor"]
2023-04-05 20:45:20 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2023-04-05 20:45:20 +0000 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2023-04-05 20:45:20 +0000 [info]: adding match pattern="test.log" type="stdout"
2023-04-05 20:45:20 +0000 [info]: adding match pattern="session.*.log" type="stdout"
2023-04-05 20:45:20 +0000 [info]: adding source type="http"
2023-04-05 20:45:20 +0000 [info]: #0 starting fluentd worker pid=16 ppid=7 worker=0
2023-04-05 20:45:20 +0000 [info]: #0 fluentd worker is now running worker=0
2023-04-05 20:45:46 +0000 [warn]: #0 unexpected error before accepting TLS connection by OpenSSL addr="172.17.0.1" host="172.17.0.1" port=41188 error_class=OpenSSL::SSL::SSLError error="SSL_accept returned=1 errno=0 peeraddr=172.17.0.1:41188 state=error: certificate verify failed (self signed certificate)"

jof commented

I am also hitting this same issue. With the latest version (v13.3.7 in my testing), the event-handler plugin's configure command generates mTLS certificates that use the same Distinguished Name for the CA, Server, and Client certificates.
Without any other identifier (such as authorityKeyIdentifier) to identify the issuing CA, X.509 verifiers look at the Issuer field DN, see that it matches the Subject, and treat the certificate as self-signed.
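
To make the Issuer/Subject collision concrete, here is a minimal Go sketch. It is not the plugin's code: the helpers `newTemplate`, `issueLeaf`, and `looksSelfIssued` are hypothetical names used only for illustration, and the check compares DNs only, whereas a real verifier may also consult key identifiers when they are present. It issues a leaf from a throwaway CA twice, once with the CA sharing the leaf's DN and once with a distinct CA DN.

```go
package main

import (
	"bytes"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newTemplate builds a short-lived certificate template with the DN
// "C=US, CN=<cn>" (hypothetical helper, for illustration only).
func newTemplate(serial int64, cn string) *x509.Certificate {
	return &x509.Certificate{
		SerialNumber: big.NewInt(serial),
		Subject:      pkix.Name{Country: []string{"US"}, CommonName: cn},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
	}
}

// issueLeaf creates a throwaway CA named caCN, signs a leaf named leafCN
// with it, and returns the parsed leaf certificate.
func issueLeaf(caCN, leafCN string) (*x509.Certificate, error) {
	caKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	caTmpl := newTemplate(1, caCN)
	caTmpl.IsCA = true
	caTmpl.BasicConstraintsValid = true
	caTmpl.KeyUsage = x509.KeyUsageCertSign
	// The CA is self-signed: template and parent are the same certificate.
	caDER, err := x509.CreateCertificate(rand.Reader, caTmpl, caTmpl, &caKey.PublicKey, caKey)
	if err != nil {
		return nil, err
	}
	caCert, err := x509.ParseCertificate(caDER)
	if err != nil {
		return nil, err
	}

	leafKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	leafTmpl := newTemplate(2, leafCN)
	leafTmpl.KeyUsage = x509.KeyUsageDigitalSignature
	// The leaf is signed by the CA key, so it is not actually self-signed.
	leafDER, err := x509.CreateCertificate(rand.Reader, leafTmpl, caCert, &leafKey.PublicKey, caKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(leafDER)
}

// looksSelfIssued reports whether the encoded Issuer DN is byte-identical
// to the encoded Subject DN, the condition described in the comment above.
func looksSelfIssued(cert *x509.Certificate) bool {
	return bytes.Equal(cert.RawIssuer, cert.RawSubject)
}

func main() {
	for _, caCN := range []string{"localhost", "Example CA"} {
		leaf, err := issueLeaf(caCN, "localhost")
		if err != nil {
			panic(err)
		}
		fmt.Printf("issuer=%q subject=%q looksSelfIssued=%v\n",
			leaf.Issuer.CommonName, leaf.Subject.CommonName, looksSelfIssued(leaf))
	}
}
```

In the first case the leaf is signed by a different key than its own, yet its Issuer and Subject are indistinguishable by name, which matches the ambiguity reported in this thread.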

For example, when passing it to openssl verify:

$ openssl verify -show_chain -verbose -CAfile ca.crt client.crt
C = US, CN = localhost
error 18 at 0 depth lookup: self-signed certificate
error client.crt: verification failed

I was able to work around this by generating my own certificates with different Distinguished Names and using subjectAlternativeName field values to list the network identities of the client and server (in the single-node demo context, I'm listing DNS:localhost, IP:127.0.0.1, and IP:::1).

Looking into the mTLS certificate generation, the issuer and subject fields for all certificates come from a single entity variable, which makes these certs appear self-signed to OpenSSL.

Additionally, subjectAltNames are only added to the Server certificate, but not the Client certificate.

I would propose that we change the event-handler plugin to generate certificates with varying DNs and meaningful SANs on both the Client and Server certs. If we do this, we can also drop the cn argument to GenerateMTLSCerts, as it won't really do anything.
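
As a rough illustration of that proposal (not the plugin's actual GenerateMTLSCerts, and not the fix that eventually landed), the following Go sketch issues a CA, a server certificate, and a client certificate with three distinct DNs and puts the same SANs on both leaves. The names `bundle`, `sign`, `leafTemplate`, and `generateBundle`, as well as the CN strings, are assumptions made up for this example; the SAN values are the ones from the workaround above. Private keys are discarded and nothing is written to disk.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// bundle holds the three parsed certificates (hypothetical type).
type bundle struct{ CA, Server, Client *x509.Certificate }

// sign creates a key for tmpl and signs it with parentKey. A nil parent
// means the certificate is a self-signed root.
func sign(tmpl, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	if parent == nil {
		parent, parentKey = tmpl, key
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, parentKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// leafTemplate gives each leaf its own CN plus SANs for the demo host.
func leafTemplate(serial int64, cn string, usage x509.ExtKeyUsage) *x509.Certificate {
	return &x509.Certificate{
		SerialNumber: big.NewInt(serial),
		Subject:      pkix.Name{Country: []string{"US"}, CommonName: cn},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{usage},
		DNSNames:     []string{"localhost"},
		IPAddresses:  []net.IP{net.ParseIP("127.0.0.1"), net.ParseIP("::1")},
	}
}

// generateBundle issues a CA and two leaves, all with different DNs.
func generateBundle() (*bundle, error) {
	caTmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{Country: []string{"US"}, CommonName: "Example Fluentd mTLS CA"},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
	}
	ca, caKey, err := sign(caTmpl, nil, nil)
	if err != nil {
		return nil, err
	}
	server, _, err := sign(leafTemplate(2, "fluentd-server", x509.ExtKeyUsageServerAuth), ca, caKey)
	if err != nil {
		return nil, err
	}
	client, _, err := sign(leafTemplate(3, "event-handler-client", x509.ExtKeyUsageClientAuth), ca, caKey)
	if err != nil {
		return nil, err
	}
	return &bundle{CA: ca, Server: server, Client: client}, nil
}

// check verifies both leaves against the CA for their intended usage.
func (b *bundle) check() error {
	pool := x509.NewCertPool()
	pool.AddCert(b.CA)
	for leaf, usage := range map[*x509.Certificate]x509.ExtKeyUsage{
		b.Server: x509.ExtKeyUsageServerAuth,
		b.Client: x509.ExtKeyUsageClientAuth,
	} {
		opts := x509.VerifyOptions{Roots: pool, KeyUsages: []x509.ExtKeyUsage{usage}}
		if _, err := leaf.Verify(opts); err != nil {
			return fmt.Errorf("verify %q: %w", leaf.Subject.CommonName, err)
		}
	}
	return nil
}

func main() {
	b, err := generateBundle()
	if err != nil {
		panic(err)
	}
	fmt.Println("CA:    ", b.CA.Subject)
	fmt.Println("Server:", b.Server.Subject, b.Server.DNSNames, b.Server.IPAddresses)
	fmt.Println("Client:", b.Client.Subject, b.Client.DNSNames, b.Client.IPAddresses)
	fmt.Println("chain check error:", b.check())
}
```

Because each leaf's Issuer now differs from its Subject, a verifier that matches by name has no reason to mistake the leaves for self-signed certificates, and the CN no longer needs to carry the host identity since the SANs do.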

jof commented

The breakage was quite clear, and with these last two PRs that have landed this seems fixed now.

Thanks for the fix!