[Bug]: query: TLS handshake error
Closed this issue · 5 comments
What happened?
Using v1.63.0 on Windows.
When loading https://localhost:16686/
in the browser
I get the error on the console where I run jaeger-all-in-one.exe
http: TLS handshake error from [::1]:55388: tls: first record does not look like a TLS handshake
I run with args:
--query.additional-headers="Access-Control-Allow-Origin: " --collector.otlp.enabled=true --collector.otlp.http.cors.allowed-origins="" --collector.otlp.http.cors.allowed-headers="*" --collector.otlp.http.host-port=4318 --collector.otlp.http.tls.key=%TLS%\key.pem --collector.otlp.http.tls.cert=%TLS%\cert.pem --collector.otlp.http.tls.enabled=true --query.http.tls.enabled=true --query.http.tls.key=%TLS%\key.pem --query.http.tls.cert=%TLS%\cert.pem
This works with v1.62.0.
Maybe this is related to PR #6023
I don't see an update to the CLI doc.
Steps to reproduce
- Run jaeger-all-in-one.exe on windows with args similar to above
- Open UI at default port in browser
- See described error in jaeger console
Expected behavior
It should work as before - the UI should load but it remains empty.
Relevant log output
http: TLS handshake error from [::1]:55388: tls: first record does not look like a TLS handshake
Screenshot
No response
Additional context
No response
Jaeger backend version
v1.63.0
SDK
No response
Pipeline
No response
Stogage backend
none (all-in-one)
Operating system
Windows
Deployment model
No response
Deployment configs
@mahadzaryab1 can you take a look?
@mahadzaryab1 I am able to reproduce, so it looks like it might be related to introducing OTEL helpers:
go run ./cmd/all-in-one \
--query.http.tls.enabled=true \
--query.http.tls.cert=./pkg/config/tlscfg/testdata/example-server-cert.pem \
--query.http.tls.key=./pkg/config/tlscfg/testdata/example-server-key.pem
after the request the server logs a bunch of errors:
2024/11/21 23:27:00 http: TLS handshake error from [::1]:62373: client sent an HTTP request to an HTTPS server
2024/11/21 23:27:04 http: TLS handshake error from [::1]:62374: EOF
2024/11/21 23:27:05 http: TLS handshake error from [::1]:62375: remote error: tls: unknown certificate
2024/11/21 23:27:05 http: TLS handshake error from [::1]:62376: remote error: tls: unknown certificate
2024/11/21 23:27:13 http: TLS handshake error from [::1]:62377: remote error: tls: unknown certificate
2024/11/21 23:27:13 http: TLS handshake error from [::1]:62378: remote error: tls: unknown certificate
2024/11/21 23:27:13 http: TLS handshake error from [::1]:62379: tls: first record does not look like a TLS handshake
$ curl -v -k https://localhost:16686/
...
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://localhost:16686/
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: localhost:16686]
* [HTTP/2] [1] [:path: /]
* [HTTP/2] [1] [user-agent: curl/8.7.1]
* [HTTP/2] [1] [accept: */*]
> GET / HTTP/2
> Host: localhost:16686
> User-Agent: curl/8.7.1
> Accept: */*
>
* Request completely sent off
* Closing connection
* Recv failure: Connection reset by peer
* Send failure: Broken pipe
curl: (16) Recv failure: Connection reset by peer
@mahadzaryab1 I think this is the mistake:
jaeger/cmd/query/app/server.go
Lines 331 to 335 in adbdf26
The listener is already upgraded to TLS by ToListener() function. If I replace it with only s.httpServer.Serve(s.httpConn)
, then the connection works, but ONLY with http/1.1:
$ curl -k --http1.1 https://localhost:16686/ > /dev/null
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 4454 0 4454 0 0 893k 0 --:--:-- --:--:-- --:--:-- 1087k
So there's still something funky going on, as by default curl tries to use http/2 and the server doesn't like it. Meanwhile, OTEL's test do appear to be testing both 2 and 1 https://github.com/open-telemetry/opentelemetry-collector/blob/00ad49af88794a8ad03d9994d290d1dea410ede2/config/confighttp/confighttp_test.go#L681
@yurishkuro Thanks for the reproduction steps. I've got a patch at #6239. I'm still unsure as to why the unit tests weren't able to catch this and will see if I can come up with a regression test for this.
Let's keep this open for a bit until we find out why the tests did not catch this.
One suspicious thing I see is here
jaeger/cmd/query/app/server_test.go
Line 452 in edb896e
The HTTP/TLS tests are only run when ClientCA is present. If I remove that 2nd clause, the tests still pass - but both before and after your fix, so they are still not catching the issue. But maybe we have other similar filters.
It's also not clear to me why we are doing manual dial outs in this block
jaeger/cmd/query/app/server_test.go
Line 422 in edb896e