Incorrect Handling of SNI: No Certificate Available (Regression)
elcore opened this issue · 17 comments
After updating the Caddy Layer4 module to e491c44, Caddy erroneously attempts to handle certificates for domains when it should instead proxy and forward connections based on SNI.
Logs:
{"level":"error","ts":1723754927.6938345,"logger":"layer4","msg":"handling connection","remote":"127.0.0.1:35770","error":"no certificate available for 'ud-chat.signal.org'"}
{"level":"error","ts":1723754927.7430856,"logger":"layer4","msg":"handling connection","remote":"127.0.0.1:35772","error":"no certificate available for 'chat.signal.org'"}
{"level":"error","ts":1723754927.9744139,"logger":"layer4","msg":"handling connection","remote":"127.0.0.1:35788","error":"no certificate available for 'ud-chat.signal.org'"}
{"level":"error","ts":1723754928.1224709,"logger":"layer4","msg":"handling connection","remote":"127.0.0.1:35802","error":"no certificate available for 'chat.signal.org'"}
{"level":"error","ts":1723754929.6092706,"logger":"layer4","msg":"handling connection","remote":"127.0.0.1:60324","error":"no certificate available for 'ud-chat.signal.org'"}
...
Configuration:
{
"apps": {
"tls": {
"certificates": {
"automate": [
"sub.example.com"
]
}
},
"layer4": {
"servers": {
"signal_proxy": {
"listen": [
":443"
],
"routes": [
{
"handle": [
{
"handler": "tls"
}
]
},
{
"match": [
{
"tls": {
"sni": [
"chat.signal.org",
"ud-chat.signal.org",
"storage.signal.org",
"cdn.signal.org",
"cdn2.signal.org",
"cdn3.signal.org",
"api.directory.signal.org",
"cdsi.signal.org",
"sfu.voip.signal.org",
"svr2.signal.org",
"updates.signal.org",
"updates2.signal.org",
"backend1.svr3.signal.org",
"backend2.svr3.signal.org",
"backend3.svr3.signal.org"
]
}
}
],
"handle": [
{
"handler": "proxy",
"upstreams": [
{
"dial": [
"{l4.tls.server_name}:443"
]
}
]
}
]
}
]
}
}
}
}
}
- Reverting to 94cd399 fixes the issue
The upstream address is being replaced aggressively too early here with a fresh replacer (not from context) that doesn't have the matched SNI, causing it to be empty because it's ReplaceAll
. It used to being replaced at the dialPeer
call which happens inside the Handle
method of the proxy handler.
@vnxme, any chance you can take a look?
@elcore, I have come up with a workaround. but it requires duplication. You can make a match/proxy-handle block for each upstream address.
Hmm, interesting 🤔. I think for now, pinning to the working commit is temporarily fine (after all it’s inside a Docker container), unless you’d say that not updating would be not wise.
Keep the pin. I like that you have a valid use case and a simple reproducer 🙂
@mohammed90 I'll look into the issue and propose a solution. @elcore, thank you for a valid use case example.
@elcore First of all, you have no certificate available for 'ud-chat.signal.org'
-like errors because you have a route with a single tls
handler before everything else in your list of routes. I think you don't need this route at all if you would like to proxy https (and assuming you don't have valid certificates for these domain names). Alternatively, it should be placed after the route where you match the third-party domains and proxy them.
with a fresh replacer (not from context) that doesn't have the matched SNI
@mohammed90 We've already discussed that, and I mentioned that the context is empty at provision. That's why there is no point in trying to extract the replacer from the context - it won't work, I've checked.
The upstream address is being replaced aggressively too early here
causing it to be empty because it's
ReplaceAll
.
But you are right stating that the upstream address gets empty in this issue because {l4.*} placeholders don't exist at provision. What do you think if we use ReplaceKnown there instead of ReplaceAll?
It’s a layered approach with TLS in TLS hence the above configuration. I currently cannot test your idea.
a layered approach with TLS in TLS
This way you have to use subroute
. Like this:
{
layer4 {
:443 {
route {
tls
subroute {
@cf tls sni one.one.one.one
route @cf {
proxy 1.1.1.1:443
}
}
}
}
}
}
I will certainly try this ASAP. Nonetheless I am surprised that the previous config is now broken with the latest commit(s). Sidenote: This seems like a Caddyfile configuration, right?
What do you think if we use ReplaceKnown there instead of ReplaceAll?
This would fix it.
This way you have to use subroute. Like this:
This approach is the workaround I suggested earlier, which requires duplication. Elcore's existing config takes advantage of the matched value being known at runtime without having to make a separate route/branch for every domain name.
Nonetheless I am surprised that the previous config is now broken with the latest commit(s).
I'm surprised your config actually used to work because it's conceptually wrong (missing subroute, as mentioned above). But you are right that {l4.tls.server_name}:443
should be an acceptable dial address, and we will fix that shortly.
This seems like a Caddyfile configuration, right?
Exactly. It's a part of the latest commits as well.
Elcore's existing config takes advantage of the matched value being known at runtime without having to make a separate route/branch for every domain name.
Yes, I can see that. I suppose you haven't got my idea. The final working (after a fix, and given Elcore needs TLS-in-TLS) config will be as follows:
{
layer4 {
:443 {
route {
tls
subroute {
@signal tls sni chat.signal.org ud-chat.signal.org storage.signal.org cdn.signal.org cdn2.signal.org cdn3.signal.org api.directory.signal.org cdsi.signal.org sfu.voip.signal.org svr2.signal.org updates.signal.org updates2.signal.org backend1.svr3.signal.org backend2.svr3.signal.org backend3.svr3.signal.org
route @signal {
proxy {l4.tls.server_name}:443
}
}
}
}
}
}
I suppose you haven't got my idea
I get your idea, and saying the route
is unnecessary.
@elcore Can you try xcaddy build --with github.com/mholt/caddy-l4=github.com/WeidiDeng/caddy-l4@prefetch-fix
?
@WeidiDeng Your fix does not seem to work - Same issue as above
@vnxme Your fix does work.