matrix-org/synapse

Send SNI indication to support vhosts over federation (SYN-620)

matrixbot opened this issue · 17 comments

Two problems:

  • Federation doesn't send an SNI indication (because twisted), so for vhosted servers we tend to end up on the default.
  • We send the server_name in the Host header, rather than what the SRV tells us to use. (We think the current behaviour is correct, as per #2525).

(Imported from https://matrix.org/jira/browse/SYN-620)

Jira watchers: @ara4n @richvdh

Links exported from Jira:

is duplicated by SYN-233

doublemalt (re)submitted a PR to try to get twisted to implement SNI in the http client: twisted/twisted#281.

-- @richvdh

Looks like this is worth another look; the twisted PR has been closed with a link to an API.
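
For reference, the Twisted API in question is presumably optionsForClientTLS, which takes a hostname and fills in the SNI extension from it. A minimal sketch (not Synapse's actual code; the names below are illustrative):

# optionsForClientTLS() sets SNI from the hostname it is given
from twisted.internet import reactor
from twisted.internet.endpoints import SSL4ClientEndpoint
from twisted.internet.ssl import optionsForClientTLS

server_name = u"example.com"                       # name to present via SNI
srv_target, srv_port = "matrix.example.com", 8448  # from the SRV lookup

tls_options = optionsForClientTLS(hostname=server_name)
endpoint = SSL4ClientEndpoint(reactor, srv_target, srv_port, tls_options)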

4b69 commented

Install docs should probably be made clearer to account for this issue:

For example, you might want to run your server at synapse.example.com, but have your Matrix user-ids look like @user:example.com

is not possible with reverse-proxying. The README states:

Synapse does not currently support SNI on the federation protocol (bug #1491), which means that using name-based virtual hosting is unreliable.

But in actuality it's not unreliable, it's impossible.

It's possible if the reverse proxy is configured to forward to synapse by default when no SNI name is sent.

Is this SNI bug still open after two years?

For work conducted by the core team it comes down to a question of priority: right now that means dealing with the massive growth on matrix.org, hence the bias towards performance in the common case.

With that in mind, community contributions much appreciated :)

AFAICT synapse now at least sends the SNI headers.
At least my reverse proxy shows my mxdomain for the federation requests that are coming in.

@krombel: I don't think so. I can't see any SNI headers on SSL traffic arriving on my server.
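
For anyone who wants to check this directly: one way is to terminate TLS on a test port and log whatever name arrives in the ClientHello. A rough sketch using Python's ssl module (3.8+; the cert/key paths and port are placeholders):

import socket, ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain("cert.pem", "key.pem")

def log_sni(sslobj, server_name, sslctx):
    # server_name is None when the client sent no SNI extension
    print("SNI received:", server_name)

ctx.sni_callback = log_sni

with socket.create_server(("0.0.0.0", 8449)) as srv:
    conn, addr = srv.accept()
    with ctx.wrap_socket(conn, server_side=True) as tls:
        pass  # handshake done; the callback above has already fired

A client that sends SNI will log its target name; an old synapse will log None.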

re:

We think the current behaviour is correct, as per #2525

I'm aware this conversation may be long past, but I just wanted to throw my 2¢ out there:

Because SRV never got mainstream traction, I would expect SRV resolution to take place in user-space and then any client to make a standard request to https://SRV_HOST:SRV_PORT and validate that they have an SSL certificate for SRV_HOST (a sketch of this flow follows the list below). Doing it as currently implemented has some interesting implications:

  1. Compromise of matrix.example.com results in a valid SSL cert for example.com
  2. example.com is not a name that the server is intended to be reachable at through normal means (it's unexpected/surprising to have the vhost listen on that; avoiding surprises is good)
  3. It is hard to host matrix.example.com on the same machine as example.com, because that server is expected to answer Matrix requests for example.com while also serving normal web requests for example.com, which forces path-based or other odd routing. (This could be valid when using SRV records for some sort of HA/failover.)
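
To make the proposed flow concrete, here is a rough sketch (assuming the third-party dnspython package): resolve the SRV record in user space, then connect to the target and validate the certificate against SRV_HOST rather than example.com.

import socket, ssl
import dns.resolver  # third-party: dnspython

answer = dns.resolver.resolve("_matrix._tcp.example.com", "SRV")
srv = sorted(answer, key=lambda r: (r.priority, -r.weight))[0]
host, port = str(srv.target).rstrip("."), srv.port

ctx = ssl.create_default_context()
with socket.create_connection((host, port)) as sock:
    # server_hostname both sends SNI and names the cert to validate:
    # the SRV target here, per the proposal above
    with ctx.wrap_socket(sock, server_hostname=host) as tls:
        print("validated certificate for", host)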

The one positive I see with the current implementation is that it allows for SRV records to point to IP addresses, and works around any issues about what certs to validate there.

Hopefully this is finally fixed as of 0.33.3, thanks to @vojeroen.

euank commented

Unfortunately, until the ecosystem of federated servers has upgraded, SNI still can't be relied on, since older servers won't send it.

Putting a homeserver behind SNI right now will mean you can only federate with the subset of up-to-date servers.

Unfortunately, there's also not a good summary of the "versions" present on the broader matrix network, so it's difficult for server operators to know when they can rely on SNI.

This relates to issue matrix-org/matrix.org#67 to some degree.

As a start, a server operator should check /_matrix/federation/v1/version for all the servers they already federate with, to make sure flipping on an SNI load balancer or such won't break existing rooms / chats.
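
A rough sketch of that check (Python with the requests package; the server list is a placeholder, and SRV resolution is skipped in favour of the default port 8448):

import requests

servers = ["example.org"]  # e.g. from a server_keys_json export
for name in servers:
    try:
        r = requests.get(
            "https://%s:8448/_matrix/federation/v1/version" % name,
            timeout=10,
            verify=False,  # many federation certs are self-signed
        )
        print(name, r.json().get("server", {}).get("version"))
    except (requests.RequestException, ValueError) as exc:
        print(name, "unreachable:", exc)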

However, it still breaks communication with other servers that aren't new enough, and there's really not a great way for a server operator to make an informed decision about when few enough older-version servers remain active in the broader matrix fediverse that they're okay breaking them.

As far as I know, good tooling for doing the above isn't available, so I'll be writing some once-off postgres queries and scripts to do it for my server, but it's not really reasonable to expect every server operator to do that before using an LB which requires SNI.

@euank That is the case with every new feature. This issue just mentioned sending SNI support - not requiring it on the receiving side. That will stay the case for some time.
And just to note: it is possible to use the User-Agent header to identify which server versions are connecting to your server. As you are interested in incoming connections, that might be a better place to check whether the federating servers would still be able to connect once you start requiring SNI support.

euank commented

That is the case with every new feature

@krombel this is different than other features. Most of the time, if my server has a new feature X, old servers will simply not use it, but I can still receive messages from users on those servers.

SNI is special in that if I require SNI, there is no graceful degradation; all old servers will simply get a certificate error and I won't be able to communicate with them at all. To my knowledge, there hasn't been any other feature with such a negative impact when used.

This issue just mentioned sending SNI support - not requiring it on the receiving side

They're two sides of the same coin. People want it to be sent so they can host synapse as they do other http endpoints: behind a service proxy, load balancer, ingress controller, whatever. The issue title even mentions "support vhosts" which is another way to say "require SNI on the receiving side".

Perhaps we should create a new issue for the receiving side, which would basically be figuring out when to update the readme (here): that is, what criteria we need to meet in the broader matrix network before we can recommend running synapse behind some SNI-aware LB.

And just to note: it is possible to use the User-Agent header to identify which server versions are connecting to your server. As you are interested in incoming connections, that might be a better place to check whether the federating servers would still be able to connect once you start requiring SNI support.

I think that's a worse signal. User-agents can be rewritten (e.g. if traffic goes through certain proxies or other intermediaries) and are less well-specified behaviour in server-server communication. The server-server API at least documents the endpoint I referenced.

euank commented

I decided to check the adoption of this feature from my view of the network, to determine whether I could finally put synapse behind a regular load balancer like all the other http services I run. The tl;dr is that I don't feel the fix has reached a large enough percentage of federation clients yet, so I can't.

I'm sharing the information I collected to decide this below in case anyone else following this issue for a similar reason finds it useful.

The hello-matrix site conveniently offers a list of servers and their version, so I went ahead and checked what that data shows:

$ curl -s "https://www.hello-matrix.net/public_servers.php?format=json" | jq '.[] | select(.last_response == 200).server_version' -r | sort | uniq -c
      6 null
      1 0.26.0
      2 0.29.0
      2 0.30.0
      1 0.31.2
      2 0.32.2
      1 0.33.0
      4 0.33.2.1
      1 0.33.3
      4 0.33.3.1
      8 0.33.4
     10 0.33.5.1
     32 0.33.6
      1 0.33.7rc2

At the time of writing, of the 75 active servers tracked by hello-matrix, 19 of them (~25%) are on versions too old to send SNI headers. The lower bound is 17% if the 'null'-versioned servers all support it, but it seems more likely that those servers are simply very old.

Now, this isn't really representative, because most servers don't add themselves to the list, and those that do are probably more closely involved in the matrix ecosystem and quicker to upgrade.
I also decided to check the list of servers I federate with to see what impact it might have on my server.

$ psql ..... 
database=# COPY (SELECT server_name FROM server_keys_json) TO '/tmp/servers-from-keys.csv' WITH CSV DELIMITER ',';

$ wc -l /tmp/servers-from-keys.csv 
3146 /tmp/servers-from-keys.csv

$ ./fetch-version-stats.sh < /tmp/servers-from-keys.csv > server_stats.csv

(script here if anyone wants it 🤷‍♂️)

Looking at the information from servers I have ever federated with, I get 1919 servers that no longer respond (typically because the operator is no longer running a synapse server for whatever reason), 446 that are on versions < 0.33.3, and 781 on >= 0.33.3.

That means that putting synapse's federation endpoint behind a load balancer partitions me from about 36% of the still-active servers mine has interacted with (446 of the 1,227 that still respond).

Of course, there's still one more statistic which is more useful: what about servers I've recently interacted with?
In reality, most of those servers in the 3k my server has seen aren't that active anymore, or at least I don't communicate with them anymore so it doesn't really matter that much if I partition myself from them, right?

Let's also look at the servers I've specifically talked to in the last 1 month period:

SELECT split_part(sender, ':', 2) AS server
FROM events
-- received_ts is in milliseconds, hence the * 1000
WHERE received_ts > (extract(epoch from TIMESTAMP 'now'::timestamp - '1 month'::interval) * 1000)
GROUP BY server;

Throwing the output of that query through my stats process tells me that 17% of the servers I've specifically received events from over the last 1 month period are on synapse versions too old to send SNI headers.

In summary, I don't think we can yet rely on SNI headers for our synapse setups unless we're okay with partitioning ourselves off from a subset of the fediverse.

It does seem like the majority of servers are up to date, but there's still enough lagging behind that I'm personally going to keep a wonky special-case ingress setup just for matrix.

Hi @euank, fwiw those stats in terms of percentages are in line with our view from matrix.org, with 0.31.2 being strangely popular in 4th place.