typokign/matrix-chart

Federation Port 8448 is not encrypted with TLS

Routhinator opened this issue · 19 comments

This might be a misunderstanding on my part so I'll run down my thoughts here:

  • Since this is a non-standard port, it cannot be behind Ingress-nginx (Can I tell the federation servers to use 443 somehow perhaps?)
  • If it cannot be proxied behind a TLS proxy, it needs to encrypt it's own connection with TLS certs as it needs to be connected to a loadbalancer directly.
  • Currently this cannot be federated on a standard kube deployment as servers expect to talk TLS to the 8448 port.

Good questions. Yes, you will unfortunately need to put the federation service behind a TLS proxy.

Since this is a non-standard port, it cannot be behind Ingress-nginx

Not just the NGINX Ingress Controller, but the k8s spec for Ingresses doesn't allow you to customize the port at all ☹️

Can I tell the federation servers to use 443 somehow perhaps?

AFAIK, no, you can customize the federation port but the C-S and S-S APIs must be on different ports. Since k8s only allows Ingresses on ports 80/443, changing the port doesn't really help.

If it cannot be proxied behind a TLS proxy, it needs to encrypt it's own connection with TLS certs as it needs to be connected to a loadbalancer directly.

Definitely something I would like to consider, but haven't implemented yet for the same reasons as Coturn (#5)

Currently this cannot be federated on a standard kube deployment as servers expect to talk TLS to the 8448 port.

"Standard" and "Kubernetes" are pretty antithetical 😛 If you're running in AWS or GCP, you can configure the associated load balancers to terminate TLS traffic. Home-rolled kubeadm deployment on a raspberry pi in your basement? Yeah, getting this to federate will take some more work on your part.

Again, I'm hoping to put a lot more time into this chart in the next couple of months, and in-cluster TLS for federation and Coturn are pretty high up on my priority list. PRs are always welcome too! :)

I could move TLS to the loadbalancer on DO but not sure I want to if it will break my cluster's automatic domain provisioning and I have to start adding dns records for new deployments... Other option would be to pay for another loadbalancer but that would put my bill from tolerable to uncomfortably high.

I guess the tertiary option here would be an nginx container with cert manager certs that proxies the 8448 port.

Thinking on this more overnight I'm going to work on a POC that uses a custom NGINX container and cert-manager to proxy both Coturn and the Federation port.

I really don't want to move TLS to my load balancer as that leaves traffic from the load balancer to the kube nodes in the clear.

I have an MR coming for this which uses the existing ingress TLS cert for the federation TLS. It's tested and working. I'll implement the same change for coturn once merged.

o_O You can do that automatically? I'll have to test for compatibility but that sounds like a perfect solution. Thanks a bunch!

... So was reading through this doc https://www.natrius.eu/dokuwiki/doku.php?id=digital:server:matrixsynapse

And it struck me.. The federation needs to be on a seperate port, however that can still actually be 443 - if its on a seperate domain.

So

matrix.example.com: 443 --> 8008 client/etc
synapse.example.com: 443 --> 8448 federation

with the appropriate SRV records and .well-known setup this will work

That's a great idea, thanks! Just asked around and seems like that's perfectly valid, both spec-wise and there shouldn't be any compatibility issues with Riot or other federating homeservers.

I'll get started on implementing this right away, looking forward to shutting down an unnecessary load balancer :)

Fixed in 8ff92ab

@dacruz21 Looks like there is a fatal flaw in the ingress implementation - federation connections resolve the IP of the load balancer and connect to https://x.x.x.x/_matrix

This doesn't work because ingress-nginx 404s on those requests.. whereas it was working for my implementaion on the seperate port.

I think we'll need one more ingress that has a wildcard host, and specifies the /_matrix/ path to catch these.

It should also listen on /.well-known/matrix/ to support the well-known stanzas as these come into routh.io/.well-known/matrix/** instead of matrix.routh.io or synapse.routh.io

https://matrix.org/docs/spec/server_server/r0.1.4#resolving-server-names

If <delegated_hostname> is not an IP literal and no <delegated_port> is present, an SRV record is looked up for _matrix._tcp.<delegated_hostname>. This may result in another hostname (to be resolved using AAAA or A records) and port. Requests should be made to the resolved IP address and port with a Host header containing the <delegated_hostname>. The target server must present a valid certificate for <delegated_hostname>.

I have that DNS record present. Federation still operates on the IP.

https://federationtester.matrix.org/#routh.io

You can see it gets the proper domain, but it gets a 404 because it actually makes the request to the https://104.248.105.145/_matrix/key/v2/server. And ingress-nginx cannot answer that without an ingress listing for that IP as a server directive.

When I go to the url https://synapse.routh.io/_matrix/key/v2/server which is what it should go to it works.

My federation is also broken after applying this change.

The federation tester frontend kinda sucks, I usually look at https://federationtester.matrix.org/api/report?server_name=routh.io

From there you can see the real issue, "Get https://routh.io/.well-known/matrix/server: x509: certificate is valid for ingress.local, not routh.io"

I would try to fix that issue rather than use SRV records. It's been a while since I've asked but last I heard SRV records had a number of issues and it was highly recommended to just use .well-known for federation resolution.

So that still leaves me with needing an ingress that listens on routh.io/.well-known/matrix/server while my primary ingress lives on matrix.routh.io or synapse.routh.io

Theres a way to listing on all hosts with a snippet which may be desirable for the .well-known stuff..

nginxinc/kubernetes-ingress#209 (comment) - however it would need to live on a seperate ingress definition as you would not want to bind the / path with that.

Alternatively there could be an ingress that listens on very specific paths on the root domain.

Yeah, it was a conscious decision to not include a .well-known/matrix/server handler in the chart, mainly because I assume most people already have a static site at the root of their domain which may or may not be hosted from the same cluster. Seems totally overkill to me to host an entire nginx pod just to serve a little JSON object at one URL.

I agree there. But don't need a pod.. just a nginx.ingress.kubernetes.io/configuration-snippet

Something along this line

annotations:
  nginx.ingress.kubernetes.io/configuration-snippet: |
    location /.well-known/matrix/server {
      return 200 '{"m.server": "example.com:443"}';
      add_header Content-Type application/json;
    }
    location /.well-known/matrix/client {
      return 200 '{"m.homeserver": {"base_url": "https://example.com"},"m.identity_server": {"base_url": "https://vector.im"}}';
      add_header Content-Type application/json;
      add_header "Access-Control-Allow-Origin" *;
    }

Cool, didn't know the nginx ingress provider actually supported that. But still, that's not a standard k8s API, so I'm super hesitant to add anything that makes this chart less portable, and needs to be disabled if you're using a different ingress provider, and requires you to update your DNS if you were hosting your root domain outside of the cluster.

Looks like you already have a static site at your root domain, why not just add the .well-known to your files there? That's what I do: https://github.com/dacruz21/website/blob/master/.well-known/matrix/server

Or if you want to keep this in an Ingress, and you're using an umbrella chart (which I recommend over cloning this repo), you can add a barebones Ingress to your templates/ dir and just put the snippet there. It won't deploy any extra proxies or cost you anything.

I get where you're coming from. I don't actually have a static site, just a default-backend container with 404 docs.. I could shove this in there however I would prefer to have something like this (which is very app specific) configured with the app rather than in a main root site statically.

So I took the template suggestion. Maybe you might consider option features that do not lock a user into ingress-nginx but simply support it if its being used since it's the most widely used ingress controller.

This was my solution.. (not sure what to bind the indentity server to yet though)

---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: {{ include "matrix.fullname" . }}-wellknown
  {{- with .Values.matrix.ingress.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
    nginx.ingress.kubernetes.io/configuration-snippet: |
      location /.well-known/matrix/server {
        return 200 '{"m.server": "{{ .Values.matrix.ingress.hosts.federation }}:443"}';
        add_header Content-Type application/json;
      }
      location /.well-known/matrix/client {
        return 200 '{"m.homeserver": {"base_url": "https://{{ .Values.matrix.ingress.hosts.synapse }}"},"m.identity_server": {"base_url": "https://vector.im"}}';
        add_header Content-Type application/json;
        add_header "Access-Control-Allow-Origin" *;
      }
spec:
{{- if .Values.matrix.ingress.tls }}
  tls:
    {{- range .Values.matrix.ingress.tls }}
    - hosts:
      {{- range .hosts }}
      - {{ . | quote }}
      {{- end }}
      secretName: {{ .secretName }}
    {{- end }}
{{- end }}
  rules:
    - host: {{ .Values.matrix.ingress.hosts.root }}
      http:
        paths:
          - path: "/.well-known/matrix/"
            backend:
              serviceName: "{{ include "matrix.fullname" . }}-synapse"
              servicePort: {{ .Values.matrix.synapse.service.port }}

Your ingress definition looks good. Yeah, I'll definitely consider it, but it's low on my priority list and I would want to leave it disabled by default (so that it's not a breaking change for anyone running this chart without the nginx provider). I just fear it will get lost in the sea of values, so I'm going to wait until I get around to writing some docs for this thing so it doesn't get forgotten.

Identity server doesn't really matter all that much. It just allows you to attach an email address to your matrix username so that others can look you up by email. You could remove it entirely or leave it as vector.im (vector.im's identity server is public).

I'll be adding the ma1sd identity server to this chart eventually, but it will probably be disabled by default. (#18)

Looks like you're passing the federation tester 🎉 so I'll close this issue now. Let me know if you have any more questions.