Investigate ACME integration for SF provider
Opened this issue · 11 comments
Investigate the Traefik ACME integration models and which works best for our provider.
Steps to use ACME with Traefik on SF:
- Create an Azure DNS Zone
- Point the domain registrar's nameservers to the Azure DNS Zone's nameservers
- Create a wildcard A/AAAA record pointing to the ALB's PIP
- Create a Service Principal for RBAC
    az ad sp create-for-rbac -n "traefik" --scopes /subscriptions/{SUB_ID}/resourceGroups/{RES_GRP}/providers/Microsoft.Network/dnszones/{DNS_ZONE}
- Add AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_SUBSCRIPTION_ID, AZURE_TENANT_ID, AZURE_RESOURCE_GROUP as environment variables in TraefikServiceManifest.xml
- Add a TLS entrypoint and, optionally, a redirect rule (80 -> 443)
    [entryPoints.http]
    address = ":80"
      [entryPoints.http.redirect]
      entryPoint = "https"
- Enable and populate the [acme] configuration
- Add labels to a web service's ServiceManifest.xml file:
    <Label Key="traefik.frontend.rule">Host:test.yourdomain.com</Label>
    <Label Key="traefik.passHostHeader">true</Label>
    <Label Key="traefik.expose">true</Label>
- Hit http[s]://test.yourdomain.com
Work to do:
- Store the certificates ("acme.json") in a replicated fashion to avoid having to refresh tokens after a node failure and to reduce requests to Let's Encrypt, which is rate limited. This requires creating a key-value store provider: http://v1-5.archive.docs.traefik.io/user-guide/kv-config/
- Could we mount a volume, shared amongst all instances, that Traefik writes the config/acme.json files to? We'd need a way of electing a master with write status.
- Automate DNS configuration
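One way the "electing a master with write status" idea could look on a shared volume is an exclusive lock file: whichever instance creates it first becomes the writer. This is only a sketch in Python, not something Traefik or the SF provider does. `try_become_writer`, the lock file and the instance IDs are hypothetical, and it assumes the shared volume honours exclusive file creation. Recovering a stale lock after a node failure is deliberately left out.

```python
import os


def try_become_writer(lock_path: str, instance_id: str) -> bool:
    """Return True if this instance holds the write lock after the call.

    Hypothetical helper: the first instance to create the lock file wins.
    Assumes the shared volume supports atomic exclusive creation (O_EXCL).
    """
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # instance can succeed in creating it.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # Someone already holds the lock; it may be us from an earlier call.
        with open(lock_path, encoding="utf-8") as f:
            return f.read().strip() == instance_id
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(instance_id)
    return True
```

Only the instance that gets `True` back would write acme.json; the others would treat the file as read-only.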
@jjcollinge Do I understand correctly that this will work in the current release, assuming that the DNS is set up "manually" to point to the load balancer and the number of nodes is small enough that the Let's Encrypt rate limit won't be hit?
Btw, the doc link ( http://v1-5.archive.docs.traefik.io/user-guide/kv-config/ ) is dead.
With regard to the key-value store, this might be a stupid question, but can't you use a Stateful SF service?
Hi @petertiedemann - yes, you are correct. LE rate limiting is quite aggressive for production services, so it's hard not to hit the limit: https://letsencrypt.org/docs/rate-limits/ - A workaround is to pre-provision the certificate per cluster deployment, add the cert to the code package and update the Traefik.toml to point to it.
Yes, we could use a SF stateful service but we don't really want to maintain an additional C# service. We're going to try and fix this using the native SF replicated volume driver once that becomes available.
If it can be done with no additional services that is of course to be preferred, but if one is required, then what is wrong with one in C#? :)
We have been holding out for the SF volume driver, and it is close to release, so it felt worthwhile to hold off. If this is a blocker for you, let us know and we can re-evaluate, or feel free to create a PR.
If there is a nice solution with the volume driver then I don't see any reason to try a more complex solution. Unless of course the complexity then becomes related to electing a master etc.
@jjcollinge After messing around a bit with ACME, we found that there are in fact more critical problems than rate limiting. We cannot use the HTTP challenge, because the load balancer is likely to route the challenge request to a different node from the one that requested the certificate. We also cannot use the DNS challenge when multiple nodes try to get a certificate for the same domain at the same time (because they will all try to write to the same TXT record with different payloads).
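To make the DNS challenge conflict concrete, here is a small Python simulation. The TXT value is computed the way the ACME spec (RFC 8555) defines it for dns-01: base64url of the SHA-256 of the key authorization. Everything else is an assumption for illustration: the node names, tokens and thumbprints are made up, and the "zone" is a plain dict in which each write overwrites the previous value, which is how the behaviour described above is modelled.

```python
import base64
import hashlib


def txt_value(token: str, account_thumbprint: str) -> str:
    """dns-01 TXT value: base64url(SHA-256(token + "." + thumbprint)), unpadded."""
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def simulate_overwrite(domain: str, nodes: dict) -> dict:
    """Each node writes its own value to the same record; the last write wins.

    `nodes` maps a (hypothetical) node name to a (token, thumbprint) pair.
    Returns, per node, whether the record still holds that node's value.
    """
    record = f"_acme-challenge.{domain}"
    zone = {}       # stand-in for the DNS zone, one value per record name
    expected = {}
    for name, (token, thumbprint) in nodes.items():
        expected[name] = txt_value(token, thumbprint)
        zone[record] = expected[name]  # overwrites the previous node's payload
    return {name: zone[record] == value for name, value in expected.items()}


if __name__ == "__main__":
    outcome = simulate_overwrite(
        "test.yourdomain.com",
        {
            "node-0": ("token-a", "thumb-0"),
            "node-1": ("token-b", "thumb-1"),
            "node-2": ("token-c", "thumb-2"),
        },
    )
    print(outcome)
```

In this model only the node that wrote last finds its own payload in the record, so the other nodes' validations would fail.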
For now we have decided to have an external script obtain a new certificate from time to time, and then simply update the Traefik application package with this new certificate and push that update to the SF cluster. However, I am very interested in hearing whether there has been any progress on getting a "proper" solution for this.
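As a rough idea of how such an external script might decide when "from time to time" is, here is a Python sketch that checks a certificate's expiry date against a renewal window. It is not the script described above: `needs_renewal` and the 30-day window are assumptions, and obtaining the certificate and pushing the package to the cluster are out of scope. The date string is in the format Python's `ssl` module reports for a certificate's `notAfter` field.

```python
import ssl
import time


def needs_renewal(not_after: str, renew_before_days: int = 30, now=None) -> bool:
    """Return True if the certificate expires within `renew_before_days`.

    `not_after` uses the ssl module's certificate time format,
    e.g. "Jan  5 09:34:43 2018 GMT". The 30-day default is an assumption.
    """
    now = time.time() if now is None else now
    expires_at = ssl.cert_time_to_seconds(not_after)
    return expires_at - now < renew_before_days * 86400
```

A scheduled job could call this once a day and only request a new certificate and redeploy when it returns `True`, which also keeps the number of Let's Encrypt requests low.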
Has there been any progress on this? I have a situation that requires this in order to use Traefik as our reverse proxy. We are keen to be able to use Let's Encrypt, which we already use in a complex way outside of Service Fabric, using ARR and some automation.
Our solution is multi-tenanted, where each tenant can use their own domain, so we need to be able to have certificates generated on the fly, but with many instances of Traefik we'd likely hit the rate limits.
Alternatively, is there an easy way to change the TOML file without a redeploy of Traefik?
Hi, unfortunately there isn't any ongoing work on this one, but we'd love a contribution if it is something you want to pick up and look at.
For updating the TOML while Traefik is running, there is a --file.watch flag, which watches for config provided by TOML files. I don't know if this can configure TLS settings.
--file Enable File backend with default settings (default "false")
--file.constraints Filter services by constraint, matching with Traefik tags. (default "[]")
--file.debugloggeneratedtemplate Enable debug logging of generated configuration template. (default "false")
--file.directory Load configuration from one or more .toml files in a directory
--file.filename Override default configuration template. For advanced users :)
--file.templateversion Template version. (default "0")
--file.trace Display additional provider logs (if available). (default "false")
--file.watch Watch provider (default "true")
You could investigate whether the Azure Files Volume Driver would allow you to mount an Azure Files share and update the TOML dynamically. However, it's still in preview, and I think it would require Traefik to be run inside a container, which hasn't been tested much.
When using the Service Fabric provider, would adding the file backend allow TLS certificates to be added dynamically? I can see that adding it to the TOML shows a new provider, but I am unsure whether that provider would enable TLS certificate updates.
Afraid I can't give any guidance on that as I'm not sure - best way would be to run some tests.
I ended up writing a parent process that runs Traefik as a subprocess. It handles the certificates, and when there is a change it updates the TOML file and restarts Traefik. Not ideal, but it gets the job done.
It was quite a pain to make it support Linux though, as the Traefik binary needs to have the execute permission added since it's no longer the main binary of the service.
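For anyone wanting to try the same approach, a minimal Python sketch of such a parent process could look like the following. It is not the implementation described above: `ensure_executable`, `Supervisor`, the watched file and the child command are all hypothetical, and change detection is reduced to comparing the file's modification time.

```python
import os
import stat
import subprocess


def ensure_executable(path: str) -> None:
    """Add the execute bits, which a secondary binary may lack on Linux."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


class Supervisor:
    """Runs a child process and restarts it when a watched file changes."""

    def __init__(self, command: list, watched_file: str):
        self.command = command            # e.g. path to the traefik binary plus args
        self.watched_file = watched_file  # e.g. the TOML or certificate file
        self.process = None
        self._last_mtime = None

    def start(self) -> None:
        self._last_mtime = os.stat(self.watched_file).st_mtime_ns
        self.process = subprocess.Popen(self.command)

    def stop(self) -> None:
        if self.process is None:
            return
        self.process.terminate()
        try:
            self.process.wait(timeout=10)
        except subprocess.TimeoutExpired:
            # The child ignored the polite request; force it down.
            self.process.kill()
            self.process.wait()
        self.process = None

    def restart_if_changed(self) -> bool:
        """Restart the child if the watched file's mtime moved; report whether it did."""
        if os.stat(self.watched_file).st_mtime_ns == self._last_mtime:
            return False
        self.stop()
        self.start()
        return True
```

The parent would call `ensure_executable` on the Traefik binary once at startup, then call `restart_if_changed()` in a loop with a sleep, after writing any new certificate or TOML content.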