Switch to acme.sh

Question

Switch to acme.sh

buchdag opened this issue 6 years ago · 46 comments

As indicated there, a v2.0 version of letsencrypt-nginx-proxy-companion using acme.sh instead of simp_le is being worked on. I'm opening this issue so we can discuss the potential non backward compatible changes introduced by this ACME client swap and how we should handle them.

For now the branch I'm working on is only available on my fork of the project : https://github.com/buchdag/docker-letsencrypt-nginx-proxy-companion/tree/acme.sh

First, here are the issues I encountered.

General use

The first big difference between the two ACME clients is that while simp_le is stateless, acme.sh is not. It works with a configuration directory and will generate configuration files no matter what. This mean that either a new docker volume will be needed to store those configurations, or that we'll have to put them somewhere into the already existing /etc/nginx/certs.

Certificates

acme.sh generate the certificates in its own certificate directory (defaulting to the config directory) with different, non configurable naming conventions than simp_le, then optionally copy them to the directory and file name of your choice. It uses its own certificate directory to keep track of what certificate have been generated and if they are still valid or not. If we want to migrate existing certs to acme.sh, we'll have to put them in its cert folder with the same naming convention and re-create the "status file" it uses to keep track of each cert status. But the bigger issue here is that we didn't even have a way to say for sure if a cert inside /etc/nginx/certs was handled by the companion until very recently, so making such a migration automatic might not even be possible. And we don't even have a way for the container to know its own version yet.
acme.sh makes no distinction between certificates issued from different ACME endpoints. If you ask for a cert from the production endpoint, then for a cert covering the same base domain from the staging endpoint, the first cert will be overwritten. The same goes with V1 <> V2 endpoints.

ACME accounts

acme.sh and simp_le store the ACME account keys in a different way : simp_le encode the public and the private key in JWK format on a single file and store just those two infos while acme.sh encode the private key in PEM format, the public key in JWK format inside a JSON file and additionally store the account id, contact address, creation IP, creation date and account status inside the JSON.
Incidentally, they don't handle contact email the same way. acme.sh is using a per-config directory email address which has to be configured prior to certificate issuance while simp_le can take a different address at each invocation. BTW @cpu if you read this, I'm unsure about the way this is handled on the ACME side, are the contact address(es) only stored at the account level and written over if modified, or do you store a potentially different set of contact address(es) for each generated certificate ?

Minor issues

the DEBUG environment variable is used by acme.sh too but it expect a numeric value, so we can't use true anymore.
when issuing a certificate, acme.sh prints it in PEM format on std output. It might be a matter of personal taste but I find it to be rather unclean and not fit for a non interactive process.
by default acme.sh keep the same private key when renewing a certificate while simp_le has the inverse behaviour.

I'll detail the way I'm currently handling each of these issues later today or in the few next days.

Answer 1 · 2019-02-13T16:03:35.000Z

BTW @cpu if you read this, I'm unsure about the way this is handled on the ACME side, are the contact address(es) only stored at the account level and written over if modified, or do you store a potentially different set of contact address(es) for each generated certificate ?

It's the former: contact address(es) are properties of the ACME account resource. There's no way specified in the protocol draft to set a contact address for a specific order, authorization, or issuance. Any updates to the contact information would be handled as an account update.

Hope that helps!

Answer 2 · 2019-02-13T16:40:50.000Z

In the current acme.sh based version I've got (which pass all tests and is currently used on one of my servers), I did the following to address each issue:

the image comes preconfigured to use a default configuration directory at /etc/acme.sh/default, with /etc/acme.sh being defined as a volume in the Dockerfile. I personally don't think ACME accounts and acme.sh config should be stored in /etc/nginx/certs or be shared with the internet facing proxy container. My thought is we'll probably be forced to break backward compatibility for some stuff anyway, so we should as well correct design issues like this one along the way.
acme.sh puts it certificate in /etc/acme.sh/default then copy them with the same naming convention as simp_le in the /etc/nginx/certs folder. Proceeding like this require the least amount of modification on the existing service loop and test suite code. I haven't yet worked on migrating existing cert and I highly doubt that the process could be made automatic because of what I said previously.
I haven't covered that yet but I think it might be easily addressed by using multiple cert subdirectories in /etc/acme.sh/default based on the ACME endpoint used.
for now my solution is not caring about migrating ACME account keys and just let acme.sh create its own account keys. Extracting private key in PEM format from a JWK is doable (did a proof of concept a few days ago) but I really don't think it's worth the effort plus not trying to import account keys from another client is kind of the recommended solution when using certbot anyway, so I don't think this will change.
the container use as a DEFAULT_EMAIL environment variable that set the contact email address for the default ACME account. @cpu answer implies that the way the containers have been doing it for a while does not actually work (= using a single account and being able to specify a per container/certificate email address). I don't remember exactly how simp_le handle this internally, but I suspect the email address used is the one supplied when the account creation happens for the first time.
this is going to be adresses by an upcoming separate PR that rework the logging system, and DEBUG=true will automatically be converted to the relevant environment variable and value.
we could pipe acme.sh output to grep or sed to try to strip out the certificate from the std out but that's not a super clean solution. Or we could try to contribute a 'non interactive' output mode to acme.sh.
the environment variable REUSE_PRIVATE_KEYS defaulting to false has been replaced by the RENEW_PRIVATE_KEYS defaulting to true but it's probably better to keep the old environment variable even is the acme.sh command line flag is actually the inverse.

Answer 3 · 2019-02-13T16:42:45.000Z

@cpu yes that was helpful, thanks ! 😃

Answer 4 · 2019-02-13T18:35:26.000Z

@cpu if you have 5~10 minutes to spare, would you happen to have an advice on the 4th point above ?

Answer 5 · 2019-02-14T13:49:42.000Z

@buchdag I agree its probably not worth the trouble of migrating existing account keys. It might be more important one day when acme-caa is enabled in production and users could have accounturi restrictions but today that isn't the case (except in staging). The only other case I see where folks have a hard time transitioning to a new account key is if they have coordinated with Let's Encrypt for a rate limit adjustment. Those are often done by ACME account ID as well. Do you have a sense of whether your users might be in this position?

Answer 6 · 2019-02-14T20:40:41.000Z

I don't remember exactly how simp_le handle this internally, but I suspect the email address used is the one supplied when the account creation happens for the first time.

That is what I wanted to talk about previously.
I closely followed the debug log of simp_le along with the official ACME protocol and noticed that the request that simp_le sends for the email is invalid, and the ACME response is silently ignored.
So i.e. the LETSENCRYPT_EMAIL never actually worked!
(As you said, the very first LETSENCRYPT_EMAIL mail is used for all certificates on the same endpoint)
The global DEFAULT_EMAIL would be the only working way.

Generally I agree with all design decisions, but I will investigate closer at a latter time.

The only other case I see where folks have a hard time transitioning to a new account key is if they have coordinated with Let's Encrypt for a rate limit adjustment. Those are often done by ACME account ID as well. Do you have a sense of whether your users might be in this position?

If I remember my digging correctly, simp_le doesn't store the account ID in any way. So the only way for the user to get the ID with letsencrypt-companion would be to dig through the debug logs and find the ACME response.
So the average user would be excluded from that.

Answer 7 · 2020-02-04T00:14:39.000Z

I'm brand new to this project, but I'd be interested in helping out, if you're still looking for help.
Where did you end up with the migration?

Answer 8 · 2020-05-29T20:46:27.000Z

Hi @buchdag,

I've tried the 'acme.sh' branch of your fork with some succes after rebasing it on v1.12.1 plus these changes :

use acme.sh v2.8.6 (v2.8.0 doesn't work anymore)
install acme.sh twice:
- once with config-home=/etc/acme.sh/staging
- once with config-home=/etc/acme.sh/default
update letsencrypt_service script to use the 'staging' config when LETSCENCRYPT_TEST=true

Any chance to have your branch rebased as well so anyone working on it could submit pull requests?

Thanks,

_g.

Answer 9 · 2020-05-30T12:43:25.000Z

@pini-gh
you can try my fork here:
https://github.com/Neilpang/nginx-proxy

Also it has a publish docker image: neilpang/nginx-proxy

Answer 10 · 2020-05-30T14:55:59.000Z

@Neilpang
It's neat but it doesn't suit my needs. I want to be able to issue certificates for wildcard domains. I'm not there yet, but working on it.

Answer 11 · 2020-06-01T16:40:44.000Z

Here is my fork:
https://github.com/pini-gh/docker-letsencrypt-nginx-proxy-companion/tree/acme.sh

Answer 12 · 2020-06-03T11:07:26.000Z

I now have a working configuration with DNS mode enabled. It needs polishing and corner cases testing, but it does work for basic usage.

Answer 13 · 2020-07-27T09:03:26.000Z

@pini-gh I have testing your fork in dns_ali mode. My nginx-proxy compainon's configuration is:

OPT=/opt/nginx-proxy

docker run -d --name nginx-proxy \
    -p 80:80 \
    -p 443:443 \
    -v /var/run/docker.sock:/tmp/docker.sock \
    -v $OPT/log:/var/log/nginx \
    -v $OPT/dhparam:/etc/nginx/dhparam \
    -v $OPT/certs:/etc/nginx/certs \
    -v $OPT/conf.d:/etc/nginx/conf.d \
    -v $OPT/vhost.d:/etc/nginx/vhost.d \
    -v $OPT/html:/usr/share/nginx/html \
    -e "TZ=Asia/Shanghai" \
    -e "VIRTUAL_HOST=foo.example.com" \
    -e "VIRTUAL_UPSTREAM=192.168.8.10:8080,192.168.8.10:80 down" \
    -e "HTTPS_METHOD=noredirect" \
    -e "LETSENCRYPT_HOST=fo.example.com" \
    -e "LETSENCRYPT_DNS_MODE=dns_ali" \
    -e "LETSENCRYPT_DNS_MODE_SETTINGS=export Ali_Key=xxxxxx; export Ali_Secret=xxxxxxx;" \
    --label com.github.jrcs.letsencrypt_nginx_proxy_companion.nginx_proxy \
    --restart always \
    ccpaging/nginx-proxy:alpine

It is working with my nginx-proxy with VIRTUAL_UPSTREAM support.

And I add Dockerfile.alpine and install_acme_master.sh in my fork:

https://github.com/ccpaging/docker-letsencrypt-nginx-proxy-companion

Can I make a pull request to your fork?

Answer 14 · 2020-07-27T19:49:46.000Z

Can I make a pull request to your fork?

I've had a look at your commit. As I understand it it brings two changes:

A one-stage only Dockerfile installing docker-gen from its binary release instead of compiling it from source
Install acme.sh from the head of its master branch

I'm not fond of the former as it brings nothing new to the end user.

For the latter no need to add yet another script. I would consider a patch using a Docker build arg to override the default tree-ish to checkout.

Answer 15 · 2020-07-29T06:18:31.000Z

I've had a look at your commit. As I understand it it brings two changes:

A one-stage only Dockerfile installing docker-gen from its binary release instead of compiling it from source

Install acme.sh from the head of its master branch

Yes. Accessing github.com may be slower and unstable sometime or somewhere. This companion should have Dockfile.alpine just like nginx-proxy. It is not urgent.

Answer 16 · 2020-10-08T22:23:30.000Z

@pini-gh sorry for the lack of answer, been busy on other stuff for a while.

A rebased acmesh branch will be added to this repo (not my fork) asap.

@ccpaging going back to fetching docker-gen binaries from GitHub isn't planned as we want to retain multi arch capabilities through muti stage builds.

Installing acme.sh from the head of the master branch instead of a release tag isn't planned either, but @pini-gh idea of a Docker build arg is conceivable.

This companion should have Dockfile.alpine just like nginx-proxy. It is not urgent.

The companion is already based on Alpine.

Answer 17 · 2020-10-09T22:24:48.000Z

The rebased acme.sh branch has been pushed to this repo.
The Docker image is available for test : jrcs/letsencrypt-nginx-proxy-companion:acmesh

@pini-gh your design decisions seems pretty sensible to me and I'd like to merge them to the branch but the DNS stuff will have to wait just a little. Would it be ok to you if I committed those changes to the branch with you as the author ?

Answer 18 · 2020-10-10T08:07:55.000Z

@buchdag please do, and let me know when it's done so I could rebase my work on your branch.

Answer 19 · 2020-10-10T13:02:15.000Z

@buchdag I've just setup branch acme.sh-pini on top of your current acme.sh branch. This should ease cherry-picking from there.

Answer 20 · 2020-10-10T14:48:23.000Z

Many thanks !

If possible, could you rework fca180c a bit so that I could directly cherry pick it ?

on your commit README.MD got rolled back to an older, outdated version.
this block is not required on entrypoint.sh and letsencrypt_service as the first action of both is sourcing functions.sh, which starts with the lc() definition then the same block :

if [[ ${DEBUG:-} == true ]]; then
  DEBUG=1 && export DEBUG
fi

accountemail="${!accountemail_varname}" should be accountemail="${!accountemail_varname:-"<no value>"}" to correctly handle environment variable set to an empty value.
rather than ${accountemail:+--accountemail ${accountemail}} i'd prefer adding the argument to the params_d_arr array to stay in line with the existing code.
and last could you use the staging config without email wether staging is going to be used because the LETSENCRYPT_TEST variable or the two (global and per proxied container) ACME_CA_URI variables ? Thats means that $ACME_CA_URI and LETSENCRYPT_${cid}_ACME_CA_URI both have to be checked against a regexp in addition to checking LETSENCRYPT_${cid}_TEST for true.

Let me know if this is possible for you, and if not: do you mind if I co-author the commit to change this ?

Answer 21 · 2020-10-10T16:05:11.000Z

Done. Please tell me in case I missed something.

Answer 22 · 2020-10-10T18:02:40.000Z

Hmmm the certs_standalone test with cherry picked 8b2c6ee is failing : https://travis-ci.com/github/buchdag/docker-letsencrypt-nginx-proxy-companion/builds/189210442

I don't really get why this test specifically and the function that fetch the relevant companion logs in case of CI failures fails for this test too, yay.

Answer 23 · 2020-10-10T20:38:46.000Z

There is a typo in line 175 of app/letsencrypt_service: <no value> should read <no-value>. Sorry about that.

Update: this is the other way around: actually the typo is line 174 where no-value should read no value.

Answer 24 · 2020-10-10T20:54:31.000Z

And another typo line 182: acme_ca_uri="ACME_CA_TEST_URI"should read acme_ca_uri="$ACME_CA_TEST_URI". Sorry again.

Answer 25 · 2020-10-13T11:02:35.000Z

@pini-gh I'm re-committing your changes in smaller increments and adding tests (+ doc), I'll force push the branch to this repo when things will be clean enough.

6673436 : I don't think --auto-upgrade, --nocron and --noprofile are even evaluated when acme.sh is called without the --install flag.

4c9664d : if I understood correctly the http01 challenge was entirely removed by this commit ?

Could you tell me more about eedabe1 ? The .companion file was precisely added to avoid overwriting certificates that aren't handled by the companion container.

Answer 26 · 2020-10-13T18:01:02.000Z

6673436 : I don't think --auto-upgrade, --nocron and --noprofile are even evaluated when acme.sh is called without the --install flag.

Indeed, you're right.

4c9664d : if I understood correctly the http01 challenge was entirely removed by this commit ?

Not sure what you mean. If it is about removing the location_configuration bits, then yes. My opinion is that the LE companion shouldn't need write access to /etc/nginx/vhost.d/. Instead nginx-proxy can easily set up the challenge blocks where HTTPS is wanted. BTW it does handle a couple of cases already. I just had to fix another couple in my nginx-proxy fork.

Could you tell me more about eedabe1 ? The .companion file was precisely added to avoid overwriting certificates that aren't handled by the companion container.

I may have misunderstood the .companion file's role. The correct fix for the problem I wanted to fix then might be a6c8f2d.

Answer 27 · 2020-10-14T09:49:25.000Z

I've pushed and updated acme.sh branch to this repo, the jrcs/letsencrypt-nginx-proxy-companion:acmesh image is available on DockerHub too.

Not sure what you mean. If it is about removing the location_configuration bits, then yes. My opinion is that the LE companion shouldn't need write access to /etc/nginx/vhost.d/. Instead nginx-proxy can easily set up the challenge blocks where HTTPS is wanted. BTW it does handle a couple of cases already. I just had to fix another couple in my nginx-proxy fork.

Ok, I mistakenly thought that due to this commit the container wasn't supporting http01 ACME challenge anymore but only dns01.

I tend to agree with you on the location_configuration stuff, it is messy and prone to unidentified race conditions, but that was made before any ability to handle ACME challenges was built into upstream nginx.tmpl (and is still required if we want to support people that use an nginx-proxy image or nginx.tmpl predating this change.

Could you tell more about what cases are now correctly handled by the upstream nginx-proxy and what cases you fixed on your fork (and how) ?

Answer 28 · 2020-10-14T10:56:13.000Z

Could you tell more about what cases are now correctly handled by the upstream nginx-proxy and what cases you fixed on your fork (and how) ?

Please see https://github.com/pini-gh/nginx-proxy/tree/acme-challenge.

You may be interested in pini-gh/nginx-proxy@8e173b7 as well.

Answer 29 · 2020-10-14T12:35:18.000Z

I've pushed and updated acme.sh branch to this repo, the jrcs/letsencrypt-nginx-proxy-companion:acmesh image is available on DockerHub too.

I've now rebased my acme.sh-pini branch on your acme.sh.

Answer 30 · 2020-10-19T18:48:45.000Z

Could you tell more about what cases are now correctly handled by the upstream nginx-proxy and what cases you fixed on your fork (and how) ?

Please see https://github.com/pini-gh/nginx-proxy/tree/acme-challenge.

You may be interested in pini-gh/nginx-proxy@8e173b7 as well.

To be honest the docker-gen templating language in general is a bit difficult to digest, and generating nginx configuration with it even less so, could you walk me through the logic ? What was causing failed authorisations and how did you solve it ?

Answer 31 · 2020-10-19T19:33:15.000Z

Well, the main point is this change in pini-gh/nginx-proxy@b73ffd8:

-{{ $is_https := (and (ne $https_method "nohttps") (ne $cert "") (exists (printf "/etc/nginx/certs/%s.crt" $cert)) (exists (printf "/etc/nginx/certs/%s.key" $cert))) }}                                           
+{{ $is_https := (ne $https_method "nohttps") }}

The template should use HTTPS mode wherever the user explicitly stated he wants HTTPS.

Then the acme-challenge snippet is generated for the cases where the certificate doesn't exist yet as well.

This is what can be induced from the code and comments that follow:

 {{ if $is_https }}
+{{/* No cert? But we do want HTTPS! Let's fallback on the nginx default one.                                                                                                                                      
+     And set up acme-challenge so that Let's Encrpyt can do its job */}}                                                                                                                                          
+{{ $cert := coalesce $cert "default" }}                                                                                                                                                                           
+{{/* Very same reasoning as above */}}                                                                                                                                                                            
+{{ $cert := when (and (exists (printf "/etc/nginx/certs/%s.crt" $cert)) (exists (printf "/etc/nginx/certs/%s.key" $cert))) $cert "default" }}

This part makes also the server use the default certificate when the dedicated one doesn't exist yet.

The next commit, pini-gh/nginx-proxy@f1baeda handles the case where the user set HTTPS_METHOD=noredirect. This case involves another branch of the template, guarded with this test:

{{ if or (not $is_https) (eq $https_method "noredirect") }}

The other branch which has the acme-challenge block already is guarded with:

{{ if $is_https }}
...
{{ if eq $https_method "redirect" }}

hence doesn't cover the HTTPS_METHOD=noredirect case.

Finally, commit pini-gh/nginx-proxy@537334f just adds a few comments into the generated nginx config file.

Answer 32 · 2020-10-20T07:10:23.000Z

I've now rebased my acme.sh-pini branch on your acme.sh.

Thanks !

3aee0a1 will be worked on a bit later as I intend v2.0.0 to target http-01 challenge only. The removal of the location_configuration stuff won't be merged as-is, as the objective is still compatibility with upstream nginx-proxy.

3183301 and c5b5d62 won't be merged until those features make it to upstream nginx-proxy for the same reason.

51609ac will probably be merged after v2.0.0 is released.

Instead of 1758a52 I was thinking of invoking acme.sh --register-account inside update_cert(), any feedback on that ?

Answer 33 · 2020-10-20T17:37:44.000Z

Instead of 1758a52 I was thinking of invoking acme.sh --register-account inside update_cert(), any feedback on that ?

No strong opinion about that, provided that it does the trick: preventing an override of the configuration with previous settings.

Answer 34 · 2020-10-22T20:52:36.000Z

Okay I think I got the acme.sh branch pretty much where I wanted it for v2.0.0, bar a few updates to the docs and maybe a few more tests and/or updates to existing tests.

Answer 35 · 2020-10-23T13:37:16.000Z

Great! Looking forward to rebase on v2.0.0 :)

Answer 36 · 2020-11-30T14:12:36.000Z

#719 has been merged, simp_le has now been replaced by acme.sh

Answer 37 · 2020-12-17T16:33:51.000Z

I tried digging into code and the related issues, but it was not clear to me:

Is this implementation (v2.0.1) capable of dns01 challenges? Or is it still http01 challenges only? I'd be an awesome feature to have dns challenges implemented, specially for users selfhosting at home, on ISPs that block ports 80 and 443. Using alternative ports such as 8080 or 8443 kills the http01 challenges.

Answer 38 · 2020-12-17T18:12:31.000Z

Is this implementation (v2.0.1) capable of dns01 challenges?

No, it isn't yet. But this is planned, as I undertand it. Meanwhile you may want to try my fork which has dns01 challenge implemented.

Answer 39 · 2020-12-18T13:01:45.000Z

Meanwhile you may want to try my fork

It does work indeed! Minor intervention on my ansible roles. Thank you for the pointer, and for the great work!

Answer 40 · 2020-12-18T16:49:07.000Z

Thank you for the feedback!

Answer 41 · 2021-03-25T14:40:09.000Z

I still get too many certificates error even after I added --volume /etc/acme.sh string to the jrcs/letsencrypt-nginx-proxy-companion container.
How can I fix this?

I use exactly the same command as in the README:

docker run --detach \
    --name nginx-proxy-letsencrypt \
    --volumes-from nginx-proxy \
    --volume /var/run/docker.sock:/var/run/docker.sock:ro \
    --volume /etc/acme.sh \
    --env "DEFAULT_EMAIL=mail@yourdomain.tld" \
    jrcs/letsencrypt-nginx-proxy-companion

I also tried mount /etc/acme.sh to the host, it has no effect.

I also tried to use simp_le version v1.13.1 - this version produces the same errors too.

Answer 42 · 2021-03-25T17:05:43.000Z

Once the limit is reached you have to wait a few days. In the meantime use LETSENCRYPT_TEST=true and check that the resulting certificates are persistent.

Answer 43 · 2021-03-25T20:17:29.000Z

Once the limit is reached you have to wait a few days. In the meantime use LETSENCRYPT_TEST=true and check that the resulting certificates are persistent.

Yes, I already waited a week and this problem does not disappear. I see in the logs that all certificates reissue each time when I reload the docker project. I also see different certificates in the browser. Finally, I just see those too many certificates errors.

Answer 44 · 2021-03-25T21:04:34.000Z

As I said, set LETSENCRYPT_TEST=true until you fix the problem.

Answer 45 · 2021-03-25T21:21:00.000Z

@CaliforniaMountainSnake --volume /etc/acme.sh create an anonymous volume that will persist data over containers stop/start but which will be re-created if the container is deleted.

Use a named volume instead:

docker volume create acme
docker run --detach \
    --name nginx-proxy-letsencrypt \
    --volumes-from nginx-proxy \
    --volume /var/run/docker.sock:/var/run/docker.sock:ro \
    --volume acme:/etc/acme.sh \
    --env "DEFAULT_EMAIL=mail@yourdomain.tld" \
    jrcs/letsencrypt-nginx-proxy-companion

Answer 46 · 2021-03-27T15:17:34.000Z

@buchdag, thank you, with named volumes the certificates are persisted!