certbot/certbot

live/example.com is not updated atomically

RalphCorderoy opened this issue · 1 comments

The files under /live/example.com are not updated atomically. During a renewal, programs can read a mix of incompatible old and new files via the symlinks, or find that a symlink is missing. The programs using the files shouldn't need to work around the lack of atomicity.

One way to remove that burden is to treat /live as an area used only by Certbot, and have a deploy hook that runs on renewal and atomically updates copies outside /etc/letsencrypt, which the programs then use. For example, the copies could sit behind a symlink to a directory: the program open(2)s the directory once and then uses openat(2) or similar to read the files in it, rather than having the path followed afresh for each file.
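To make the idea concrete, here is a minimal sketch of both sides, assuming Linux or another POSIX system. None of it is Certbot code: `publish`, `read_pair`, the `current` link name and the `FILES` tuple are hypothetical names chosen for illustration. Ownership, permissions for the consuming program, and cleanup of old directories are left out.

```python
import os
import shutil
import tempfile

# Hypothetical: the files a consuming program needs as a consistent set.
FILES = ("privkey.pem", "fullchain.pem")


def publish(src_dir, deploy_root, link_name="current"):
    """Copy FILES from src_dir into a fresh directory, then swap a symlink to it."""
    os.makedirs(deploy_root, exist_ok=True)
    # mkdtemp creates the directory with mode 0700; a real hook would need to
    # adjust ownership and permissions for whoever reads the files.
    new_dir = tempfile.mkdtemp(prefix="certs-", dir=deploy_root)
    for name in FILES:
        # copyfile follows symlinks in src_dir and copies the file contents.
        shutil.copyfile(os.path.join(src_dir, name), os.path.join(new_dir, name))

    # Create the new symlink under a temporary name, then rename(2) it over
    # the old one. The rename is atomic, so a reader resolves the link to
    # either the old directory or the new one, never to nothing.
    link = os.path.join(deploy_root, link_name)
    tmp_link = link + ".tmp"
    if os.path.lexists(tmp_link):
        os.unlink(tmp_link)
    os.symlink(os.path.basename(new_dir), tmp_link)
    os.replace(tmp_link, link)
    return new_dir


def read_pair(deploy_root, link_name="current"):
    """Resolve the symlink once, then read every file relative to that directory."""
    dir_fd = os.open(os.path.join(deploy_root, link_name),
                     os.O_RDONLY | os.O_DIRECTORY)
    try:
        contents = {}
        for name in FILES:
            # With dir_fd set, os.open uses openat(2): the lookup is relative
            # to the directory already held open, not to the symlink path.
            fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
            with os.fdopen(fd, "rb") as f:
                contents[name] = f.read()
        return contents
    finally:
        os.close(dir_fd)
```

In a deploy hook, `src_dir` would presumably be the lineage directory that Certbot passes in the `RENEWED_LINEAGE` environment variable. Because the reader holds the directory open, a renewal that lands between its two reads still leaves it with a matching key and chain from the old directory, provided old directories are not deleted while readers may still be using them.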

On a heavily loaded system the window of inconsistency during a renewal can be quite long, and the Python code isn't structured to minimise it. If programs are started per socket connection to handle a single transaction, then with many domains and many sockets the files are read frequently, which raises the chance of hitting the race-condition window.

Ideally, Certbot would make an atomic update available. Assuming that won't happen, the issue should at least be documented so that users investigating odd errors can save time.

I see Postfix is aware of the general problem:

> You can also store the keys separately from their certificates, again provided each is listed before the corresponding certificate chain. Storing a key and its associated certificate chain in separate files is not recommended, because this is prone to race conditions during key rollover, as there is no way to update multiple files atomically.
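The single-file approach that the Postfix text favours can be sketched as follows: write the key and its chain into one temporary file, then rename it over the destination. This is only an illustration under POSIX assumptions; `write_combined` and its parameters are hypothetical names, not part of Certbot or Postfix.

```python
import os
import tempfile


def write_combined(key_pem, chain_pem, dest):
    """Atomically replace dest with the key followed by its certificate chain."""
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # The temporary file is created in the destination directory so that the
    # final rename(2) stays within one filesystem. mkstemp uses mode 0600.
    fd, tmp_path = tempfile.mkstemp(prefix=".combined-", dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(key_pem)
            if not key_pem.endswith(b"\n"):
                f.write(b"\n")
            f.write(chain_pem)
            f.flush()
            os.fsync(f.fileno())
        # Readers opening dest see the whole old file or the whole new one.
        os.replace(tmp_path, dest)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

A program that opens the combined file gets a key and chain that belong together, since both arrive in the same rename.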