SSLMate/certspotter

429 Too Many Requests from TrustAsia and DigiCert logs

imrejonk opened this issue · 24 comments

Since Monday around 16:00 CET, my Cert Spotter installation has been outputting this on every run:

2021/11/08 16:00:04 https://ct.trustasia.com/log2022/: Error retrieving STH from log: GET https://ct.trustasia.com/log2022/ct/v1/get-sth: 429 Too Many Requests (<html>
<head><title>429 Too Many Requests</title></head>
<body>
<center><h1>429 Too Many Requests</h1></center>
<hr><center>openresty</center>
</body>
</html>
)
2021/11/08 16:00:04 https://ct.trustasia.com/log2023/: Error retrieving STH from log: GET https://ct.trustasia.com/log2023/ct/v1/get-sth: 429 Too Many Requests (<html>
<head><title>429 Too Many Requests</title></head>
<body>
<center><h1>429 Too Many Requests</h1></center>
<hr><center>openresty</center>
</body>
</html>
)

I know that Cert Spotter downloads a list of CT logs to check from certspotter.org before every run, so I suspect that something in that list was recently changed. Is this expected behaviour: is TrustAsia's WAF a bit trigger-happy, or is Cert Spotter incorrectly sending lots of meaningless requests to the empty 2022/2023 TrustAsia logs?

AGWA commented

There haven't been any changes to loglist.certspotter.org since September 14. certspotter sends one get-sth request to each log per invocation. I haven't observed any problems with the TrustAsia logs, so it seems they are blocking you for some reason. I recommend getting in touch with the TrustAsia log operators (trustasia-ct-logs@trustasia.com). If you don't get a satisfactory response from them, report back and we can raise this to browsers as a potential policy violation.

Thanks for the swift reply, I've contacted the TrustAsia log operators and will update this issue once I get a response.

Hello @imrejonk, I'm one of the TrustAsia log operators; we have received your email.

Our team will discuss your issue.

We have announced a rate limit of 8 QPS; please see the issue https://bugs.chromium.org/p/chromium/issues/detail?id=1162248 .

Our server monitoring raised alarms at 2021-11-11 12:50:00 +0800 and 2021-11-12 12:18:00 +0800; those may have been your requests.

We will follow up with you later.

Hi @mingjunyang, thanks for the reply. I understand that the TrustAsia CT logs are rate limiting requests to eight queries per second (8QPS) per IP address. My Cert Spotter instance must have exceeded that limit.

Am I correct in assuming that the 8QPS limit applies to each one of TrustAsia's logs individually? Can I perform 8QPS to log2021, log2022 and log2023 at the same time (24QPS in total) without being rate limited?

I think that the root problem here is either:

  1. Cert Spotter does not limit itself to eight queries per second per log, or
  2. eight queries per second is not enough and TrustAsia should raise it.

I'm curious what @AGWA's opinion is on this.

I should note that I haven't seen the 429 Too Many Requests error anymore since Thu, 11 Nov 2021 05:00:03 +0100.

AGWA commented

Assuming 8QPS is per log, 8QPS is enough to keep up with the current growth rate of logs. Note that Cert Spotter doesn't explicitly limit the rate of queries sent to logs. Ideally it would, but since it doesn't send queries to the same log in parallel, queries are implicitly rate limited and log rate limits haven't been a problem in practice.

queries are implicitly rate limited

Do you mean that the performance of the machine Cert Spotter is running on, together with the network connection, currently limits the number of requests?

log rate limits haven't been a problem in practice

Well, they were for me, at least for a short while. I haven't encountered TrustAsia's rate limit since last Thursday, so I guess the practical implications are currently low, if not zero. I'll close this issue for now and open a feature request for Cert Spotter if this becomes a real problem in the future.

Thanks @mingjunyang and @AGWA for cooperating on this issue with me.

AGWA commented

Do you mean that the performance of the machine Cert Spotter is running on, together with the network connection, currently limits the number of requests?

Correct.

If you start seeing sustained rate limit errors again, please do open another issue. I have a feeling, though, that this issue was caused by something on TrustAsia's end, so there's nothing Cert Spotter could have done to prevent it.

It's now six months later, and I've encountered a few (<10) TrustAsia rate limit errors in the meantime.
Since yesterday, however, I've been seeing many 429 Too Many Requests messages from https://yeti2022-2.ct.digicert.com/log/ (times are in CEST):

2022/05/01 07:17:18 https://yeti2022-2.ct.digicert.com/log/: Problem fetching entries 479921031 to 479921262 from log: GET https://yeti2022-2.ct.digicert.com/log/ct/v1/get-entries?start=479921031&end=479921262: 429 Too Many Requests (<html>
<head><title>429 Too Many Requests</title></head>
<body>
<center><h1>429 Too Many Requests</h1></center>
<hr><center>nginx</center>
</body>
</html>
)
2022/05/01 07:17:18 https://yeti2022-2.ct.digicert.com/log/: Error scanning log (if this error persists, it should be construed as misbehavior by the log): GET https://yeti2022-2.ct.digicert.com/log/ct/v1/get-entries?start=479921031&end=479921262: 429 Too Many Requests (<html>
<head><title>429 Too Many Requests</title></head>
<body>
<center><h1>429 Too Many Requests</h1></center>
<hr><center>nginx</center>
</body>
</html>
)

We run Cert Spotter once every hour and this message was logged almost every hour.

I'd like to turn this bug report into a feature request: it would be great if the number of requests Cert Spotter makes per second could be limited, so that excessive requests are prevented.
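
For illustration, a per-log request limit could be implemented with Go's golang.org/x/time/rate package, using the 8 QPS figure TrustAsia stated earlier. This is a minimal sketch with made-up type and function names, not certspotter's actual code:

package main

import (
	"context"
	"log"
	"net/http"

	"golang.org/x/time/rate"
)

// throttledLogClient wraps an HTTP client with a per-log rate limiter.
type throttledLogClient struct {
	client  *http.Client
	limiter *rate.Limiter
}

// get blocks until the limiter permits another request, then performs it.
func (c *throttledLogClient) get(ctx context.Context, url string) (*http.Response, error) {
	if err := c.limiter.Wait(ctx); err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	return c.client.Do(req)
}

func main() {
	c := &throttledLogClient{
		client:  http.DefaultClient,
		limiter: rate.NewLimiter(8, 8), // 8 requests per second, burst of 8
	}
	resp, err := c.get(context.Background(), "https://ct.trustasia.com/log2023/ct/v1/get-sth")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println(resp.Status)
}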

DigiCert's nessie2022 log also applies rate limiting:

2022/05/05 11:18:29 https://nessie2022.ct.digicert.com/log/: Problem fetching entries 140658797 to 140659796 from log: GET https://nessie2022.ct.digicert.com/log/ct/v1/get-entries?start=140658797&end=140659796: 429 Too Many Requests (<html>
<head><title>429 Too Many Requests</title></head>
<body>
<center><h1>429 Too Many Requests</h1></center>
<hr><center>nginx</center>
</body>
</html>
)

Experiencing the same.
Certspotter also seems to hang at this part.

@certrik yeah, since these rate limit issues started appearing I often get the "Another instance of /usr/local/bin/certspotter is already running" error. I think I'm going to do what @wmfgerrit did: assemble a local CT log list file that excludes the rate-limiting logs. Not the ideal solution, but I think it's better than what I have now.

@AGWA I would consider this a bug.

AGWA commented

A fix is forthcoming.

AGWA commented

@certrik @imrejonk Thanks for your patience as I found time to investigate this issue.

certspotter makes two types of requests to logs:

  • get-sth
  • get-entries

If a get-entries request fails, certspotter retries the request up to 10 times with exponential backoff. certspotter will wait a full 8.5 minutes before the final retry, so as long as a log allows one request every 8.5 minutes, certspotter is resilient to get-entries rate limiting.
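
As a rough illustration of that behaviour (an assumption-laden sketch, not certspotter's actual implementation), a retry helper like the one below, with an assumed one-second initial delay, would sleep 2^9 = 512 seconds, i.e. roughly the 8.5 minutes mentioned above, before the tenth and final retry:

package main

import (
	"errors"
	"fmt"
	"time"
)

// withRetries runs attempt, retrying up to maxRetries times on failure and
// doubling the sleep between attempts each time (exponential backoff).
func withRetries(maxRetries int, initialDelay time.Duration, attempt func() error) error {
	delay := initialDelay
	err := attempt()
	for retry := 0; err != nil && retry < maxRetries; retry++ {
		time.Sleep(delay) // back off before the next attempt
		delay *= 2        // wait before retry k is 2^(k-1) * initialDelay
		err = attempt()
	}
	return err
}

func main() {
	calls := 0
	// Demo with a short initial delay; with a 1-second initial delay the
	// wait before the 10th retry would be 512 seconds (~8.5 minutes).
	err := withRetries(10, 10*time.Millisecond, func() error {
		calls++
		return errors.New("simulated 429 Too Many Requests")
	})
	fmt.Printf("gave up after %d attempts: %v\n", calls, err)
}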

get-sth does not currently have any retry logic. This is an oversight and will be fixed. Once fixed, there should be no more problems monitoring TrustAsia's log.

However, the error messages in #49 (comment) and #49 (comment) show rate limit errors from get-entries, not get-sth. I find this very surprising, because certspotter would have waited 8.5 minutes before sending those get-entries requests. If the log is really allowing fewer than 1 request every 8.5 minutes, then the next step would be to complain on the ct-policy@ mailing list because that is an absurd rate limit.

However, I'm hesitant to do so because I have never observed this problem myself. I've been hitting the DigiCert logs continuously for the past 24 hours trying to reproduce this error and have not been able to.

So I'm going to need as much information as possible about your certspotter setups. How are you invoking certspotter? Are you running parallel instances of certspotter, either intentionally or inadvertently?

@AGWA no no no, thank you for taking the time to look into this issue! It is much appreciated :)

I also think it is a good idea to investigate this further before sending a complaint to the ct-policy list. I hope this description of my Cert Spotter setup is useful:

Cert Spotter version 0.11, compiled with Go 1.15, is the only application that I'm running on a Debian 11 VM at DigitalOcean. The VM has access to one shared 2.20 GHz CPU, 1 GB of RAM, 25 GB of SSD-backed storage and a gigabit network interface. I'm quite certain that these specs are sufficient:

  • I have alerts set up for out-of-memory situations but so far none have occurred.
  • There is no or negligible CPU steal occurring while Cert Spotter runs.
  • Cert Spotter consumes max 22% of the CPU's time while it is running, but often far less.
  • Cert Spotter consumes about 12% of RAM.
  • Download bandwidth doesn't exceed 50 Mbit/s and is often a lot less.
  • Upload bandwidth doesn't exceed 400 kbit/s.
  • Disk usage is currently 23%.
  • The CPU does not seem to have any trouble accessing the SSD storage while Cert Spotter is running, as evidenced by top's 0,0 wa output.

Cert Spotter gets started from this cron job:

SHELL="/bin/bash"
#Ansible: Autogenerated name for certspotter
@hourly certspotter /usr/local/bin/certspotter -logs /etc/certspotter/monitor.json -watchlist /etc/certspotter/watchlist | mail -E -s "[CipherSOC] New certificates found in CT logs" root@ciphermail.com

monitor.json basically contains https://loglist.certspotter.org/monitor.json, minus the DigiCert Yeti 2022 #2 log that caused so much rate limiting error output. I have just reverted this to help with debugging; I am now using your monitor.json again (i.e. I omit the -logs parameter).

My watchlist contains:

.ciphermail.com
.ciphermail-demo.com
.djigzo.com

As you can see, I run Cert Spotter from its own 'certspotter' user account. The home directory of this account is /var/local/certspotter, which makes Cert Spotter store its persistent data in /var/local/certspotter/.certspotter. The lockfile placed in that directory prevents multiple instances from running at the same time. I don't run parallel instances. If for some reason Cert Spotter takes more than an hour to complete, the next cron job fails with a message like this one: Another instance of /usr/local/bin/certspotter is already running (as process ID 176881); remove the file /var/local/certspotter/.certspotter/lock if this is not the case
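
To make that mechanism concrete, here is a rough sketch of the lockfile pattern; the path and message wording are taken from the error quoted above, but the code itself is purely illustrative and not certspotter's:

package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
)

// acquireLock creates the lock file exclusively and writes our PID into it.
// If the file already exists, another instance is presumed to be running.
func acquireLock(stateDir string) (string, error) {
	lockPath := filepath.Join(stateDir, "lock")
	f, err := os.OpenFile(lockPath, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o644)
	if os.IsExist(err) {
		pid, _ := os.ReadFile(lockPath)
		return "", fmt.Errorf("another instance is already running (as process ID %s); remove the file %s if this is not the case", pid, lockPath)
	} else if err != nil {
		return "", err
	}
	defer f.Close()
	fmt.Fprintf(f, "%d", os.Getpid())
	return lockPath, nil
}

func main() {
	lockPath, err := acquireLock("/var/local/certspotter/.certspotter")
	if err != nil {
		log.Fatal(err)
	}
	defer os.Remove(lockPath) // release the lock on normal exit
	// ... the actual monitoring run would happen here ...
}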

According to /var/log/cron.log, the jobs start on the hour (I should probably change that to a random minute), and run for about half an hour:

Jun  1 00:00:01 certspotter CRON[178965]: pam_unix(cron:session): session opened for user certspotter(uid=11) by (uid=0)
Jun  1 00:26:12 certspotter CRON[178965]: pam_unix(cron:session): session closed for user certspotter
Jun  1 01:00:01 certspotter CRON[179586]: pam_unix(cron:session): session opened for user certspotter(uid=11) by (uid=0)
Jun  1 01:35:08 certspotter CRON[179586]: pam_unix(cron:session): session closed for user certspotter
Jun  1 02:00:01 certspotter CRON[179768]: pam_unix(cron:session): session opened for user certspotter(uid=11) by (uid=0)
Jun  1 02:27:10 certspotter CRON[179768]: pam_unix(cron:session): session closed for user certspotter
Jun  1 03:00:01 certspotter CRON[180107]: pam_unix(cron:session): session opened for user certspotter(uid=11) by (uid=0)
Jun  1 03:34:17 certspotter CRON[180107]: pam_unix(cron:session): session closed for user certspotter
Jun  1 04:00:01 certspotter CRON[180292]: pam_unix(cron:session): session opened for user certspotter(uid=11) by (uid=0)
Jun  1 04:32:07 certspotter CRON[180292]: pam_unix(cron:session): session closed for user certspotter

If there's anything else I can do to help you debug this, don't hesitate to ask.

AGWA commented

You're very welcome @imrejonk and thanks for the extremely detailed description of your deployment. It sounds very robust and in particular it seems unlikely that certspotter could have been running in parallel accidentally. Have you seen any rate limit errors from DigiCert in the last few days?

I'll be pushing changes in the next few days to add retry logic to get-sth requests, but ultimately certspotter needs to be reworked to be a daemon rather than a cron job. The fact is that sometimes CT logs have transient problems, and a CT monitor needs to continuously retry and only bother the user with an error message if the problem persists for longer than the log's MMD (24 hours for all current logs). As a cron job it can't do that because it would block the monitoring of all other logs. A daemon would allow much greater flexibility (in addition to other benefits). I've opened #63 to track this.
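
Purely as an illustration of that daemon idea (the names, polling interval, and structure below are assumptions, not the design tracked in #63), each log could get its own goroutine that keeps polling and only reports an error once the failure has persisted beyond the log's MMD:

package main

import (
	"log"
	"sync"
	"time"
)

const mmd = 24 * time.Hour // maximum merge delay of all current logs

// monitorLog polls one log forever, alerting only when an error persists
// beyond the log's MMD rather than on every transient failure.
func monitorLog(url string, poll func(string) error) {
	lastSuccess := time.Now()
	for {
		if err := poll(url); err != nil {
			if time.Since(lastSuccess) > mmd {
				log.Printf("%s: error has persisted beyond the MMD: %v", url, err)
			}
		} else {
			lastSuccess = time.Now()
		}
		time.Sleep(time.Hour) // hypothetical polling interval
	}
}

func main() {
	logs := []string{
		"https://yeti2022-2.ct.digicert.com/log/",
		"https://ct.trustasia.com/log2022/",
	}
	var wg sync.WaitGroup
	for _, url := range logs {
		wg.Add(1)
		go func(url string) {
			defer wg.Done()
			monitorLog(url, func(string) error { return nil }) // stub poller
		}(url)
	}
	wg.Wait()
}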

Have you seen any rate limit errors from DigiCert in the last few days?

Well, I expected there to be at least some 429s in Cert Spotter's output after adding DigiCert Yeti 2022 #2 back to monitor.json last Thursday, but there weren't any. Instead, these errors occurred almost every hour, indicating that the Cert Spotter process wasn't keeping up:

/usr/local/bin/certspotter: Another instance of /usr/local/bin/certspotter is already running (as process ID 188090); remove the file /var/local/certspotter/.certspotter/lock if this is not the case
/usr/local/bin/certspotter: Another instance of /usr/local/bin/certspotter is already running (as process ID 215159); remove the file /var/local/certspotter/.certspotter/lock if this is not the case
/usr/local/bin/certspotter: Another instance of /usr/local/bin/certspotter is already running (as process ID 225997); remove the file /var/local/certspotter/.certspotter/lock if this is not the case

The errors from process 188090 occurred from Thu, 02 Jun 2022 18:00:01 +0200 to Sun, 05 Jun 2022 03:00:02 +0200. For process 215159, Sun, 05 Jun 2022 05:00:02 +0200 until Tue, 07 Jun 2022 04:00:01 +0200. And for process 225997, Tue, 07 Jun 2022 06:00:02 +0200 until I killed it at Tue, 07 Jun 2022 09:33:28 +0200.

There was no output except for the locking errors above, but when I killed process 225997, I got some output for new certificates that were issued over the weekend:

7b1addc773716f2929f2f641dcdab505a15e229e3ba4cea01d9f5147a1afd423:
	     DNS Name = ftp.ciphermail.com
	       Pubkey = c3f89613fe6ce2801a878ce35141e40b4c059115f09e2413d97143cdba503731
	       Issuer = C=US, O=Let's Encrypt, CN=R3
	   Not Before = 2022-06-05 16:49:37 +0000 UTC
	    Not After = 2022-09-03 16:49:36 +0000 UTC
	    Log Entry = 348805469 @ https://oak.ct.letsencrypt.org/2022/ (Pre-certificate)
	       crt.sh = https://crt.sh/?sha256=7b1addc773716f2929f2f641dcdab505a15e229e3ba4cea01d9f5147a1afd423
	     Filename = /var/local/certspotter/.certspotter/certs/7b/7b1addc773716f2929f2f641dcdab505a15e229e3ba4cea01d9f5147a1afd423.precert.pem

e3cf9e97873a125a849b4598566d90118de465f586578967e5d18692da9e91c0:
	     DNS Name = support2.ciphermail.com
	     DNS Name = support.ciphermail.com
	       Pubkey = 36f0126cdcc4e2aff924d0d2d6cac7581178db9c8139dab15b26c24dc3d9810b
	       Issuer = C=US, O=Let's Encrypt, CN=R3
	   Not Before = 2022-06-05 21:17:50 +0000 UTC
	    Not After = 2022-09-03 21:17:49 +0000 UTC
	    Log Entry = 349003014 @ https://oak.ct.letsencrypt.org/2022/ (Pre-certificate)
	       crt.sh = https://crt.sh/?sha256=e3cf9e97873a125a849b4598566d90118de465f586578967e5d18692da9e91c0
	     Filename = /var/local/certspotter/.certspotter/certs/e3/e3cf9e97873a125a849b4598566d90118de465f586578967e5d18692da9e91c0.precert.pem

a1fca05831d3a004b95fdffafe1a9efc35e61079fbc3987e9d0865383d27e90b:
	     DNS Name = lists2.ciphermail.com
	     DNS Name = lists.ciphermail.com
	     DNS Name = lists2-rspamd.ciphermail.com
	     DNS Name = rspamd.lists.ciphermail.com
	       Pubkey = 5402fd366dfe9992a9bcff48115d4b705ecd66f7f8dba313fb2f8ce911b4faf9
	       Issuer = C=US, O=Let's Encrypt, CN=R3
	   Not Before = 2022-06-06 00:14:40 +0000 UTC
	    Not After = 2022-09-04 00:14:39 +0000 UTC
	    Log Entry = 571029709 @ https://ct.cloudflare.com/logs/nimbus2022/ (Pre-certificate)
	       crt.sh = https://crt.sh/?sha256=a1fca05831d3a004b95fdffafe1a9efc35e61079fbc3987e9d0865383d27e90b
	     Filename = /var/local/certspotter/.certspotter/certs/a1/a1fca05831d3a004b95fdffafe1a9efc35e61079fbc3987e9d0865383d27e90b.precert.pem

543e7507e3e4e929f7b1ac7e7b221ae3002201ec86c6fbf38ff4606d8f326369:
	     DNS Name = lists2.ciphermail.com
	       Pubkey = bd25a0406365017814e71b66e26cfcf2f8f4757fd331784668aa797cf45fe96e
	       Issuer = C=US, O=Let's Encrypt, CN=R3
	   Not Before = 2022-06-06 00:15:58 +0000 UTC
	    Not After = 2022-09-04 00:15:57 +0000 UTC
	    Log Entry = 571031507 @ https://ct.cloudflare.com/logs/nimbus2022/ (Pre-certificate)
	       crt.sh = https://crt.sh/?sha256=543e7507e3e4e929f7b1ac7e7b221ae3002201ec86c6fbf38ff4606d8f326369
	     Filename = /var/local/certspotter/.certspotter/certs/54/543e7507e3e4e929f7b1ac7e7b221ae3002201ec86c6fbf38ff4606d8f326369.precert.pem

3388a0eb99b58c9b142f6f781b6832eec650b00d7e3fd42168dd4e1ea357d0d0:
	     DNS Name = ftp.ciphermail.com
	       Pubkey = c3f89613fe6ce2801a878ce35141e40b4c059115f09e2413d97143cdba503731
	       Issuer = C=US, O=Let's Encrypt, CN=R3
	   Not Before = 2022-06-05 16:49:37 +0000 UTC
	    Not After = 2022-09-03 16:49:36 +0000 UTC
	    Log Entry = 1309685022 @ https://ct.googleapis.com/logs/xenon2022/ (Certificate)
	       crt.sh = https://crt.sh/?sha256=3388a0eb99b58c9b142f6f781b6832eec650b00d7e3fd42168dd4e1ea357d0d0
	     Filename = /var/local/certspotter/.certspotter/certs/33/3388a0eb99b58c9b142f6f781b6832eec650b00d7e3fd42168dd4e1ea357d0d0.cert.pem

I would have expected those certificate entries to be output by process 215159 rather than 225997, but maybe that's normal, as there is some delay between certificate issuance and log incorporation. Also notice that the output contains four precertificates but only one "real" certificate. It looks like I killed Cert Spotter before it had time to process the other certificates, assuming those certificates were incorporated into the CT logs together with the precertificates.

I removed the DigiCert Yeti 2022 #2 log again this morning; my Cert Spotter installation is just unusable when that log is present in monitor.json.

AGWA commented

Since you haven't monitored Yeti 2022 #2 for so long, certspotter has fallen behind and is trying to catch up by downloading all entries that were added to the log since you stopped monitoring, which will take a long time. Unfortunately, this currently blocks the monitoring of other logs, which would be fixed if certspotter were a daemon (point number 2 in #63).

If you want to skip ahead to the end of this log, then you can do the following:

  1. Remove the /var/local/certspotter/.certspotter/logs/BZwB0yDgB4QTlYBJjRF8kDJmr69yULWvO0akPhGEDUo directory
  2. Add the -start_at_end option to the certspotter command line

It's promising that you did not observe any 429 errors.

AGWA commented

I've released version 0.12 which adds retries to get-sth requests which will fix the original issue (rate limit errors with TrustAsia logs).

I'm assuming the DigiCert get-entries rate limit errors were transient. If they start happening again despite certspotter's retry logic, we can complain on ct-policy@.

certspotter will become more resilient to transient log errors once #63 is completed.

Since you haven't monitored Yeti 2022 #2 for so long, certspotter has fallen behind and is trying to catch up by downloading all entries that were added to the log since you stopped monitoring, which will take a long time. Unfortunately, this currently blocks the monitoring of other logs, which would be fixed if certspotter were a daemon (point number 2 in #63).

That makes sense, thanks for pointing this out.

If you want to skip ahead to the end of this log, then you can do the following:

1. Remove the `/var/local/certspotter/.certspotter/logs/BZwB0yDgB4QTlYBJjRF8kDJmr69yULWvO0akPhGEDUo` directory

2. Add the `-start_at_end` option to the `certspotter` command line

Done! I also updated to Cert Spotter 0.12. I am using your monitor.json file again and I haven't had any errors since.

Cert Spotter 0.12 didn't compile with Go 1.17 (even though the README suggests compatibility with Go 1.5 and up), so I had to install the golang-1.18 package from Debian's bullseye-backports and compile Cert Spotter like so:

cd cmd/certspotter
/usr/lib/go-1.18/bin/go get
/usr/lib/go-1.18/bin/go build

AGWA commented

Thanks for pointing out the issue with the Go version. I've reduced the minimum Go version to 1.17 and updated the README.

You're right, it compiles with Go 1.17 just fine. Go 1.16 also works, by the way. I tried compiling with 1.15 earlier; that didn't work:

go: downloading golang.org/x/net v0.0.0-20220425223048-2871e0cb64e4
go: downloading golang.org/x/text v0.3.7
# software.sslmate.com/src/certspotter/ct/client
../../ct/client/logclient.go:193:24: undefined: io.ReadAll
note: module requires Go 1.18

So although the error message claims that some module requires Go 1.18, compiling with 1.16 or 1.17 works as well.

DigiCert's yeti2022-2 log started throwing 429 Too Many Requests errors again yesterday at 2022-07-20 23:12:07+02:00:

2022/07/20 23:12:07 https://yeti2022-2.ct.digicert.com/log/: Error scanning log (if this error persists, it should be construed as misbehavior by the log): GET https://yeti2022-2.ct.digicert.com/log/ct/v1/get-entries?start=682846695&end=682846926: 429 Too Many Requests (<html>
<head><title>429 Too Many Requests</title></head>
<body>
<center><h1>429 Too Many Requests</h1></center>
<hr><center>nginx</center>
</body>
</html>
)

My Cert Spotter (version 0.12) runs take around 30 minutes to complete. I run them once every hour, on the hour. I've hit the rate limit at 23:12, 01:22, 03:28 and 11:18 (UTC+2) so far.

@AGWA do you think I should report this to the ct-policy mailing list?

AGWA commented

@imrejonk I also observed 429 errors from Yeti 2022-2 at around the same timeframe. I'm drafting an email to ct-policy that will reference this as well as some other observations about this log.