429 Too Many Requests from TrustAsia and DigiCert logs
imrejonk opened this issue · 24 comments
Since Monday around 16:00 CET, my Cert Spotter installation has been outputting this on every run:
2021/11/08 16:00:04 https://ct.trustasia.com/log2022/: Error retrieving STH from log: GET https://ct.trustasia.com/log2022/ct/v1/get-sth: 429 Too Many Requests (<html>
<head><title>429 Too Many Requests</title></head>
<body>
<center><h1>429 Too Many Requests</h1></center>
<hr><center>openresty</center>
</body>
</html>
)
2021/11/08 16:00:04 https://ct.trustasia.com/log2023/: Error retrieving STH from log: GET https://ct.trustasia.com/log2023/ct/v1/get-sth: 429 Too Many Requests (<html>
<head><title>429 Too Many Requests</title></head>
<body>
<center><h1>429 Too Many Requests</h1></center>
<hr><center>openresty</center>
</body>
</html>
)
I know that Cert Spotter downloads a list of CT logs to check from certspotter.org before every run, so I guess that something in that list was recently changed. Is this expected behaviour and is TrustAsia's WAF a bit trigger-happy, or is Cert Spotter incorrectly sending lots of meaningless requests for the empty 2022/2023 TrustAsia logs?
There haven't been any changes to loglist.certspotter.org since September 14. certspotter sends one get-sth request to each log per invocation. I haven't observed any problems with the TrustAsia logs, so it seems they are blocking you for some reason. I recommend getting in touch with the TrustAsia log operators (trustasia-ct-logs@trustasia.com). If you don't get a satisfactory response from them, report back and we can raise this to browsers as a potential policy violation.
Thanks for the swift reply, I've contacted the TrustAsia log operators and will update this issue once I get a response.
Hello @imrejonk, I'm one of the TrustAsia log operators. We received your email and our team will discuss your issue.
We have announced a rate limit of 8 QPS; please see https://bugs.chromium.org/p/chromium/issues/detail?id=1162248.
Our server monitoring raised alarms at 2021-11-11 12:50:00 +0800 and 2021-11-12 12:18:00 +0800, which may have been caused by your requests.
We will follow up with you later.
Hi @mingjunyang, thanks for the reply. I understand that the TrustAsia CT logs are rate limiting requests to eight queries per second (8QPS) per IP address. My Cert Spotter instance must have exceeded that limit.
Am I correct in assuming that the 8QPS limit applies to each one of TrustAsia's logs individually? Can I perform 8QPS to log2021, log2022 and log2023 at the same time (24QPS in total) without being rate limited?
I think that the root problem here is either:
- Cert Spotter does not limit itself to eight queries per second per log, or
- eight queries per second is not enough and TrustAsia should raise it.
I'm curious what @AGWA's opinion is on this.
I should note that I haven't seen the 429 Too Many Requests error anymore since Thu, 11 Nov 2021 05:00:03 +0100.
Assuming 8QPS is per log, 8QPS is enough to keep up with the current growth rate of logs. Note that Cert Spotter doesn't explicitly limit the rate of queries sent to logs. Ideally it would, but since it doesn't send queries to the same log in parallel, queries are implicitly rate limited and log rate limits haven't been a problem in practice.
> queries are implicitly rate limited

Do you mean that the performance of the machine Cert Spotter runs on, as well as its network connection, currently limits the number of requests?
> log rate limits haven't been a problem in practice

Well, they were for me, at least for a short while. I haven't encountered TrustAsia's rate limit since last Thursday, so I guess the practical implications are currently low, if not zero. I'll close this issue for now and open a feature request for Cert Spotter if this becomes a real problem in the future.
Thanks @mingjunyang and @AGWA for cooperating on this issue with me.
> Do you mean that the performance of the machine Cert Spotter runs on, as well as its network connection, currently limits the number of requests?
Correct.
If you start seeing sustained rate limit errors again, please do open another issue. I have a feeling though that this issue was caused by something on TrustAsia's end so there's nothing Cert Spotter could have done to prevent it.
Six months have passed, and I've encountered a few (<10) TrustAsia rate limit errors in the meantime. Since yesterday, however, I'm seeing many 429 Too Many Requests messages from https://yeti2022-2.ct.digicert.com/log/ (times are in CEST):
2022/05/01 07:17:18 https://yeti2022-2.ct.digicert.com/log/: Problem fetching entries 479921031 to 479921262 from log: GET https://yeti2022-2.ct.digicert.com/log/ct/v1/get-entries?start=479921031&end=479921262: 429 Too Many Requests (<html>
<head><title>429 Too Many Requests</title></head>
<body>
<center><h1>429 Too Many Requests</h1></center>
<hr><center>nginx</center>
</body>
</html>
)
2022/05/01 07:17:18 https://yeti2022-2.ct.digicert.com/log/: Error scanning log (if this error persists, it should be construed as misbehavior by the log): GET https://yeti2022-2.ct.digicert.com/log/ct/v1/get-entries?start=479921031&end=479921262: 429 Too Many Requests (<html>
<head><title>429 Too Many Requests</title></head>
<body>
<center><h1>429 Too Many Requests</h1></center>
<hr><center>nginx</center>
</body>
</html>
)
We run Cert Spotter once every hour and this message was logged almost every hour.
I'd like to turn this bug report into a feature request: it would be great if the number of requests Cert Spotter makes per second could be limited, so that excessive requests are prevented.
DigiCert's nessie2022 log also applies rate limiting:
2022/05/05 11:18:29 https://nessie2022.ct.digicert.com/log/: Problem fetching entries 140658797 to 140659796 from log: GET https://nessie2022.ct.digicert.com/log/ct/v1/get-entries?start=140658797&end=140659796: 429 Too Many Requests (<html>
<head><title>429 Too Many Requests</title></head>
<body>
<center><h1>429 Too Many Requests</h1></center>
<hr><center>nginx</center>
</body>
</html>
)
Experiencing the same.
Certspotter also seems to hang at this point.
@certrik yeah, since these rate limit issues started appearing I now often get the Another instance of /usr/local/bin/certspotter is already running error. I think I'm going to do what @wmfgerrit did: assemble a local CT log list file that excludes the rate-limiting logs. Not the ideal solution, but I think it's better than what I have now.
A fix is forthcoming.
@certrik @imrejonk Thanks for your patience as I found time to investigate this issue.
certspotter makes two types of requests to logs:
- `get-sth`
- `get-entries`

If a `get-entries` request fails, certspotter retries the request up to 10 times with exponential backoff. certspotter will wait a full 8.5 minutes before the final retry, so as long as a log allows one request every 8.5 minutes, certspotter is resilient to `get-entries` rate limiting.
`get-sth` does not currently have any retry logic. This is an oversight and will be fixed. Once fixed, there should be no more problems monitoring TrustAsia's log.
However, the error messages in #49 (comment) and #49 (comment) show rate limit errors from `get-entries`, not `get-sth`. I find this very surprising, because certspotter would have waited 8.5 minutes before sending those `get-entries` requests. If the log is really allowing fewer than one request every 8.5 minutes, then the next step would be to complain on the ct-policy@ mailing list, because that is an absurd rate limit.
However, I'm hesitant to do so because I have not ever observed this problem myself. I've been hitting the DigiCert logs continuously for the past 24 hours trying to reproduce this error and have not been able to do so.
So I'm going to need as much information as possible about your certspotter setups. How are you invoking certspotter? Are you running parallel instances of certspotter, either intentionally or inadvertently?
@AGWA no no no, thank you for taking the time to look into this issue! It is much appreciated :)
I also think it is a good idea to investigate this further before sending a complaint to the ct-policy list. I hope this description of my Cert Spotter setup is useful:
Cert Spotter version 0.11, compiled with Go 1.15, is the only application that I'm running on a Debian 11 VM at DigitalOcean. The VM has access to one shared 2.20 GHz CPU, 1 GB of RAM, 25 GB of SSD-backed storage and a gigabit network interface. I'm quite certain that these specs are sufficient:
- I have alerts set up for out-of-memory situations but so far none have occurred.
- There is no or negligible CPU steal occurring while Cert Spotter runs.
- Cert Spotter consumes max 22% of the CPU's time while it is running, but often far less.
- Cert Spotter consumes about 12% of RAM.
- Download bandwidth doesn't exceed 50 Mbit/s and is often a lot less.
- Upload bandwidth doesn't exceed 400 kbit/s.
- Disk usage is currently 23%.
- The CPU does not seem to have any trouble accessing the SSD storage while Cert Spotter is running, as evidenced by `top`'s `0,0 wa` output.
Cert Spotter gets started from this cron job:
SHELL="/bin/bash"
#Ansible: Autogenerated name for certspotter
@hourly certspotter /usr/local/bin/certspotter -logs /etc/certspotter/monitor.json -watchlist /etc/certspotter/watchlist | mail -E -s "[CipherSOC] New certificates found in CT logs" root@ciphermail.com
monitor.json basically contains https://loglist.certspotter.org/monitor.json, minus the DigiCert Yeti 2022 #2 log that caused so much rate limiting error output. I have just reverted this to help with debugging; I am now using your monitor.json again (I now omit the -logs parameter again).
My watchlist contains:
.ciphermail.com
.ciphermail-demo.com
.djigzo.com
As you can see, I run Cert Spotter from its own 'certspotter' user account. The home directory of this account is /var/local/certspotter, which makes Cert Spotter store its persistent data in /var/local/certspotter/.certspotter. The lockfile placed in that directory prevents multiple instances from running at the same time; I don't run parallel instances. If for some reason Cert Spotter takes more than an hour to complete, the next cron job fails with a message like this one:
Another instance of /usr/local/bin/certspotter is already running (as process ID 176881); remove the file /var/local/certspotter/.certspotter/lock if this is not the case
According to /var/log/cron.log, the jobs start on the hour (I should probably change that to a random minute) and run for about half an hour:
Jun 1 00:00:01 certspotter CRON[178965]: pam_unix(cron:session): session opened for user certspotter(uid=11) by (uid=0)
Jun 1 00:26:12 certspotter CRON[178965]: pam_unix(cron:session): session closed for user certspotter
Jun 1 01:00:01 certspotter CRON[179586]: pam_unix(cron:session): session opened for user certspotter(uid=11) by (uid=0)
Jun 1 01:35:08 certspotter CRON[179586]: pam_unix(cron:session): session closed for user certspotter
Jun 1 02:00:01 certspotter CRON[179768]: pam_unix(cron:session): session opened for user certspotter(uid=11) by (uid=0)
Jun 1 02:27:10 certspotter CRON[179768]: pam_unix(cron:session): session closed for user certspotter
Jun 1 03:00:01 certspotter CRON[180107]: pam_unix(cron:session): session opened for user certspotter(uid=11) by (uid=0)
Jun 1 03:34:17 certspotter CRON[180107]: pam_unix(cron:session): session closed for user certspotter
Jun 1 04:00:01 certspotter CRON[180292]: pam_unix(cron:session): session opened for user certspotter(uid=11) by (uid=0)
Jun 1 04:32:07 certspotter CRON[180292]: pam_unix(cron:session): session closed for user certspotter
If there's anything else I can do to help you debug this, don't hesitate to ask.
You're very welcome @imrejonk and thanks for the extremely detailed description of your deployment. It sounds very robust and in particular it seems unlikely that certspotter could have been running in parallel accidentally. Have you seen any rate limit errors from DigiCert in the last few days?
I'll be pushing changes in the next few days to add retry logic to `get-sth` requests, but ultimately certspotter needs to be reworked to be a daemon rather than a cron job. The fact is that sometimes CT logs have transient problems, and a CT monitor needs to continuously retry and only bother the user with an error message if the problem persists for longer than the log's MMD (24 hours for all current logs). As a cron job it can't do that because it would block the monitoring of all other logs. A daemon would allow much greater flexibility (in addition to other benefits). I've opened #63 to track this.
> Have you seen any rate limit errors from DigiCert in the last few days?
Well, I expected there to be at least some 429s in Cert Spotter's output since I added the DigiCert Yeti 2022 #2 log to monitor.json again last Thursday, but there weren't any. Instead, these errors occurred almost every hour, indicating that the Cert Spotter process wasn't keeping up:
/usr/local/bin/certspotter: Another instance of /usr/local/bin/certspotter is already running (as process ID 188090); remove the file /var/local/certspotter/.certspotter/lock if this is not the case
/usr/local/bin/certspotter: Another instance of /usr/local/bin/certspotter is already running (as process ID 215159); remove the file /var/local/certspotter/.certspotter/lock if this is not the case
/usr/local/bin/certspotter: Another instance of /usr/local/bin/certspotter is already running (as process ID 225997); remove the file /var/local/certspotter/.certspotter/lock if this is not the case
The errors from process 188090 occurred from Thu, 02 Jun 2022 18:00:01 +0200 to Sun, 05 Jun 2022 03:00:02 +0200. For process 215159, from Sun, 05 Jun 2022 05:00:02 +0200 until Tue, 07 Jun 2022 04:00:01 +0200. And for process 225997, from Tue, 07 Jun 2022 06:00:02 +0200 until I killed it at Tue, 07 Jun 2022 09:33:28 +0200.
There was no output except for the locking errors above, but when I killed process 225997, I got some output for new certificates that were issued over the weekend:
7b1addc773716f2929f2f641dcdab505a15e229e3ba4cea01d9f5147a1afd423:
DNS Name = ftp.ciphermail.com
Pubkey = c3f89613fe6ce2801a878ce35141e40b4c059115f09e2413d97143cdba503731
Issuer = C=US, O=Let's Encrypt, CN=R3
Not Before = 2022-06-05 16:49:37 +0000 UTC
Not After = 2022-09-03 16:49:36 +0000 UTC
Log Entry = 348805469 @ https://oak.ct.letsencrypt.org/2022/ (Pre-certificate)
crt.sh = https://crt.sh/?sha256=7b1addc773716f2929f2f641dcdab505a15e229e3ba4cea01d9f5147a1afd423
Filename = /var/local/certspotter/.certspotter/certs/7b/7b1addc773716f2929f2f641dcdab505a15e229e3ba4cea01d9f5147a1afd423.precert.pem
e3cf9e97873a125a849b4598566d90118de465f586578967e5d18692da9e91c0:
DNS Name = support2.ciphermail.com
DNS Name = support.ciphermail.com
Pubkey = 36f0126cdcc4e2aff924d0d2d6cac7581178db9c8139dab15b26c24dc3d9810b
Issuer = C=US, O=Let's Encrypt, CN=R3
Not Before = 2022-06-05 21:17:50 +0000 UTC
Not After = 2022-09-03 21:17:49 +0000 UTC
Log Entry = 349003014 @ https://oak.ct.letsencrypt.org/2022/ (Pre-certificate)
crt.sh = https://crt.sh/?sha256=e3cf9e97873a125a849b4598566d90118de465f586578967e5d18692da9e91c0
Filename = /var/local/certspotter/.certspotter/certs/e3/e3cf9e97873a125a849b4598566d90118de465f586578967e5d18692da9e91c0.precert.pem
a1fca05831d3a004b95fdffafe1a9efc35e61079fbc3987e9d0865383d27e90b:
DNS Name = lists2.ciphermail.com
DNS Name = lists.ciphermail.com
DNS Name = lists2-rspamd.ciphermail.com
DNS Name = rspamd.lists.ciphermail.com
Pubkey = 5402fd366dfe9992a9bcff48115d4b705ecd66f7f8dba313fb2f8ce911b4faf9
Issuer = C=US, O=Let's Encrypt, CN=R3
Not Before = 2022-06-06 00:14:40 +0000 UTC
Not After = 2022-09-04 00:14:39 +0000 UTC
Log Entry = 571029709 @ https://ct.cloudflare.com/logs/nimbus2022/ (Pre-certificate)
crt.sh = https://crt.sh/?sha256=a1fca05831d3a004b95fdffafe1a9efc35e61079fbc3987e9d0865383d27e90b
Filename = /var/local/certspotter/.certspotter/certs/a1/a1fca05831d3a004b95fdffafe1a9efc35e61079fbc3987e9d0865383d27e90b.precert.pem
543e7507e3e4e929f7b1ac7e7b221ae3002201ec86c6fbf38ff4606d8f326369:
DNS Name = lists2.ciphermail.com
Pubkey = bd25a0406365017814e71b66e26cfcf2f8f4757fd331784668aa797cf45fe96e
Issuer = C=US, O=Let's Encrypt, CN=R3
Not Before = 2022-06-06 00:15:58 +0000 UTC
Not After = 2022-09-04 00:15:57 +0000 UTC
Log Entry = 571031507 @ https://ct.cloudflare.com/logs/nimbus2022/ (Pre-certificate)
crt.sh = https://crt.sh/?sha256=543e7507e3e4e929f7b1ac7e7b221ae3002201ec86c6fbf38ff4606d8f326369
Filename = /var/local/certspotter/.certspotter/certs/54/543e7507e3e4e929f7b1ac7e7b221ae3002201ec86c6fbf38ff4606d8f326369.precert.pem
3388a0eb99b58c9b142f6f781b6832eec650b00d7e3fd42168dd4e1ea357d0d0:
DNS Name = ftp.ciphermail.com
Pubkey = c3f89613fe6ce2801a878ce35141e40b4c059115f09e2413d97143cdba503731
Issuer = C=US, O=Let's Encrypt, CN=R3
Not Before = 2022-06-05 16:49:37 +0000 UTC
Not After = 2022-09-03 16:49:36 +0000 UTC
Log Entry = 1309685022 @ https://ct.googleapis.com/logs/xenon2022/ (Certificate)
crt.sh = https://crt.sh/?sha256=3388a0eb99b58c9b142f6f781b6832eec650b00d7e3fd42168dd4e1ea357d0d0
Filename = /var/local/certspotter/.certspotter/certs/33/3388a0eb99b58c9b142f6f781b6832eec650b00d7e3fd42168dd4e1ea357d0d0.cert.pem
I would have expected those certificate entries to be output by process 215159 instead of 225997, but maybe this is expected, as there is some delay between certificate issuance and log incorporation. Also notice that the output contains four precertificates but only one "real" certificate. It looks like I killed Cert Spotter before it had time to process the other certificates, assuming they had been incorporated into the CT logs together with the precertificates.
I removed the DigiCert Yeti 2022 #2 log again this morning; my Cert Spotter installation is just unusable when that log is present in monitor.json.
Since you haven't monitored Yeti 2022 #2 for so long, certspotter has fallen behind and is trying to catch up by downloading all entries that were added to the log since you stopped monitoring, which will take a long time. Unfortunately, this currently blocks the monitoring of other logs, which would be fixed if certspotter were a daemon (point number 2 in #63).
If you want to skip ahead to the end of this log, then you can do the following:
1. Remove the `/var/local/certspotter/.certspotter/logs/BZwB0yDgB4QTlYBJjRF8kDJmr69yULWvO0akPhGEDUo` directory
2. Add the `-start_at_end` option to the `certspotter` command line
It's promising that you did not observe any 429 errors.
I've released version 0.12, which adds retries to `get-sth` requests and will fix the original issue (rate limit errors with TrustAsia logs).
I'm assuming the DigiCert `get-entries` rate limit errors were transient. If they start happening again despite certspotter's retry logic, we can complain on ct-policy@.
certspotter will become more resilient to transient log errors once #63 is completed.
> Since you haven't monitored Yeti 2022 #2 for so long, certspotter has fallen behind and is trying to catch up by downloading all entries that were added to the log since you stopped monitoring, which will take a long time. Unfortunately, this currently blocks the monitoring of other logs, which would be fixed if certspotter were a daemon (point number 2 in #63).
That makes sense, thanks for pointing this out.
> If you want to skip ahead to the end of this log, then you can do the following:
> 1. Remove the `/var/local/certspotter/.certspotter/logs/BZwB0yDgB4QTlYBJjRF8kDJmr69yULWvO0akPhGEDUo` directory
> 2. Add the `-start_at_end` option to the `certspotter` command line
Done! I also updated to Cert Spotter 0.12. I am using your monitor.json file again and I haven't had any errors since.
Cert Spotter 0.12 didn't compile with Go 1.17 (even though the README suggests compatibility with Go 1.5 and up), so I had to install the golang-1.18 package from Debian's bullseye-backports and compile Cert Spotter like so:
cd cmd/certspotter
/usr/lib/go-1.18/bin/go get
/usr/lib/go-1.18/bin/go build
Thanks for pointing out the issue with the Go version. I've reduced the minimum Go version to 1.17 and updated the README.
You're right, it compiles with Go 1.17 just fine. Go 1.16 also works, by the way. I tried compiling with 1.15 earlier; that didn't work:
go: downloading golang.org/x/net v0.0.0-20220425223048-2871e0cb64e4
go: downloading golang.org/x/text v0.3.7
# software.sslmate.com/src/certspotter/ct/client
../../ct/client/logclient.go:193:24: undefined: io.ReadAll
note: module requires Go 1.18
So although the error message claims that some module requires Go 1.18, compiling with 1.16 or 1.17 works as well.
DigiCert's yeti2022-2 log started throwing 429 Too Many Requests errors again, starting yesterday at 2022-07-20 23:12:07+02:00:
2022/07/20 23:12:07 https://yeti2022-2.ct.digicert.com/log/: Error scanning log (if this error persists, it should be construed as misbehavior by the log): GET https://yeti2022-2.ct.digicert.com/log/ct/v1/get-entries?start=682846695&end=682846926: 429 Too Many Requests (<html>
<head><title>429 Too Many Requests</title></head>
<body>
<center><h1>429 Too Many Requests</h1></center>
<hr><center>nginx</center>
</body>
</html>
)
My Cert Spotter (version 0.12) runs take around 30 minutes to complete. I run them once every hour, on the hour. I've hit the rate limit at 23:12, 01:22, 03:28 and 11:18 (UTC+2) so far.
@AGWA do you think I should report this to the ct-policy mailing list?