aaronpowell/httpstatus

httpstat.us service down?

Closed this issue · 25 comments

Every endpoint (including /404, /200, and /) returns 503 (observed between 7:56 and 8:46 am EDT).

This is down for us as well. We use it in our test suite for https://frontity.org, so I've created a dead-simple app on glitch.me that we use instead. Works just fine 👌

Anybody having the same issue, you're welcome to use it too: https://ballistic-western-donkey.glitch.me 🙂
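For test suites like the one described above, a local stand-in removes the dependency on a shared public service entirely. The following is a minimal sketch using only the Python standard library. It is not the glitch app's actual code; the `/<code>` routing, the `sleep` query parameter (assumed to be in milliseconds, matching how `?sleep=3000` is used later in this thread), and the 10 s cap are assumptions that only loosely mirror httpstat.us.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

MAX_SLEEP_MS = 10_000  # arbitrary cap so a stray ?sleep can't hang the suite


class StatusHandler(BaseHTTPRequestHandler):
    """Answers GET /<code> with that status code; ?sleep=<ms> delays the reply."""

    def do_GET(self) -> None:
        parsed = urlparse(self.path)
        code_text = parsed.path.strip("/")
        if not code_text.isdigit() or not 200 <= int(code_text) <= 599:
            # 1xx interim responses and non-numeric paths are out of scope here.
            self.send_error(400, "expected /<status code between 200 and 599>")
            return
        sleep_ms = parse_qs(parsed.query).get("sleep", ["0"])[0]
        if sleep_ms.isdigit():
            time.sleep(min(int(sleep_ms), MAX_SLEEP_MS) / 1000)
        code = int(code_text)
        # 204 and 304 responses must not carry a body.
        body = b"" if code in (204, 304) else str(code).encode()
        self.send_response(code)
        if body:
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging in test output


def start_stub_server(port: int = 0) -> ThreadingHTTPServer:
    """Start the stub on 127.0.0.1 (port 0 = any free port) in a daemon thread."""
    server = ThreadingHTTPServer(("127.0.0.1", port), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A test would call `start_stub_server()`, read the port from `server.server_address[1]`, and request something like `/500?sleep=3000` on that port instead of https://httpstat.us/500?sleep=3000.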

Hey folks 👋

Thanks for reporting this.
I just tried a few endpoints, https://httpstat.us/404 and https://httpstat.us/200, and they both worked.

Could you please confirm it's also working on your end?

Cheers

Hey, @mderriey 👋

It does seem to work fine now 👍

Thanks for confirming.

I'm closing this for now, feel free to reopen if it happens again in the near future.

Hi, @mderriey

I'm afraid it seems to be down again. Right now, 11:25 am CET, it returns 503 or 524 (a Cloudflare timeout error).

Thanks for letting us know, we'll put that on our todo list.

Just an update: it seems that the service is getting absolutely slammed at the moment, averaging 3,000 requests per minute (peaking close to 7,500).

As a result, it's just not capable of meeting the demand, so we're trying to get to the bottom of that.

Rate limiting (say, 60-100 requests per minute from a single IP?) would help in kicking out noisy clients. Did you consider something like this?

Hello! We use the site to validate some tests and our builds are failing due to the site being down. Going to the site times out.

Is the site still getting slammed with requests?

We did consider rate limiting, yes; however, the tech stack as it stands is dated and we don't want to invest too much effort in it.
We're thinking about migrating to ASP.NET Core (see #91), which would make it much easier to introduce rate limiting.
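The per-IP limit suggested above can be sketched as a sliding-window log: remember each client's recent request times and refuse once there are too many inside the window. This is an illustration in Python, not how httpstat.us or ASP.NET Core implements it; the class name, the 60-requests-per-minute default (the low end of the range proposed above), and answering 429 on rejection are assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerIpRateLimiter:
    """Sliding-window log: at most `limit` requests per `window` seconds per client IP."""

    def __init__(self, limit: int = 60, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        # One deque of request timestamps per IP. Idle IPs are never evicted
        # in this sketch; a real service would need to prune them.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        """Return True and record the request if `ip` is under its limit, else False."""
        now = self.clock()
        hits = self._hits[ip]
        # Forget requests that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the caller would answer 429 Too Many Requests
        hits.append(now)
        return True
```

A request handler would call `limiter.allow(client_ip)` before doing any work and reject early when it returns False. The sketch is not thread-safe as written; a threaded server would guard `allow` with a lock. Storing one timestamp per request costs memory proportional to the limit, which is why token buckets (two numbers per client) are a common alternative.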

As for whether the site is still getting slammed, I personally don't have access to the telemetry service. @aaronpowell?

Short answer - yep!

I've tweaked some of the logging metrics to get a better handle on where the requests come from, so we can see whether better limits are possible.

It looks like we're back up and running, so to speak.

TL;DR: in the last 24 hours we received 5.5m requests, of which I class 4.1m as abusive, and a temporary workaround is in place.

What happened?

At some point in the past, httpstat.us started receiving a lot of requests for one status code (a little over 52m requests, in fact), and those requests appear to set the maximum timeout allowed for a request (10s).
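As a rough illustration of why those 10s requests hurt, Little's law relates the average number of requests in flight $L$ to the arrival rate $\lambda$ and the average time each request spends in the system $W$. Plugging in the traffic figures reported earlier in this thread, and assuming for illustration that every request held the full 10s:

$$
L = \lambda W
$$

$$
\lambda_{\text{avg}} = \frac{3000\ \text{req}}{60\ \text{s}} = 50\ \text{req/s}
\quad\Rightarrow\quad
L_{\text{avg}} = 50 \times 10 = 500
$$

$$
\lambda_{\text{peak}} \approx \frac{7500\ \text{req}}{60\ \text{s}} = 125\ \text{req/s}
\quad\Rightarrow\quad
L_{\text{peak}} \approx 125 \times 10 = 1250
$$

Since not every request was slow, these are upper bounds, but they give the order of magnitude of requests a thread-per-request server would have to hold open at once. At sub-1s response times, the same 50 req/s would keep fewer than 50 requests in flight.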

Because of the way the timeout works, it blocks the thread until it expires, meaning there are only so many concurrent timeout requests that can be handled. Yeah, it's not the best solution, but as @mderriey said, it's a really dated tech stack that was never designed to support that many requests.
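Each delayed request holding a thread until its timer expires caps how many delays can overlap. The sketch below reproduces that in miniature with generic Python, not httpstat.us code (which ran on .NET Framework): a fixed pool of worker threads that each call `time.sleep` can only overlap as many delays as there are threads, whereas `asyncio.sleep` suspends a coroutine without tying up a thread. Pool size and delays are arbitrary.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_delay(seconds: float) -> None:
    # Occupies the worker thread for the whole delay, like a synchronous sleep.
    time.sleep(seconds)


def run_blocking(requests: int, workers: int, seconds: float) -> float:
    """Wall-clock time to serve `requests` delayed responses with `workers` threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(blocking_delay, [seconds] * requests))
    return time.perf_counter() - start


async def async_delay(seconds: float) -> None:
    # Suspends only this coroutine; the event loop keeps serving the others.
    await asyncio.sleep(seconds)


async def run_async(requests: int, seconds: float) -> float:
    """Wall-clock time to serve the same delayed responses on one event loop."""
    start = time.perf_counter()
    await asyncio.gather(*(async_delay(seconds) for _ in range(requests)))
    return time.perf_counter() - start


if __name__ == "__main__":
    # 8 requests of 0.2 s on 2 threads: at least 8 / 2 * 0.2 = 0.8 s.
    print(f"blocking: {run_blocking(8, 2, 0.2):.2f} s")
    # The same 8 requests overlap fully: roughly 0.2 s.
    print(f"async:    {asyncio.run(run_async(8, 0.2)):.2f} s")
```

The blocking version scales with the number of threads, so once every thread is parked in a long delay, new requests queue up behind them regardless of how cheap they are.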

Today I pushed a small hack that basically turns requests to /504 (the endpoint being abused) into a no-op, and the service appears to be stable again.

We'll keep monitoring for a few days to see if the abuse shifts, but at the very least, it's holding together ok. 😂

Apologies to anyone who legitimately needs to hit /504, but for the moment, it's going to be offline.

Looking at where we stand today, it's looking a lot better. Since the workaround went in, we're seeing an average response time where I expect it to be (sub-1s), rather than close to 10s when it was being abused.

It still looks like there are a lot of requests that I class as abusive, but they are no longer impacting the service.

I'll keep monitoring throughout next week and provide updates here.

@aaronpowell our test case uses http://httpstat.us/504?sleep=3000, but it no longer works as expected.
Is this the expected result of your changes from Oct 9?

Correct @helloint. At the moment we flat-out reject every request for a 504 response.

Well, it looks like it won't be fixed in the short term, so we'll switch to http://httpstat.us/500?sleep=3000 instead.
Thanks.

It looks like the service is down again, any idea what is causing it?

Oh no worries, enjoy your holiday! It seems up again so I'll quickly restart our CI jobs :)

Thanks for your work!

The service is down as of now (June 9, 2021 at 13:30 CET) and returns 503 Service Unavailable.

Also seeing that the service is down

Looks like it righted itself. FWIW, these notifications came in at 9.30pm and 1.30am, and I wasn't awake at either time 🤣

All good! I'm seeing it's working too. Cheers

I'm finally closing this. I've done some work on the Azure infra: the service has been ported from .NET Framework to .NET 6, which gives better performance. I've also moved SSL handling to Azure directly, so we shouldn't see the certificate expire.

I'm still observing some abuse around the sleep and 504 requests, so they'll be left as a no-op for the time being.
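The no-op treatment described above, with 504 requests rejected and `sleep` ignored, amounts to deciding a request's fate before any delay is scheduled. Below is a hypothetical Python sketch of that routing decision; the `route` function, the 403 rejection status, and the `sleep_enabled` switch are illustrative assumptions, not the service's actual .NET code.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlparse

DISABLED_CODES = {504}   # per this thread, /504 requests are rejected outright
REJECT_STATUS = 403      # hypothetical: the thread doesn't say what is returned
MAX_SLEEP_MS = 10_000    # the 10s maximum mentioned earlier in the thread


def route(target: str, sleep_enabled: bool = False) -> Tuple[int, float]:
    """Map a request target such as '/504?sleep=3000' to (status, delay_seconds).

    Assumes the path is a bare numeric status code. With sleep_enabled=False the
    ?sleep parameter is a no-op, so no request ever holds a worker while waiting.
    """
    parsed = urlparse(target)
    code = int(parsed.path.strip("/"))
    if code in DISABLED_CODES:
        return REJECT_STATUS, 0.0  # answered immediately, before any delay
    delay = 0.0
    if sleep_enabled:
        raw = parse_qs(parsed.query).get("sleep", ["0"])[0]
        if raw.isdigit():
            delay = min(int(raw), MAX_SLEEP_MS) / 1000
    return code, delay
```

With the defaults, `route("/500?sleep=3000")` answers 500 with no delay, which is why test cases relying on `sleep` would see an immediate response rather than a 3-second wait.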

Hi there, first of all, this service is really cool - thanks for offering it! I did want to let you know that I'm currently seeing that the site is (mostly?) down:

$ curl -v https://httpstat.us/200
*   Trying 20.40.202.3:443...
* Connected to httpstat.us (20.40.202.3) port 443 (#0)
* ALPN: offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server accepted http/1.1
* Server certificate:
*  subject: CN=httpstat.us
*  start date: Nov  7 00:00:00 2023 GMT
*  expire date: May  7 23:59:59 2024 GMT
*  subjectAltName: host "httpstat.us" matched cert's "httpstat.us"
*  issuer: C=US; O=DigiCert, Inc.; CN=GeoTrust Global TLS RSA4096 SHA256 2022 CA1
*  SSL certificate verify ok.
* using HTTP/1.1
> GET /200 HTTP/1.1
> Host: httpstat.us
> User-Agent: curl/7.88.1
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
< HTTP/1.1 503 Service Unavailable
< Content-Length: 0
< Date: Tue, 12 Dec 2023 18:45:31 GMT
< Set-Cookie: ARRAffinity=2f58bd0c4abeddfa81d72684f97e0104b7cea9d1afb9a5c44c71bf44646c500b;Path=/;HttpOnly;Secure;Domain=httpstat.us
< Set-Cookie: ARRAffinitySameSite=2f58bd0c4abeddfa81d72684f97e0104b7cea9d1afb9a5c44c71bf44646c500b;Path=/;HttpOnly;SameSite=None;Secure;Domain=httpstat.us
<
* Connection #0 to host httpstat.us left intact
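In the session above, the connection and certificate check succeed and the server answers `503 Service Unavailable` with an empty body. Test suites that call httpstat.us, like several described in this thread, can treat such transient statuses (503 here, 524 earlier) as retryable instead of failing on the first try. A minimal Python sketch, assuming a hypothetical `status_with_retry` helper with exponential backoff; the attempt count, delays, and the set of transient codes are arbitrary choices.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Iterable


def status_with_retry(fetch: Callable[[], int], attempts: int = 3,
                      backoff: float = 1.0,
                      transient: Iterable[int] = (503, 524),
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `fetch` until it returns a non-transient status or attempts run out.

    Waits backoff, 2 * backoff, 4 * backoff, ... seconds between tries and
    returns the last status seen.
    """
    retryable = set(transient)
    status = fetch()
    for attempt in range(1, attempts):
        if status not in retryable:
            break
        sleep(backoff * 2 ** (attempt - 1))
        status = fetch()
    return status


def fetch_httpstat_us_200() -> int:
    """Example fetch returning the HTTP status of https://httpstat.us/200.

    Timeouts and connection errors (URLError) propagate to the caller.
    """
    try:
        with urllib.request.urlopen("https://httpstat.us/200", timeout=15) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses arrive as exceptions
        return err.code
```

Calling `status_with_retry(fetch_httpstat_us_200)` gives the service two more chances before the suite sees a 503. Whether a persistent 503 should then fail the build or skip the dependent tests is a per-project choice.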