badges/shields

Images have no-store, must-revalidate which absolutely kills performance

Closed this issue · 5 comments

The responses from img.shields.io have

cache-control: no-cache, no-store, must-revalidate

which is absolutely the worst case for HTTP requests. It means every client has to make a full request, all the way up to your origin server, with no help of the CDN, every. single. time. it displays the badge.

This seems unreasonably restrictive and clearly your origin server is struggling (I'm seeing responses taking over 10 seconds per badge). Even if you'd like the responses to be super fresh and update as soon as the data changes, allowing cache of even just 1 second would allow the CDN to take some traffic off your back-end servers without affecting freshness of the responses:

cache-control: max-age=1

In case the no-store, must-revalidate is not added by your origin server, but the CDN, consider configuring the CDN to allow some caching (e.g. Page Rules can be used for this in Cloudflare).

Thanks for raising this.

We don't use cloudflare as a downstream cache. It has been discussed it in the #maintainers channel but there are some concerns about CloudFlare's use of tracking cookies: #608 and the conversation has somewhat fizzled out. Maybe this is a good opportunity to re-open that conversation though.

Aside from that, modifying the cache-control headers as suggested should give us benefit when using someone else's CDN (e.g: GitHub's camo proxy - a very common case). This seems like a good idea to me, but looking at the code, it does seem to be a very deliberate decision to avoid being cached by camo:

// Cache management - no cache, so it won't be cached by GitHub's CDN.
ask.res.setHeader('Cache-Control', 'no-cache, no-store, must-revalidate');
ask.res.setHeader('Expires', reqTime.toGMTString()); // Proxies, GitHub, see #221.

also see #221

.. but I am not so clear on why this is such a bad thing. Is anyone more familiar with the history? Why this is something we need to avoid?

I assume it was done to avoid outdated badges,
But at the current rate i feel like the badges should at least default to a ~1 hour cache (with option to set ?maxAge=0 for more important badges).

I've been tracking the badges uptime here

(note the uptime is based on the badge having a response time < 4 seconds)

1 second delay:
1 second
2 second delay:
2 seconds
3 second delay:
3 seconds
3.9 second delay:
3 seconds
3.95 second delay:
3 seconds
4 second delay: (always fails)
4 seconds


It would be good if we could get something merged before the next deploy as #1568 is a pretty big issue (with 40 👍 already)

#1725 now merged. There's still a bit of additional work to do in order to deploy this and find the right value for env.BADGE_MAX_AGE_SECONDS.

This was turned on in #1846 which just went live. Will be curious to see how this goes in the morning!

I’m glad to say this has had a huge effect. Today’s peak traffic is being handled like weekend traffic, with 99% of requests coming in underneath the 4 second camo timeout. I’m thrilled!

That should give us a little time to sort out our hosting. We’re still relatively slow on a number of badges, particularly the static badges which should be instant, so we still have a capacity issue to address. Likely this is due to ~10% growth over the last several months. I have proposed moving to Zeit Now to fix the capacity issue and solve our sysadmin bottleneck at the same time. This proposal is blocked awaiting response from @espadrine who owns the servers and load balancer.

Nice work, @chris48s, and everyone else who participated in the discussion and reviews. Thanks also to @kornelski for putting this solution clearly on the radar.

Let’s leave this open for a couple days and as long as things still look good, close it out. Then we can open new issues for any follow-on discussions.