Shopify/slate

Theme deploying unreliable with 500 errors; won't retry failed asset(s)

dgpokl opened this issue · 0 comments

Problem

When running slate-tools deploy, asset uploads frequently fail with errors. As far as I can tell, the sequence of events during a deploy is:

  1. For each asset, upload the asset:
    • Capture the output text of that upload command into a buffer but don't display it yet
  2. If any asset fails to upload, silently carry on, i.e. a failed upload triggers neither a retry nor an abort.
  3. Print out that buffer from step 1 showing all the failures.
  4. Display a success message (even in the case where failures occurred) like this:
Files overwritten successfully!

✨  Done in 204.89s.

The problem with this approach should be apparent, but I'll spell it out: this API frequently returns transient, unexplained HTTP 500 errors (or, today during the Shopify outage, 503). Example of one:

13:05:55 [default]Asset Perform Update to snippets/icon-minus-mobile.liquid at host xxxxxx.myshopify.com
	Status: 500 Internal Server Error
	Errors: Internal Server Error

Deploying again always gives a different outcome: whether that's complete success or a different set of failed assets depends on... the weather? Anyway, I really don't want to build this into a scripted deploy, because I'd have to wait for all several hundred files to be tried, then grep through the STDOUT/STDERR of the deploy command looking for errors, and if even one file failed, deploy the whole thing again (or drop down to themekit and re-upload each file myself, but that raises the question of what this slate-deploy command is even for if one must write a whole program to clean up after it).
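For illustration, the "grep through the output" workaround could look roughly like the sketch below. It assumes every failure is logged in the same two-line shape as the example above (an "Asset Perform Update to <path>" line followed by a "Status: 5xx" line); failed_assets is a made-up helper name, and other failure formats would slip through.

```shell
#!/usr/bin/env bash
# Sketch only: list asset paths whose upload was followed by a 5xx status line.
# Assumes the log format shown in the example above; reads the log from stdin.
failed_assets() {
  awk '
    / Perform Update to / {
      # The asset path is the field right after the word "to"
      for (i = 1; i < NF; i++) if ($i == "to") { asset = $(i + 1); break }
    }
    $1 == "Status:" && $2 ~ /^5[0-9][0-9]$/ {
      if (asset != "") print asset
      asset = ""
    }
  '
}
```

With the deploy output saved to a file (say deploy.log, via yarn deploy 2>&1 | tee deploy.log), running failed_assets < deploy.log would print one failed path per line, and an empty result would mean no 5xx lines were seen.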

Replication steps

  1. Have a bunch of files, maybe 300? (For all I know, though, this may be optional; maybe it can hit anyone.)
  2. Run yarn deploy
  3. It might succeed or the above scenario might happen, listing a random set of assets and HTTP 500 errors for each.

What I would consider reasonable (just my opinion)

If this project weren't in deprecated/limbo status, this is what I would propose:

Option A (lazy but decent): Exit with status 1 IMMEDIATELY anytime an asset upload is NOT successful

I could then script around it with something like this:

# Re-run the whole deploy until it exits with status 0
until yarn deploy; do
  echo "deploy failed, retrying..." >&2
done

This guarantees success on an infinite timescale, but still wastes a lot of time re-uploading files.

Option B (sane)

While deploying, check whether each file upload succeeds. If one doesn't:

  • check whether there's a Retry-After header or something similar and, if so, heed it
  • otherwise just delay 1s or so and retry the file a fixed number of times
  • if it exceeds a certain number of retries, exit 1
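A minimal sketch of the retry part of Option B, written as a generic shell wrapper rather than as a change to Slate itself. The retry function and the MAX_RETRIES and RETRY_DELAY variables are names invented here, with assumed defaults; it also does not look at response headers, since the deploy output shown above doesn't expose them.

```shell
#!/usr/bin/env bash
# Sketch only: run a command until it succeeds or the attempt budget runs out.
MAX_RETRIES="${MAX_RETRIES:-5}"   # attempts per command (assumed default)
RETRY_DELAY="${RETRY_DELAY:-1}"   # seconds to wait between attempts (assumed default)

retry() {
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$MAX_RETRIES" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${RETRY_DELAY}s: $*" >&2
    sleep "$RETRY_DELAY"
    attempt=$(( attempt + 1 ))
  done
}
```

Wrapping a per-file upload command as retry <upload command> <file> || exit 1 would give the "retry a fixed number of times, then exit 1" behavior, provided the wrapped command reports failure through its exit status.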

Speculation

  • Perhaps the 500 errors only affect people with more than a small number of files
  • Maybe there's a rate-limiting regime in place and Slate is ignorant of what that limit is
  • The 500 error seems to provide no explanation of what's wrong, and/or Slate or themekit is eating the headers that would tell it when it can retry, and instead just blindly continues.
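If the response headers were available to the tool, honoring a retry hint could be as small as the sketch below. Retry-After is a standard HTTP header, but whether this API sends it on these errors is not established here; retry_after_seconds is a hypothetical helper, and it only handles the delay-in-seconds form of the header (not the HTTP-date form).

```shell
#!/usr/bin/env bash
# Sketch only: read raw response headers on stdin and print how many seconds
# to wait. Falls back to $1 (default 1) when no numeric Retry-After is present.
retry_after_seconds() {
  local fallback="${1:-1}"
  awk -v fallback="$fallback" '
    BEGIN { delay = fallback }
    # Header names are case-insensitive; "+ 0" also drops a trailing CR
    tolower($1) == "retry-after:" && $2 ~ /^[0-9]+/ { delay = $2 + 0 }
    END { print delay }
  '
}
```

The printed value could then be passed to sleep before the next attempt at the same file.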