Theme deploying unreliable with 500 errors; won't retry failed asset(s)

Question

dgpokl opened this issue 5 years ago · 0 comments

Problem

When running slate-tools deploy, uploads frequently fail with errors, meaning the affected asset(s) were not uploaded. The sequence of events during a deploy appears to be:

1. For each asset, upload the asset:
   - Capture the output text of that upload command into a buffer, but don't display it yet.
2. If any asset fails to upload, fail silently and continue, i.e. a failed upload triggers neither a retry nor an abort.
3. Print out the buffer from step 1, showing all the failures.
4. Display a success message (even when failures occurred) like this:

Files overwritten successfully!

✨  Done in 204.89s.

The problem with this approach should be apparent, but I'll spell it out: this API frequently returns transient, unexplained HTTP 500 errors (or, today during the Shopify outage, 503). Example of one:

13:05:55 [default]Asset Perform Update to snippets/icon-minus-mobile.liquid at host xxxxxx.myshopify.com
	Status: 500 Internal Server Error
	Errors: Internal Server Error

Simply deploying again always gives a different outcome: whether that's complete success or a different set of failed assets depends on... the weather? Anyway, I ~~can't~~ really don't want to build this into a scripted deploy, because I'd have to wait for all several hundred files to be tried, then grep through the STDOUT/STDERR of the deploy command looking for errors, and if even one file failed, deploy the whole thing again (or drop down to themekit and re-upload each failed file myself). That raises the question: what is this slate-deploy command even for, if one must write a whole program to clean up after it?
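
For illustration, the grep step described above might look roughly like the sketch below. It assumes the log format from the example (an "Asset Perform Update to <path>" line followed by a "Status: 5xx" line), and failed_assets is a made-up helper name, not part of slate-tools or themekit.

```sh
# Hypothetical helper, not part of slate-tools or themekit.
# Reads deploy output on stdin and prints the path of every asset whose
# upload was followed by a "Status: 5xx" line (format assumed from the
# example log above).
failed_assets() {
  awk '
    /Asset Perform Update to / {
      # Remember the path that follows the word "to" on the upload line.
      for (i = 1; i < NF; i++) {
        if ($i == "to") { asset = $(i + 1); break }
      }
    }
    /Status: 5[0-9][0-9]/ {
      if (asset != "") print asset
      asset = ""
    }
  '
}
```

Usage would be something like: yarn deploy 2>&1 | failed_assets. Any output at all means at least one file still needs to be uploaded again.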

Replication steps

1. Have a bunch of files, maybe 300? (For all I know this is optional and it can hit anyone.)
2. Try to yarn deploy.
3. It might succeed, or the above scenario might happen, listing a random set of assets with an HTTP 500 error for each.

What I would consider reasonable (just my opinion)

If this project weren't in deprecated/limbo status, this is what I would propose:

Option A (lazy but decent): Exit with status 1 IMMEDIATELY anytime an asset upload is NOT successful

I could script that something like this:

# Re-run the whole deploy until it exits with status 0.
while true ; do
  yarn deploy && break
done

This ensures success on an infinite timescale, but still wastes a lot of time re-uploading files that already succeeded.

Option B (sane)

While deploying, check whether each file upload succeeds. If one doesn't:

- check whether there's a Retry-After header or something similar and, if so, heed it
- otherwise, just delay 1s or so and retry the file, up to a fixed number of times
- if it exceeds that number of retries, exit 1
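
The retry behavior in Option B could be sketched roughly like this. It is only an illustration under assumptions: retry_upload, MAX_ATTEMPTS and DEFAULT_DELAY are invented names, and the "upload command prints a number of seconds on failure" convention is a stand-in for reading a real Retry-After header, which neither Slate nor themekit is known to expose this way.

```sh
# Hypothetical sketch, not the slate-tools or themekit implementation.
# Usage: retry_upload <command> [args...]
# The command must exit 0 on success. On failure it may print a number of
# seconds on stdout, standing in for a Retry-After header value.
MAX_ATTEMPTS=${MAX_ATTEMPTS:-3}     # total tries per file
DEFAULT_DELAY=${DEFAULT_DELAY:-1}   # seconds to wait when no hint is given

retry_upload() {
  attempt=1
  while true; do
    # Capture stdout so a failed upload can report how long to wait.
    if wait_for=$("$@"); then
      return 0
    fi
    if [ "$attempt" -ge "$MAX_ATTEMPTS" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    # Heed the reported delay only if it is a plain integer.
    case $wait_for in
      ''|*[!0-9]*) wait_for=$DEFAULT_DELAY ;;
    esac
    sleep "$wait_for"
    attempt=$((attempt + 1))
  done
}
```

A deploy loop would call retry_upload once per asset and exit 1 the first time it returns non-zero, so only the files that actually failed get uploaded again.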

Speculation

- Perhaps the 500 errors only affect people with more than a small number of files.
- Maybe there's a rate-limiting regime in place and Slate is ignorant of what the limit is.
- The 500 error seems to provide no explanation of what's wrong, and/or Slate or themekit is eating the headers that would tell it when it can retry, and instead just blindly continues.