uhthomas/automata

dev/kipp: Large/long uploads result in 500 or 502

Closed this issue · 4 comments

When an upload is large (100MB+), or takes a while, one of two things will happen:

  1. NGINX returns a 502.
  2. Kipp returns a 500 because it got a 502 from S3 (Linode).

I've ruled out Linkerd as the cause by removing the sidecar injection. Not sure if it's a timeout in ingress-nginx, or something bigger.

Here are some example responses:

upload: MultipartUpload: upload multipart failed
    upload id: 2~uS5fEhLYQ2me9YEDAKu8KVHG7yP3wia
caused by: SignatureDoesNotMatch: 
    status code: 403, request id: tx0000000000000019011e5-005ed2e4fc-5041f0-default, host id:
upload: MultipartUpload: upload multipart failed
    upload id: 2~C-GeAYog6R-kJNNF66uJXJV7i796gt2
caused by: SerializationError: failed to unmarshal error message
    status code: 502, request id: , host id: 
caused by: UnmarshalError: failed to unmarshal error message
    00000000  3c 68 74 6d 6c 3e 0d 0a  3c 68 65 61 64 3e 3c 74  |<html>..<head><t|
00000010  69 74 6c 65 3e 35 30 32  20 42 61 64 20 47 61 74  |itle>502 Bad Gat|
00000020  65 77 61 79 3c 2f 74 69  74 6c 65 3e 3c 2f 68 65  |eway</title></he|
00000030  61 64 3e 0d 0a 3c 62 6f  64 79 3e 0d 0a 3c 63 65  |ad>..<body>..<ce|
00000040  6e 74 65 72 3e 3c 68 31  3e 35 30 32 20 42 61 64  |nter><h1>502 Bad|
00000050  20 47 61 74 65 77 61 79  3c 2f 68 31 3e 3c 2f 63  | Gateway</h1></c|
00000060  65 6e 74 65 72 3e 0d 0a  3c 68 72 3e 3c 63 65 6e  |enter>..<hr><cen|
00000070  74 65 72 3e 6f 70 65 6e  72 65 73 74 79 3c 2f 63  |ter>openresty</c|
00000080  65 6e 74 65 72 3e 0d 0a  3c 2f 62 6f 64 79 3e 0d  |enter>..</body>.|
00000090  0a 3c 2f 68 74 6d 6c 3e  0d 0a                    |.</html>..|

caused by: expected element type <Error> but have <html>
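The last line of that trace is what Go's `encoding/xml` reports when it is asked to decode an `<Error>` document and is handed an HTML page instead, which suggests the 502 body came from a proxy (openresty) in front of the object store rather than from the S3 API itself. Below is a minimal sketch that reproduces the message. The `s3Error` type and the `decodeS3Error` and `looksLikeHTML` helpers are hypothetical names made up for illustration; they are not part of Kipp or the AWS SDK.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// s3Error is a hypothetical, minimal stand-in for the XML error document
// that an S3-compatible API is expected to return.
type s3Error struct {
	XMLName xml.Name `xml:"Error"`
	Code    string   `xml:"Code"`
	Message string   `xml:"Message"`
}

// decodeS3Error tries to parse body as an S3 <Error> document.
func decodeS3Error(body []byte) (s3Error, error) {
	var e s3Error
	err := xml.Unmarshal(body, &e)
	return e, err
}

// looksLikeHTML reports whether body starts with an <html tag, a hint that
// the response was generated by a proxy and not by the storage API.
func looksLikeHTML(body []byte) bool {
	trimmed := bytes.TrimSpace(bytes.ToLower(body))
	return bytes.HasPrefix(trimmed, []byte("<html"))
}

func main() {
	// Abbreviated version of the body shown in the hex dump.
	proxyBody := []byte("<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n</html>\r\n")

	_, err := decodeS3Error(proxyBody)
	fmt.Println("decode error:", err)
	fmt.Println("looks like HTML:", looksLikeHTML(proxyBody))
}
```

A check like `looksLikeHTML` could be used to log "upstream proxy error" instead of a confusing unmarshal failure, but that is only one possible way to surface it.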

It's also intermittent (or rather, it intermittently works). Most proxy timeouts in NGINX default to 60s, but some uploads that take longer than a minute work okay, so a fixed timeout doesn't explain it. It's probably not an ingress issue, but rather an egress one.

Something else interesting is that the AWS S3 SDK is sometimes reporting a 403 (SignatureDoesNotMatch)... That doesn't make any sense. I can upload any number of small objects, so the permissions are fine.

Looks like Linode Object Storage is the culprit. I've opened an issue with them and hope to hear back soon.

As far as I'm aware, after moving away from Linode Object Storage, this is now fixed.