greghendershott/aws

multipart-put and multipart-put/file: Better error handling

greghendershott opened this issue · 6 comments

As discussed in #46, the convenience functions multipart-put and multipart-put/file ought to handle things like exn:fail:network?. After all, being able to resume an interrupted upload is one of the main advantages of multipart uploads. Also, as a default in case the user doesn't want to deal with attempting to resume, the functions should automatically use abort-multipart-upload to clean up (so the user isn't paying for parts sitting on S3).

Quick/rough brain dump:

  • First, just double-check that the upload-part function is handling things like 5xx errors with exponential retry. In other words, better not to fail at all, if possible.
  • Perhaps the put functions should take a new failure-proc argument that defaults to abort-multipart-upload, but may be a user-supplied function that stores the upload-id and parts-list and can give them to a new resume-multipart-upload function? (See the sketch after this list.)
  • Should there be a new suspend-multipart-upload function to interrupt intentionally? (That could be called from a break handler, for example?)
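For the failure-proc idea in the second bullet, here's a rough sketch of what the caller-facing side might look like. To be clear: multipart-put/file takes no #:failure-proc keyword today, and save-upload-state! and resume-multipart-upload are hypothetical names from this brain dump, not anything in the library.

```racket
#lang racket
(require aws/s3)

;; HYPOTHETICAL sketch: multipart-put/file does not currently accept a
;; #:failure-proc keyword. The idea is that, on an unrecoverable failure,
;; the library would call this proc with the upload-id and parts-list
;; instead of (or before) the default of abort-multipart-upload.
(define (save-upload-state! upload-id parts)
  ;; Persist enough to hand to a (hypothetical) resume-multipart-upload later.
  (with-output-to-file "upload-state.rktd" #:exists 'replace
    (lambda () (write (list upload-id parts)))))

;; Hypothetical call site:
;; (multipart-put/file "my-bucket/big.bin"
;;                     (string->path "big.bin")
;;                     #:failure-proc save-upload-state!)
```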

I think a failure-proc would be helpful, to get the upload-id and parts-list and decide how to proceed from there. A function for suspending an upload would be nice, but it's nothing I really need at the moment; I can't think of a situation where I would need it.

A suspend function will be needed internally, to use when handling e.g. exn:fail exceptions. Multipart uploads use a pool of 4 threads. If one fails unrecoverably then the whole pool needs to be stopped gracefully.
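To make the shape of that concrete -- this is a toy sketch, not the library's actual worker-pool code -- the idea is that a worker hitting an unrecoverable exn:fail should break its siblings so the pool winds down instead of continuing to upload parts that will only be thrown away:

```racket
#lang racket
;; Toy sketch (NOT the library's actual implementation): a pool of 4 worker
;; threads draining a shared queue of parts. If any worker hits an
;; unrecoverable exn:fail, it breaks the other workers so the whole pool
;; stops gracefully, then re-raises the original exception.
(define (run-pool parts do-part)
  (define todo (box parts))            ;remaining parts
  (define lock (make-semaphore 1))
  (define (next-part!)                 ;pop one part, or #f when none remain
    (call-with-semaphore lock
      (lambda ()
        (match (unbox todo)
          ['() #f]
          [(cons p rest) (set-box! todo rest) p]))))
  (define workers '())
  (define (worker)
    (with-handlers ([exn:fail?
                     (lambda (e)
                       ;; Stop the rest of the pool, then re-raise.
                       (for ([w (in-list workers)]
                             #:unless (eq? w (current-thread)))
                         (break-thread w))
                       (raise e))])
      (let loop ()
        (define p (next-part!))
        (when p
          (do-part p)
          (loop)))))
  (set! workers (for/list ([_ (in-range 4)]) (thread worker)))
  (for ([w (in-list workers)]) (thread-wait w)))
```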

Once that's figured out correctly, I think it would be helpful to provide it publicly, too.

Example: Racket presents break, kill, and hangup signals as exn:break exceptions. If the aws client is going to be killed intentionally, it would be helpful for it to catch these using with-handlers or call-with-exception-handler and suspend the multipart upload in a way that can be resumed later. (For example, in many places residential broadband is not so fast, especially for uploads. Having to start over from scratch isn't great.)

Unlike exn:fail, I feel breaks should be left to the client of the aws library to handle, and it might want to use a suspend. Although I suppose I could handle breaks and re-raise them for the client. Just thinking out loud, here.
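From the client's side, that might look something like the following. with-handlers and exn:break? are plain Racket; suspend-multipart-upload is the hypothetical function floated above, not something the library provides today.

```racket
#lang racket
(require aws/s3)

;; Sketch of a client handling breaks itself. suspend-multipart-upload is
;; HYPOTHETICAL (the suspend function discussed above); the point is just
;; the shape: catch the break, save enough state to resume later, and then
;; re-raise so the break still reaches the rest of the program.
(define (put-with-suspend bucket+path file)
  (with-handlers ([exn:break?
                   (lambda (e)
                     ;; (suspend-multipart-upload ...)  ;hypothetical
                     (raise e))])
    (multipart-put/file bucket+path (string->path file))))
```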


In any case, it may be a while before I can work on this item at all....

Yeah, sounds interesting and a really nice feature. For me, though, it wouldn't be a high priority -- more of a "nice to have" enhancement.

Edit: For my part, I try to monitor my S3 usage with CloudWatch, and if something looks weird I can dive into the multipart details with the AWS CLI.

@krrrcks Thanks for letting me know that -- it helps me prioritize.

I shouldn't do this, at least not soon. [Unless I have time and want to work on it for fun. :)]

So this was bothering me and I kept working on it.

I spent some time exploring how to make the worker pool handle exn:break cleanly, and return lists of "done" and "to-do" parts. Then I realized it didn't matter. I could focus on resuming, regardless of how cleanly it got interrupted (and without the need to persist a list of done/to-do parts locally).

So I pushed a commit with a couple "experimental" functions: incomplete-multipart-put/file and resume-multipart-put/file. Although the package docs haven't rebuilt yet, the commit message and aws.scrbl changes should explain it pretty well?
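Until the docs rebuild, here's roughly the intended flow. The argument lists below are my approximation -- see the aws.scrbl changes in the commit for the actual contracts:

```racket
#lang racket
(require aws/s3)

;; Approximate usage sketch; see the commit's aws.scrbl changes for the
;; actual contracts. The flow: ask S3 (via List Parts plus the Content-MD5
;; checks) whether an earlier multipart-put/file of this same file can be
;; picked up again; if so resume it, otherwise start from scratch.
(define bucket+path "my-bucket/big.bin")
(define file (string->path "big.bin"))

(define maybe-upload-id
  (incomplete-multipart-put/file bucket+path file)) ;#f if nothing to resume

(if maybe-upload-id
    (resume-multipart-put/file bucket+path file maybe-upload-id)
    (multipart-put/file bucket+path file))
```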

I marked it experimental because my testing worked, but was fairly limited. Although I'm fairly confident it's OK to use List Parts this way, because I'm providing Content-MD5 checksums on the uploaded parts, and ensuring they match... I'm not 100% sure.

This has been open for a while and I'm satisfied the commit closes this issue, until/unless someone feels otherwise.