multipart-put and multipart-put/file: Better error handling
greghendershott opened this issue · 6 comments
As discussed in #46, the convenience functions `multipart-put` and `multipart-put/file` ought to handle things like `exn:fail:network?`. After all, being able to resume an interrupted upload is one of the main advantages of multipart uploads. Also, as a default in case the user doesn't want to deal with attempting to resume, the functions should automatically use `abort-multipart-upload` to clean up (so the user isn't paying for parts sitting on S3).
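A minimal sketch of that proposed default behavior, assuming nothing about the aws library's actual API (the S3 operations below are stand-in stubs, not real library calls):

```racket
#lang racket

;; Hypothetical sketch: if the upload fails mid-way with a network error,
;; automatically abort it so no orphaned parts accrue charges.
;; start-upload, upload-parts, and abort-upload are illustrative stubs,
;; NOT functions from the aws library.
(define (start-upload)    'fake-upload-id)
(define (upload-parts id)
  (raise (exn:fail:network "simulated network failure"
                           (current-continuation-marks))))
(define (abort-upload id) (printf "aborted upload ~a\n" id))

(define (put-with-cleanup)
  (define id (start-upload))
  (with-handlers ([exn:fail:network?
                   (λ (e)
                     (abort-upload id) ; clean up the server-side parts
                     (raise e))])      ; still report the failure to the caller
    (upload-parts id)))
```

The key point is only the `with-handlers` shape: abort on `exn:fail:network?`, then re-raise so the caller still sees the error.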
Quick/rough brain dump:

- First, just double-check that the `upload-part` function is handling things like 5xx errors with exponential retry. i.e. Better not to fail at all, if possible.
- Perhaps the put functions should take a new `failure-proc` arg that defaults to `abort-multipart-upload`, but may be a user-supplied function that stores the upload-id and parts-list, and can give them to a new `resume-multipart-upload` function?
- Should there be a new `suspend-multipart-upload` function to interrupt intentionally? (That could be called from a break handler, for example?)
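The `failure-proc` idea might look roughly like this sketch, in which the put function hands the upload-id and completed parts to a user-supplied procedure that stashes them for a later resume. All names and signatures here are illustrative guesses, not the aws library's actual API:

```racket
#lang racket

;; Hypothetical sketch of a failure-proc hook. A real client might
;; persist the state to disk; this just keeps it in memory.
(define saved-state #f)

(define (my-failure-proc upload-id parts)
  (set! saved-state (list upload-id parts)))

;; A put function sketched to call failure-proc on unrecoverable failure
;; (these are NOT the aws library's real names or arguments):
(define (sketchy-multipart-put #:failure-proc [failure-proc #f])
  (define upload-id 'fake-upload-id)
  (define done-parts '((1 . "etag-1") (2 . "etag-2")))
  (with-handlers ([exn:fail?
                   (λ (e)
                     (when failure-proc
                       (failure-proc upload-id done-parts))
                     (raise e))]) ; default behavior would abort instead
    (error "simulated part failure")))
```

With this shape, the default (`abort-multipart-upload`) and the resume case differ only in which procedure receives the upload-id and parts-list.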
I think a `failure-proc` would be helpful for getting the upload-id and parts-list in order to decide how to proceed. A function for suspending an upload would be nice, but it's nothing I really need at the moment; I couldn't think of a situation where I would need it.
A `suspend` function will be needed internally, to use when handling e.g. `exn:fail` exceptions. Multipart uploads use a pool of 4 threads; if one fails unrecoverably, the whole pool needs to be stopped gracefully. Once that's figured out correctly, I think it would be helpful to provide it publicly, too.
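One way to sketch that graceful stop, assuming nothing about the library's internals: run the workers under their own custodian, and shut the custodian down as soon as any worker reports an unrecoverable failure. `run-pool` and its helpers are hypothetical, not the aws library's code:

```racket
#lang racket

;; Hypothetical sketch: a worker pool that stops gracefully when any
;; worker fails. Workers run under a dedicated custodian so the whole
;; pool can be shut down at once.
(define (run-pool items do-work)
  (define cust (make-custodian))
  (define ch (make-channel))
  (parameterize ([current-custodian cust])
    (for ([it (in-list items)])
      (thread
       (λ ()
         ;; Deliver either the result or the caught exception.
         (channel-put ch (with-handlers ([exn:fail? (λ (e) e)])
                           (do-work it)))))))
  (define results
    (for/list ([_ (in-list items)])
      (define r (channel-get ch))
      (when (exn:fail? r)
        (custodian-shutdown-all cust) ; stop the rest of the pool
        (raise r))
      r))
  (custodian-shutdown-all cust)
  results)
```

The custodian makes the "stop everything" step a single call, so no worker keeps uploading parts after a sibling has failed.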
Example: Racket presents break, kill, and hangup signals as `exn:break` exceptions. If the aws client is going to be killed intentionally, it would be helpful for it to catch these using `with-handlers` or `call-with-exception-handler` and suspend the multipart upload in a way that could be resumed later. (For example, in many places residential broadband is not so fast, especially for uploads. Having to start over from scratch isn't great.)
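The break-handling shape being described could be sketched like this; `suspend-upload!` is a hypothetical placeholder, not a function the aws library provides:

```racket
#lang racket

;; Hypothetical sketch: on a break, save enough state to resume the
;; upload later, then re-raise so the client still observes the break.
(define (suspend-upload! upload-id)
  (printf "suspending upload ~a for later resume\n" upload-id))

(define (upload-with-break-handling upload-id do-upload)
  (with-handlers ([exn:break? (λ (e)
                                (suspend-upload! upload-id)
                                (raise e))]) ; let the break propagate
    (do-upload)))
```

Re-raising after suspending is what keeps the decision with the client: the library tidies up, but the break still reaches the caller.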
Unlike `exn:fail`, I feel breaks should be left to the client of the aws library to handle, and it might want to use a `suspend` function there. Although I suppose I could handle breaks and re-`raise` them for the client. Just thinking out loud here.

In any case, it may be a while before I can work on this item at all....
Yeah, sounds interesting and a really nice feature. For me it wouldn't be high priority, though; more of a "nice to have" enhancement.

Edit: Personally, I try to monitor my S3 usage with CloudWatch, and if something looks weird I can dive into the multipart details with the AWS CLI.
@krrrcks Thanks for letting me know that -- it helps me prioritize.
I shouldn't do this, at least not soon. [Unless I have time and want to work on it for fun. :)]
So this was bothering me and I kept working on it.
I spent some time exploring how to make the worker pool handle `exn:break` cleanly and return lists of "done" and "to-do" parts. Then I realized it didn't matter: I could focus on resuming, regardless of how cleanly the upload got interrupted (and without needing to persist a list of done/to-do parts locally).

So I pushed a commit with a couple of "experimental" functions: `incomplete-multipart-put/file` and `resume-multipart-put/file`. Although the package docs haven't rebuilt yet, the commit message and aws.scrbl changes should explain it pretty well?
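From a client's perspective, resuming might look roughly like the sketch below. The argument lists are guesses kept entirely in comments, since the real signatures are in aws.scrbl:

```racket
#lang racket

;; Hypothetical usage sketch only; consult aws.scrbl for the actual
;; signatures of these experimental functions.
;;
;; (require aws/s3)
;;
;; ;; Ask S3 for the state of the interrupted upload...
;; (define upload-id
;;   (incomplete-multipart-put/file "bucket/key" "/path/to/file"))
;;
;; ;; ...then resume, re-uploading only the parts that are missing:
;; (when upload-id
;;   (resume-multipart-put/file "bucket/key" "/path/to/file" upload-id))
```

The appeal of this design is that S3's own List Parts data serves as the done/to-do record, so nothing needs to be persisted locally between runs.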
I marked it experimental because my testing worked but was fairly limited. I'm fairly confident it's OK to use List Parts this way, because I'm providing `Content-MD5` checksums on the uploaded parts and ensuring they match... but I'm not 100% sure.
This has been open for a while, and I'm satisfied the commit closes this issue, until/unless someone feels otherwise.