Resumability without explicit checksums
Closed this issue · 2 comments
I have an idea for how to achieve resumability without specifying CRCs. Currently it doesn't work, because the Range header is supported only when CRC-32 values are explicitly specified in the archive contents list.
The module could simply remember the state of the checksum calculation. If a transfer is aborted, the user could resume only from the exact breakpoint or earlier (Range + Content-Range). The calculation would then continue once the user is past the last calculation point.
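To make the "resume from the breakpoint or earlier" rule concrete, here is a minimal sketch in Python rather than in the module's C. It assumes one state record per archived file with offsets relative to that file; `CrcState`, `can_resume` and `feed` are hypothetical names for illustration, not anything the module provides.

```python
import zlib
from dataclasses import dataclass


@dataclass
class CrcState:
    """Hypothetical per-file record of how far the CRC-32 has been computed."""
    offset: int = 0  # number of bytes already folded into crc
    crc: int = 0     # running CRC-32 of bytes [0, offset)


def can_resume(state: CrcState, range_start: int) -> bool:
    """A resume is only acceptable at or before the last calculation point."""
    return 0 <= range_start <= state.offset


def feed(state: CrcState, chunk_start: int, chunk: bytes) -> None:
    """Fold a streamed chunk into the running CRC, skipping bytes already counted."""
    chunk_end = chunk_start + len(chunk)
    if chunk_start > state.offset:
        raise ValueError("gap before chunk; the CRC cannot be continued")
    if chunk_end <= state.offset:
        return  # chunk lies entirely before the calculation point
    fresh = chunk[state.offset - chunk_start:]
    state.crc = zlib.crc32(fresh, state.crc)  # continue from the saved value
    state.offset = chunk_end
```

For example, if the first transfer stops after 300 bytes and the client resumes at byte 200, bytes 200-299 are re-sent but skipped by `feed`, and the calculation continues from byte 300.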
That would require temporary storage for calculations and ZIP "session" identification.
Calculation storage could be shared memory, a local file, or Redis. The "session" key could be the SHA-1 of the upstream archive contents specification.
Completed and stored checksum calculations could be reused for subsequent downloads of the same ZIP.
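As a rough illustration of the "session" key, the sketch below hashes the upstream contents specification with SHA-1. The function name `session_key` and the line-ending normalization are my own assumptions; the manifest in the usage example only imitates the file-list format, with `-` standing in for an unknown CRC.

```python
import hashlib


def session_key(manifest: str) -> str:
    """Return a hex SHA-1 digest identifying one archive contents specification.

    Line endings are normalized first (an assumption), so the same file list
    sent with CRLF or LF maps to the same key.
    """
    normalized = manifest.replace("\r\n", "\n").encode("utf-8")
    return hashlib.sha1(normalized).hexdigest()


# Illustrative manifest only: "-" marks a file whose CRC-32 is unknown.
example = "- 428 /foo.txt My Document1.txt\n- 100339 /bar.txt My Other Document1.txt\n"
key = session_key(example)
```

Two requests that produce the same specification would then share stored calculation state, which is what allows completed checksums to be reused.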
I rather dislike the idea of temporary storage -- it adds quite a bit of complexity to the module. But if you're going to go that route, I would forget about "sessions" altogether and just store the CRCs of recently streamed files, perhaps keyed by upstream location. Maybe use a fixed (configurable) number of entries so that memory doesn't get out of hand. I would do it in shared memory, since dealing with the file system or Redis is asking for pain, as Nginx is nonblocking.
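A minimal sketch of that bounded store, written in Python for brevity: a least-recently-used map from upstream location to CRC-32 with a configurable entry limit. The class name `CrcCache` and the LRU eviction policy are assumptions; this is an in-process dictionary, not Nginx shared memory.

```python
from collections import OrderedDict
from typing import Optional


class CrcCache:
    """Hypothetical fixed-size cache of CRC-32 values keyed by upstream location."""

    def __init__(self, max_entries: int = 1024) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._entries: "OrderedDict[str, int]" = OrderedDict()

    def get(self, location: str) -> Optional[int]:
        """Return the cached CRC, or None; a hit marks the entry as recently used."""
        crc = self._entries.get(location)
        if crc is not None:
            self._entries.move_to_end(location)
        return crc

    def put(self, location: str, crc: int) -> None:
        """Store a completed CRC, evicting the least recently used entries if full."""
        self._entries[location] = crc
        self._entries.move_to_end(location)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)

    def __len__(self) -> int:
        return len(self._entries)
```

With `max_entries=2`, storing `/a` and `/b`, reading `/a`, and then storing `/c` evicts `/b`, because `/a` was touched more recently.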
@evanmiller thanks for the quick response; you are absolutely right about the complexity. I have used your module for a long time with great results. I now generate CRC-32 during upload with my nginx module. For external uploads (S3) I use on-the-fly CRC-32 calculation in JavaScript, based on the File API.
Anyway, I just wanted to drop the idea here in case someone would like to implement it purely server-side for content lacking checksums.
BTW, non-blocking file I/O in nginx works pretty well. I tested it in production for a long time under load with my Lua nginx upload module, which stores a temporary SHA-1 struct for resume. Redis I/O can also be non-blocking. However, it would be rather challenging for me to implement this in a C module.