derniercri/snatch

Implement the 'Interruptable' feature

k0pernicus opened this issue · 4 comments

axel has a state file which is updated periodically while downloading. Upon restart, axel can continue from the last saved state.

It would be enough to save the total file size and the remaining holes in the download to that state file.
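For illustration, here's a minimal sketch of what such a state file could hold (an assumed plain-text format, not necessarily axel's actual one): the total size plus one entry per missing byte range.

```rust
use std::fs;
use std::io;

// Persist the download state: total size, then each not-yet-downloaded range.
fn save_state(path: &str, total: u64, holes: &[(u64, u64)]) -> io::Result<()> {
    let mut out = format!("total={}\n", total);
    for (start, end) in holes {
        out.push_str(&format!("hole={}-{}\n", start, end));
    }
    fs::write(path, out)
}
```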

As an alternative to the state-file approach, you could store the information in the first bytes of the file being downloaded, and give that file a custom extension. The extension and download metadata would be stripped when the download completes. Here's a quick proposal for how it would work (a code sketch follows the list):

  • The first 32 bits would be a u32 holding the number of 64 KB chunks in the file (n).
  • The next n bits would be flags indicating whether each chunk has been downloaded.
  • The rest of the file would be the downloaded data itself.
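To make the layout concrete, here's a rough sketch of initialising such a header (big-endian encoding and byte-padding of the flag bits are my assumptions; the proposal doesn't pin them down):

```rust
use std::fs::File;
use std::io::{self, Write};

const CHUNK_SIZE: u64 = 64 * 1024; // 64 KB chunks, as proposed

fn write_header(file: &mut File, total_size: u64) -> io::Result<()> {
    // Chunk count, rounded up, stored as a u32 in the first 4 bytes.
    let n = ((total_size + CHUNK_SIZE - 1) / CHUNK_SIZE) as u32;
    file.write_all(&n.to_be_bytes())?;
    // One flag bit per chunk, all zero to start, padded to whole bytes.
    let bitmap_bytes = (n as usize + 7) / 8;
    file.write_all(&vec![0u8; bitmap_bytes])?;
    Ok(())
}
```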

Limitations:

  • This would limit the max download size to (2^32 − 1) chunks of 64 KB, i.e. 281,474,976,645,120 bytes (~256 TB)
  • The total downloaded by each thread should be a multiple of 64 KB, so that restarting with a different thread count doesn't break everything
  • Threads would need to be aware of the offset the metadata header introduces, so that their chunks are written to the right place (see the sketch after this list)
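The offset bookkeeping from the last point is simple arithmetic. A hypothetical helper, consistent with the header sketch above (including its assumed byte-padded bitmap):

```rust
const CHUNK_SIZE: u64 = 64 * 1024;

// File offset where chunk `i` belongs, skipping the metadata header
// (a 4-byte u32 chunk count plus the byte-padded flag bitmap).
fn chunk_offset(n_chunks: u32, i: u32) -> u64 {
    let header_len = 4 + (n_chunks as u64 + 7) / 8;
    header_len + i as u64 * CHUNK_SIZE
}
```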

I think we should try different approaches, run some benchmarks, and pick the one(s) that offer the best trade-off between a fast "retrying the download" step and low memory usage for storing the current file information (and what to download next).
Currently, we plan to improve and stabilize the "download" part before implementing this feature: for example, switching to a single-threaded download when the server cannot provide any information about the remote content size, or trying a strategy (for example, a divide-and-conquer algorithm) to "guess" the remote content size by sending header requests (byte ranges or something similar) while keeping a close eye on the server's responses... :-)
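For the size-guessing idea, one common trick is to request a single byte with a Range header and read the total size out of a 206 response's Content-Range. A rough sketch using reqwest's blocking API (my choice for the example; snatch itself uses a different HTTP stack):

```rust
use reqwest::blocking::Client;
use reqwest::header::{CONTENT_RANGE, RANGE};

// Ask for one byte; a 206 reply's Content-Range ("bytes 0-0/12345")
// carries the total size after the '/'.
fn probe_size(url: &str) -> Option<u64> {
    let resp = Client::new().get(url).header(RANGE, "bytes=0-0").send().ok()?;
    if resp.status().as_u16() != 206 {
        return None; // server ignores byte ranges: fall back to single-threaded
    }
    resp.headers()
        .get(CONTENT_RANGE)?
        .to_str()
        .ok()?
        .rsplit('/')
        .next()?
        .parse()
        .ok()
}
```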

I played around with the idea I had above and built https://github.com/daveallie/grapple.

I made some changes to the proposal I had earlier:

  • I used a u64 to hold the chunk count.
    • This costs 4 more bytes but effectively removes the max-filesize limit.
  • I moved the chunk metadata to the end of the file.
    • When the download is complete, the file can just be truncated to its required length (sketched after this list).
  • I used a 128 KB chunk size to reduce the amount of writing to the file.
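Moving the metadata to the tail makes completion especially cheap; a hypothetical sketch of that finalize step (the ".part" extension is made up for illustration):

```rust
use std::fs::{self, OpenOptions};
use std::io;

fn finalize(path: &str, content_len: u64) -> io::Result<()> {
    // Cut off the metadata tail now that every chunk is present.
    let file = OpenOptions::new().write(true).open(path)?;
    file.set_len(content_len)?;
    // Strip the custom extension, e.g. "video.mkv.part" -> "video.mkv".
    fs::rename(path, path.trim_end_matches(".part"))?;
    Ok(())
}
```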

I'd be happy to give a more detailed explanation of how the 'restarting' phase works if you plan to go with the method I described above.