icedream/go-bsdiff

Reversible patches


Hi,

This is very cool!
I was wondering if it's possible to have reversible patches, or what it would take to get them? I notice it's mentioned here by Colin Percival, who I believe is one of the authors of the C code underlying this: https://twitter.com/cperciva/status/581217879170936833

If I had the latest file and wanted to strip off patches, rather than add patches, do you know how that would work?

Thanks
Alex

If I understand this correctly, the algorithm is really only designed to apply patches in one direction: forwards, not backwards.

Let's say you want to implement this suggestion with two unidirectional patches and generate a single file that contains both, hoping to save as much space as possible by compressing across both patches. To do this as elegantly as possible, you would create a file, layer a compressing writer such as bzip2.Writer on top of it, and run the raw Diff routine twice, once for old file→new file and once for new file→old file, writing both results to this compressing writer. How you separate the two patches in the output is up to you and can be as elaborate as you wish; at the very least, your code must be able to decide which patch to apply and to extract that patch correctly from the stream. For example, for each of the two patches you could store a hash of the original file content, then the length of the patch, and lastly the patch bytes themselves.
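A minimal sketch of that container layout, to make the idea concrete. Everything named here is an assumption for illustration: DiffFunc is a hypothetical stand-in for the library's raw diff routine (it is not the library's actual signature), BuildContainer and SelectPatch are made-up helper names, SHA-256 is an arbitrary choice of hash, and gzip replaces bzip2 only because Go's standard compress/bzip2 package can decompress but not compress.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// DiffFunc is a hypothetical placeholder for a raw diff routine that
// produces a patch turning `from` into `to`.
type DiffFunc func(from, to []byte) ([]byte, error)

// writeEntry writes one record: SHA-256 of the source file,
// the patch length as a big-endian uint64, then the patch bytes.
func writeEntry(w io.Writer, from, patch []byte) error {
	sum := sha256.Sum256(from)
	if _, err := w.Write(sum[:]); err != nil {
		return err
	}
	if err := binary.Write(w, binary.BigEndian, uint64(len(patch))); err != nil {
		return err
	}
	_, err := w.Write(patch)
	return err
}

// BuildContainer runs the diff in both directions and writes both
// records through a single compressing writer.
func BuildContainer(oldData, newData []byte, diff DiffFunc) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	for _, pair := range [][2][]byte{{oldData, newData}, {newData, oldData}} {
		patch, err := diff(pair[0], pair[1])
		if err != nil {
			return nil, err
		}
		if err := writeEntry(zw, pair[0], patch); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// SelectPatch returns the patch whose recorded source hash matches `have`.
// Real code should bound the length field before allocating.
func SelectPatch(container, have []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(container))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	want := sha256.Sum256(have)
	for i := 0; i < 2; i++ {
		var sum [sha256.Size]byte
		if _, err := io.ReadFull(zr, sum[:]); err != nil {
			return nil, err
		}
		var n uint64
		if err := binary.Read(zr, binary.BigEndian, &n); err != nil {
			return nil, err
		}
		patch := make([]byte, n)
		if _, err := io.ReadFull(zr, patch); err != nil {
			return nil, err
		}
		if sum == want {
			return patch, nil
		}
	}
	return nil, errors.New("no patch matches the given file")
}

func main() {
	// Toy "diff" that just returns the target content; it is NOT bsdiff.
	toyDiff := func(from, to []byte) ([]byte, error) { return to, nil }
	oldData, newData := []byte("version 1"), []byte("version 2")
	container, err := BuildContainer(oldData, newData, toyDiff)
	if err != nil {
		panic(err)
	}
	// Holding the new file, we get the record that leads back to the old one.
	patch, err := SelectPatch(container, newData)
	if err != nil {
		panic(err)
	}
	fmt.Printf("selected %d patch bytes\n", len(patch))
}
```

Keying each record on the hash of its source file lets the reader pick the right direction from whichever file it already has, without a separate direction flag.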

However, if you only have a simple bsdiff patch file, there is unfortunately no real way to revert that patch.

I hope this helps! Please make sure to close this issue if it is resolved for you.

Thanks, yes, that's what I thought would be the case, but it's always good to check I hadn't missed something!

In terms of running the raw diff elegantly and fast: because I will run it twice, I can put each run in its own goroutine so they happen in parallel (large files can take a while to diff). Is there any other optimisation that could speed up the diffing?
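Roughly what I have in mind, as a sketch only. DiffFunc is a hypothetical stand-in for the raw diff routine (not the library's real signature) and DiffBothWays is a made-up helper name. It assumes the diff routine is safe to call concurrently and only reads its inputs.

```go
package main

import (
	"fmt"
	"sync"
)

// DiffFunc is a hypothetical placeholder for a raw diff routine that
// produces a patch turning `from` into `to`.
type DiffFunc func(from, to []byte) ([]byte, error)

// DiffBothWays computes the old→new and new→old patches concurrently
// and returns once both goroutines have finished.
func DiffBothWays(oldData, newData []byte, diff DiffFunc) (forward, backward []byte, err error) {
	var wg sync.WaitGroup
	var fErr, bErr error
	wg.Add(2)
	go func() {
		defer wg.Done()
		forward, fErr = diff(oldData, newData)
	}()
	go func() {
		defer wg.Done()
		backward, bErr = diff(newData, oldData)
	}()
	wg.Wait() // both results are safe to read after this point
	if fErr != nil {
		return nil, nil, fErr
	}
	if bErr != nil {
		return nil, nil, bErr
	}
	return forward, backward, nil
}

func main() {
	// Toy "diff" that just returns the target content; it is NOT bsdiff.
	toyDiff := func(from, to []byte) ([]byte, error) { return to, nil }
	fwd, bwd, err := DiffBothWays([]byte("old contents"), []byte("new contents"), toyDiff)
	if err != nil {
		panic(err)
	}
	fmt.Printf("forward: %d bytes, backward: %d bytes\n", len(fwd), len(bwd))
}
```

Both patches end up in memory here, so they could then be written one after the other to a single compressing writer.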

Except for possibly buffering both outputs so that you only need to compress once instead of twice, I don't really have any idea how to optimize it further at the level the library allows.