holepunchto/hypercore

Handling large Hypercores persisted to disk (Windows)

xori opened this issue · 10 comments

xori commented

Something I came across while working on the p2p-indexing-and-search workshop: because of how the data is written to disk, any hypercore (sparse or otherwise) takes up its full size on disk (and then some). Even though solution 14 downloads only ~100 KB of data from the IMDb database, it generates 8.92 GB of files on disk.

You can see the code I use here.

From reading about how hypercore handles storage, it seems like it should handle these sparse cases better:

"Because Hypercore and Dat support partially downloading data, a useful feature is to implement sparse persistence. This means that we can write data into memory or to disk with spaces in between, but without paying any cost."
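To make "spaces in between" concrete, here is a minimal Node.js sketch (not taken from hypercore itself) that writes a few bytes at a large offset and compares the file's logical size with what the filesystem reports as allocated. The helper name `writeWithGap` and the 64 MiB offset are assumptions for illustration. On filesystems that create holes by default, the allocated size should stay small. On NTFS without the sparse flag, the gap is expected to be backed by real disk space.

```javascript
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')

// Hypothetical helper: write `payload` at `offset` and report logical vs allocated size.
function writeWithGap (file, offset, payload) {
  const fd = fs.openSync(file, 'w')
  try {
    // Writing past the end of the file leaves a gap (a "hole") before `offset`.
    fs.writeSync(fd, payload, 0, payload.length, offset)
  } finally {
    fs.closeSync(fd)
  }
  const st = fs.statSync(file)
  return {
    logicalBytes: st.size,
    // st.blocks counts 512-byte units where the platform reports it.
    allocatedBytes: typeof st.blocks === 'number' ? st.blocks * 512 : null
  }
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'sparse-demo-'))
const report = writeWithGap(path.join(dir, 'data'), 64 * 1024 * 1024, Buffer.from('hello'))
console.log(report)
fs.rmSync(dir, { recursive: true, force: true }) // clean up the temporary directory
```

If `allocatedBytes` comes out close to `logicalBytes`, the file is not being stored sparsely on that filesystem.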

Is this something that needs to be implemented, a regression, or am I just not understanding it properly?

xori commented

After looking into it more, I presume this is because random-access-file (RAF) would need to create these files with the OS's sparse flag, which is complicated. I know that on Windows you need Admin rights to create sparse files with fsutil, and Node.js's fs module doesn't expose the fallocate function.

xori commented

Surprising no one, actually setting the sparse flag on these files results in huge savings. The total still seems large, but I presume that's just Windows doing the best it can.


xori commented

Set the compress flag, and we hit the size I was expecting.

pfrazee commented

I wrote this a while back so it may be out of date, but here's a random-access storage that's designed for filesystems that can't do sparse files: https://github.com/pfrazee/random-access-indexed-file. (This is basically a storage plugin which hypercore can use.)
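As a rough illustration of the general idea behind an indexed store (this is not the code of the linked package, and the class name `IndexedBlocks` and its methods are hypothetical), the sketch below keeps an index from logical block numbers to densely packed slots. Only blocks that were actually written take up space, at the cost of an extra lookup on every read and write. It holds everything in memory to stay short; a real storage plugin would persist both the index and the slots.

```javascript
// Illustrative only: maps logical block numbers to compact, densely packed slots.
class IndexedBlocks {
  constructor (blockSize) {
    this.blockSize = blockSize
    this.index = new Map() // logical block number -> position in this.slots
    this.slots = [] // one Buffer per block that has been written
  }

  write (offset, data) {
    let pos = 0
    while (pos < data.length) {
      const abs = offset + pos
      const block = Math.floor(abs / this.blockSize)
      const within = abs % this.blockSize
      const n = Math.min(this.blockSize - within, data.length - pos)
      if (!this.index.has(block)) {
        // First write to this block: allocate the next free slot.
        this.index.set(block, this.slots.length)
        this.slots.push(Buffer.alloc(this.blockSize))
      }
      data.copy(this.slots[this.index.get(block)], within, pos, pos + n)
      pos += n
    }
  }

  read (offset, length) {
    const out = Buffer.alloc(length) // ranges never written read back as zeros
    let pos = 0
    while (pos < length) {
      const abs = offset + pos
      const block = Math.floor(abs / this.blockSize)
      const within = abs % this.blockSize
      const n = Math.min(this.blockSize - within, length - pos)
      const slot = this.index.get(block)
      if (slot !== undefined) this.slots[slot].copy(out, pos, within, within + n)
      pos += n
    }
    return out
  }

  get storedBytes () {
    return this.slots.length * this.blockSize
  }
}

const store = new IndexedBlocks(4096)
const far = 8 * 1024 * 1024 * 1024 // a logical offset of 8 GiB
store.write(far, Buffer.from('hello'))
console.log(store.read(far, 5).toString(), store.storedBytes) // prints "hello 4096"
```

The trade-off matches the one discussed in this thread: the extra indirection costs some performance, but the stored size no longer depends on the filesystem supporting sparse files.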

@xori thanks! Which flags do you set on Windows for this? Then I’ll update it. Sorry, we aren’t Windows experts.

xori commented

I only know how to do it via the fsutil command in Windows.

fsutil sparse setflag <file>  # marks <file> as sparse
fsutil behavior set disablecompression 0  # system-wide NTFS setting, takes no <file> argument; by default this is enabled?

The downside is that fsutil sparse requires Admin privileges, and I'm unaware of any pure JavaScript way of marking the file for the OS.
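Short of a pure JavaScript API, shelling out from Node could look roughly like the sketch below. The helper name `markSparse` is hypothetical. It assumes `fsutil` is on the PATH and that the process has whatever privileges `fsutil` needs, and it simply reports failure rather than throwing.

```javascript
const { execFileSync } = require('node:child_process')

// Hypothetical helper: returns true if the sparse flag was set, false if skipped or failed.
function markSparse (file, platform = process.platform) {
  // Only Windows needs an explicit flag here, so do nothing elsewhere.
  if (platform !== 'win32') return false
  try {
    // Equivalent to running: fsutil sparse setflag <file>
    execFileSync('fsutil', ['sparse', 'setflag', file], { stdio: 'ignore' })
    return true
  } catch (err) {
    // fsutil is missing, lacks privileges, or the volume does not support sparse files.
    return false
  }
}
```

A storage layer could call something like this once, right after creating a file and before writing at large offsets, and fall back to an indexed layout when it returns false on Windows.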

On Linux, I only know of fallocate --dig-holes <file>, and once again there is no pure JavaScript way of doing this.

It sucks that we have to break out to the OS for this, but thanks @pfrazee for linking me to your RAIF package. I'll monkey-patch that into my local hypercore fork for my project for now, because I'd rather take the performance hit than run into incompatibility with certain OSes.

Edit: I guess I should ask, does this not happen for you on macOS/Linux?

No, Linux and macOS both use sparse files by default.

xori commented

🤯

FYI, it seems I'm not the only one looking into this problem: xxoo/node-fswin#25

He he he... feel free to beat me to it with the PR. I'm currently focused on other work and might be a few months off.

I'd be happy to help set it up and integrate with the hyper stack in your apps.

Fixed in 9.9.0