folbricht/desync

Confusion about catar

lastrosade opened this issue · 3 comments

Is catar just a glorified tar file?

When I run the tar command with a catar target instead of -s castr -i caidx, no compression or deduplication is done.

I am confused as to why the catar format exists. Can I archive a castr into a catar and then unpack it using a caidx?

This would be useful to avoid the gigantic directories that Windows just can't deal with.

The catar isn't deduplicated. It exists so you can archive a directory (like regular tar) and then chunk it efficiently, since the order of files inside the archive is stable. So a caidx is really just the index of a chunked catar file. Some subcommands let you do both in one step: they archive a directory into a stream and chunk it in the process, producing an index file (caidx) plus chunks in the store.
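
To make the relationship between archive, chunks, and index concrete, here is a rough Python sketch. It is a toy model only: the `serialize` format, the SHA-256 window test in `chunk`, and the dict used as a store are all assumptions made up for illustration, not the real catar layout or the chunking algorithm that casync or desync use (real chunkers use a rolling hash). The point it shows is that a stable file order gives the same stream, hence the same chunks and the same index.

```python
import hashlib


def serialize(tree):
    """Toy stand-in for a catar: entries concatenated in sorted path order."""
    out = bytearray()
    for path in sorted(tree):  # stable order, independent of insertion order
        data = tree[path]
        out += f"{path}\0{len(data)}\0".encode() + data
    return bytes(out)


def chunk(stream, window=16, mask=0x3F, min_size=32):
    """Toy content-defined chunking: cut where a hash of the last
    `window` bytes has its low bits (selected by `mask`) all zero."""
    chunks, start = [], 0
    for i in range(len(stream)):
        if i + 1 - start < min_size:
            continue  # enforce a minimum chunk size
        digest = hashlib.sha256(stream[i + 1 - window:i + 1]).digest()
        if digest[0] & mask == 0:
            chunks.append(stream[start:i + 1])
            start = i + 1
    if start < len(stream):
        chunks.append(stream[start:])  # trailing partial chunk
    return chunks


def archive(tree, store):
    """Serialize and chunk in one pass; return the index (list of chunk IDs)."""
    index = []
    for c in chunk(serialize(tree)):
        cid = hashlib.sha256(c).hexdigest()
        store[cid] = c  # identical chunks land on the same key (deduplication)
        index.append(cid)
    return index


def restore(index, store):
    """Rebuild the archive stream by fetching chunks in index order."""
    return b"".join(store[cid] for cid in index)


def blob(seed, n=64):
    """Deterministic pseudo-random bytes so the example needs no real files."""
    return b"".join(hashlib.sha256(f"{seed}-{i}".encode()).digest() for i in range(n))


store = {}
tree = {"a/one.bin": blob("one"), "a/two.bin": blob("two"), "b/three.bin": blob("three")}
index = archive(tree, store)

# Same files inserted in a different order produce the same index.
reordered = dict(reversed(list(tree.items())))
same_index = archive(reordered, store) == index
```

In this model the list returned by `archive` plays the role of the caidx, the dict plays the role of the chunk store, and `restore` needs both to get the stream back. A second tree that shares most files with the first would reuse most of the chunk IDs already in the store.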

If you're interested in the "why", it would be better to go to the upstream casync design documents and introductory blog post; the decision to create a catar format was not made by desync's developers.

That said, one major difference is that a catar file is indexed for efficient random access, whereas a tar file is intended to be read front-to-back.

I see. Thanks for the clarification.