xymb-endcrystalme/LinearRegionFileFormatTools

Documentation on the Linear format

Opened this issue · 8 comments

Is there any documentation on the actual Linear format? We are considering adding support for it to Cuberite, but I cannot find any actual formal spec of the format.

Unfortunately, only implementations. I never bothered with documentation.

https://github.com/xymb-endcrystalme/LinearRegionFileFormatTools/blob/main/linear.py
https://github.com/xymb-endcrystalme/LinearPaper/blob/master/patches/server/0002-Linear-region-file-format.patch

As you can see, it's a really simple piece of code. But I doubt it's of any priority for Cuberite; .mca is good enough until you have really huge worlds.

I'm impressed with how feature complete Cuberite is!

That is unfortunate; without a proper spec, we probably won't do it. Code-as-documentation is too brittle: someone changes the code and suddenly the format is no longer compatible, or there is a subtle corner-case behavior that goes undetected by someone not versed in the original code's language.

Maybe one day...

Since I was interested in Linear as well for conversion tooling and maybe porting for use in Fabric, I've gone ahead and reverse-engineered the Python code and made a pseudo-struct.

// Width:
// byte = 1 byte
// short = 2 bytes
// int = 4 bytes
// long = 8 bytes

// All integers are read and written as big endian

file {
  header {
    // Always 0xc3ff13183cca9d9a
    long signature;

    // Either 1 or 2. Always 1 when written by the tool.
    byte version; 

    // Newest chunk within the region.
    // Note: Seems odd that this is a long when the per-chunk timestamps are ints.
    long newestTimestamp;
    
    // ZStandard compression level
    byte compressionLevel;
    
    // Count of real chunks present in the region.
    short chunkCount;
    
    // Length of the complete region compressed. Not currently verified by reader.
    int completeRegionLength;
    
    // completeRegionHash, currently always 0x0000000000000000
    // Assuming this will eventually be CRC64 or `h = h * 31 + b`?
    long reserved;
  }
  zstd ( // Everything inside this block is wrapped in a single zstd-compressed stream.
    chunks[32*32] {
      // Size of the chunk in bytes.
      int size;

      // Last modified time.
      int timestamp;
    }

    // Standard Anvil chunks, concatenated back to back; entry i is `size` bytes.
    // This could also be Alpha if someone wanted to backport this to pre-Anvil.
    byte data[32*32][]; //[size];
  )
  footer {
    // Always 0xc3ff13183cca9d9a
    // Assuming present solely to detect an incomplete write.
    //
    // Noting that checking this doesn't seem optimal in performance-critical code,
    // given that it requires an additional seek, which is expensive on an HDD.
    long signature;
  }
}

This seems correct to me, but Python isn't something I usually work with.

This essentially describes the file layout as: header, chunk table (like McRegion, though laid out as {size,time}[1024] instead of size[1024] followed by time[1024]), all chunks back to back without padding, then the signature again as the footer.
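That layout can be sketched as a small Python reader. This is my own sketch based on the pseudo-struct above, not validated against the reference implementation; the function names and the index-to-coordinate mapping are my assumptions, and it uses the third-party `zstandard` package:

```python
import struct

LINEAR_SIGNATURE = 0xC3FF13183CCA9D9A
# signature, version, newestTimestamp, compressionLevel,
# chunkCount, completeRegionLength, reserved -- all big endian
HEADER_FMT = ">QBQBhiQ"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 32 bytes

def parse_header(raw):
    """Unpack the 32-byte header and verify both signatures."""
    fields = struct.unpack_from(HEADER_FMT, raw, 0)
    if fields[0] != LINEAR_SIGNATURE:
        raise ValueError("bad header signature")
    if struct.unpack_from(">Q", raw, len(raw) - 8)[0] != LINEAR_SIGNATURE:
        raise ValueError("bad footer signature (incomplete write?)")
    return fields

def parse_body(body):
    """Split the decompressed body: a 1024-entry {size, timestamp} table,
    then the chunk payloads back to back with no padding."""
    table = [struct.unpack_from(">ii", body, 8 * i) for i in range(1024)]
    chunks, offset = {}, 8 * 1024
    for i, (size, timestamp) in enumerate(table):
        if size > 0:  # assumption: size == 0 means the chunk is absent
            chunks[i] = (timestamp, body[offset:offset + size])
            offset += size
    return chunks

def read_linear(path):
    import zstandard  # third-party: pip install zstandard
    with open(path, "rb") as f:
        raw = f.read()
    parse_header(raw)
    body = zstandard.ZstdDecompressor().decompress(
        raw[HEADER_SIZE:-8], max_output_size=1 << 30)
    return parse_body(body)
```

The returned dict maps the flat chunk index (0..1023) to `(timestamp, nbt_bytes)`; whether that index is `x + z*32` or `z + x*32` would need checking against linear.py.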

This does indeed seem to be right, as I've successfully implemented a Linear to McRegion converter, with the input being a region from this tool. Both NbtExplorer and Minecraft can at least read the chunk data fine after the conversion back.

Also, Linear is easily far better to stream than McRegion, even if its design still makes that slightly awkward, notably by requiring sizes in two headers, whereas McRegion requires setting sectors correctly and has some definitely unusual nuances for a purpose-built design.

I made only a couple of minor oversights writing the Linear parser, compared to the several easy-to-make mistakes in writing the McRegion writer, when the only reference is a spec or a raw structure and you're testing against NbtExplorer and Minecraft.
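The McRegion sector bookkeeping is where those easy-to-make mistakes live. A minimal sketch of just the two 4 KiB header tables and the sector math (the function name and the assumption that each payload already carries its 5-byte length/compression prefix are mine; this is not a complete .mca writer):

```python
import struct

SECTOR = 4096

def mcregion_tables(chunk_blobs):
    """Build the McRegion offset and timestamp tables for a dict
    {flat_index: (timestamp, payload)} and return them with the
    sector-padded chunk data."""
    offsets = bytearray(SECTOR)
    timestamps = bytearray(SECTOR)
    sector = 2  # the first two sectors hold the two tables themselves
    data = bytearray()
    for i in range(1024):
        if i not in chunk_blobs:
            continue  # absent chunk: offset entry stays zero
        timestamp, payload = chunk_blobs[i]
        count = -(-len(payload) // SECTOR)  # ceil: sectors occupied
        # offset entry: 3-byte starting sector, 1-byte sector count
        struct.pack_into(">I", offsets, 4 * i, (sector << 8) | count)
        struct.pack_into(">i", timestamps, 4 * i, timestamp)
        data += payload.ljust(count * SECTOR, b"\x00")  # pad to boundary
        sector += count
    return bytes(offsets), bytes(timestamps), bytes(data)
```

Getting `count` wrong (forgetting the ceiling, or not counting the length prefix) and forgetting that sector numbering starts at 2 are exactly the kind of off-by-one traps that Linear's flat layout avoids.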

I've written a small doc about the format as well: https://gist.github.com/Aaron2550/5701519671253d4c6190bde6706f9f98

Once I release 1.21 Abomination I'll get GPT to write the documentation for me.