airlift/aircompressor

Compression ratio differs between ZstdOutputStream and ZstdCompressor.compress(ByteBuffer) in the ZSTD algorithm

believezzd opened this issue · 2 comments

Description

  • I have implemented two ways to compress a large file (40 MB+).
  • One uses ZstdCompressor.compress with ByteBuffer arguments.
  • The other uses ZstdOutputStream.
  • ZstdCompressor results in a 12128084/40527865 compression ratio (about 29.9% of the original size).
  • ZstdOutputStream results in an 18109016/40527865 compression ratio (about 44.7% of the original size).
  • The target file is 40527865 bytes.
  • ZstdCompressor gives a compression ratio similar to zstd-jni (https://github.com/luben/zstd-jni).
  • Something must be wrong that I can't figure out, so I'm asking for help.

Aircompressor Version

<dependency>
    <groupId>io.airlift</groupId>
    <artifactId>aircompressor</artifactId>
    <version>0.26</version>
</dependency>

Code

ZstdCompressor.compress(ByteBuffer)

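import io.airlift.compress.zstd.ZstdCompressor;

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public static long compressFile(String inFileName, String outFileName) throws IOException {
    File inFile = new File(inFileName);
    File outFile = new File(outFileName);

    long numBytes = 0L; // uncompressed bytes read

    ZstdCompressor compressor = new ZstdCompressor();
    // Each 8 MiB read is compressed independently into its own zstd frame
    ByteBuffer inBuffer = ByteBuffer.allocateDirect(8 * 1024 * 1024);
    // Size the output for the worst case so incompressible input cannot overflow it
    ByteBuffer outBuffer = ByteBuffer.allocateDirect(compressor.maxCompressedLength(inBuffer.capacity()));

    try (RandomAccessFile inRaFile = new RandomAccessFile(inFile, "r");
         RandomAccessFile outRaFile = new RandomAccessFile(outFile, "rw");
         FileChannel inChannel = inRaFile.getChannel();
         FileChannel outChannel = outRaFile.getChannel()) {

        outRaFile.setLength(0); // "rw" mode does not truncate an existing file

        while (inChannel.read(inBuffer) > 0) {
            inBuffer.flip();
            numBytes += inBuffer.remaining();
            outBuffer.clear();

            compressor.compress(inBuffer, outBuffer); // writes one frame into outBuffer

            outBuffer.flip();
            while (outBuffer.hasRemaining()) { // a single write() may be partial
                outChannel.write(outBuffer);
            }
            inBuffer.clear();
        }
    }

    return numBytes;
}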

ZstdOutputStream

// aircompressor's stream class, not zstd-jni's com.github.luben.zstd.ZstdOutputStream
import io.airlift.compress.zstd.ZstdOutputStream;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public static long compressFile(String inFileName, String outFileName) throws IOException {
    long numBytes = 0L; // uncompressed bytes read
    byte[] buffer = new byte[8 * 1024 * 1024];

    // Resources close in reverse order: the zstd stream is closed first, so its
    // remaining output is written before the file streams are closed.
    // I/O errors propagate to the caller instead of being logged and swallowed.
    try (InputStream fi = new FileInputStream(inFileName);
         OutputStream fo = new FileOutputStream(outFileName);
         ZstdOutputStream zs = new ZstdOutputStream(fo)) {
        int bytesRead;
        while ((bytesRead = fi.read(buffer, 0, buffer.length)) != -1) {
            zs.write(buffer, 0, bytesRead);
            numBytes += bytesRead;
        }
    }

    return numBytes;
}

File to Compress

Computer

  • Intel Core i5
  • MacBook Pro
  • macOS Sonoma 14.0

JDK

  • 1.8.0_311

@martint

Could you help me with this?

dain commented

The answer is that they are very different compression techniques. ZstdCompressor is a block compressor, which means it compresses a block of data in memory to an output buffer in memory in one shot. This requires the full input and output buffers to fit into memory. ZstdOutputStream is a stream compressor, which chops the input data into chunks and uses the block compressor to compress each chunk. This means only part of the data needs to fit into memory at a time, but it doesn't compress quite as well (it also adds extra data to the output describing the framing and such). BTW, what I am describing applies to basically every compression algorithm.
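The effect described here, where independent chunks each carry their own framing and cannot reference data in earlier chunks, is not specific to zstd. Below is a minimal sketch that uses the JDK's java.util.zip.Deflater as a stand-in; it is not zstd and not part of aircompressor. It compresses the same input once in one shot and once as independent 16 KiB chunks. The class ChunkedVsOneShot and its test data (a 16 KiB seeded random block repeated 8 times) are hypothetical, chosen so the effect is easy to see. The sketch says nothing about how aircompressor's ZstdOutputStream splits its input internally. Note that the ZstdCompressor snippet in the issue is chunked as well, with one frame per 8 MiB read.

```java
import java.io.ByteArrayOutputStream;
import java.util.Random;
import java.util.zip.Deflater;

// Hypothetical demo class: Deflate stands in for zstd to show the general effect.
public class ChunkedVsOneShot {

    // Compresses `data` as independent chunks of `chunkSize` bytes. Each chunk gets a
    // fresh Deflater, so it has its own zlib header/trailer and cannot back-reference
    // earlier chunks. chunkSize >= data.length means a single one-shot compression.
    public static int compressedSize(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[64 * 1024];
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
            deflater.setInput(data, off, len);
            deflater.finish();
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            deflater.end();
        }
        return out.size();
    }

    // One 16 KiB block of seeded random bytes, repeated 8 times. A single block is
    // incompressible, but later copies lie within Deflate's 32 KiB window of the
    // previous copy, so a one-shot compressor can encode them as back-references.
    public static byte[] repeatedRandom() {
        byte[] unit = new byte[16 * 1024];
        new Random(42).nextBytes(unit);
        byte[] data = new byte[unit.length * 8];
        for (int i = 0; i < 8; i++) {
            System.arraycopy(unit, 0, data, i * unit.length, unit.length);
        }
        return data;
    }

    public static void main(String[] args) {
        byte[] data = repeatedRandom();
        int oneShot = compressedSize(data, data.length);
        int chunked = compressedSize(data, 16 * 1024);
        System.out.println("input:    " + data.length + " bytes");
        System.out.println("one shot: " + oneShot + " bytes");
        System.out.println("chunked:  " + chunked + " bytes");
    }
}
```

With this input, the chunked output should be far larger than the one-shot output, because each 16 KiB chunk is incompressible on its own. On real data the gap is smaller and depends on how the chunk size compares to the compressor's match window.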