10XGenomics/lz4-rs

Can only read once after flush


```rust
use std::io::{Read, Write};

fn main() {
    // create an encoder
    let mut enc = lz4::EncoderBuilder::new()
        .build(Vec::new())
        .unwrap();

    // write 'a' 100 times to the encoder
    let text: Vec<u8> = vec![b'a'; 100];
    enc.write_all(&text[..]).unwrap();

    // flush the encoder
    enc.flush().unwrap();

    // create a decoder wrapping the backing buffer
    let mut dec = lz4::Decoder::new(&enc.writer()[..]).unwrap();

    // read from the decoder, buf_size bytes at a time
    let buf_size = 10;
    let mut buf = vec![0; buf_size];

    let expected_reads = 10;
    let mut num_reads = 0;

    // count every read that returns data; stop at end of stream or on error
    while let Ok(n) = dec.read(&mut buf[..]) {
        if n == 0 {
            break;
        }
        num_reads += 1;
    }

    // fails: only one read succeeds
    assert_eq!(num_reads, expected_reads);
}
```

This code encodes 100 bytes of data and then calls `std::io::Write::flush()` on the encoder. A `Decoder` is then created around the backing buffer to read the data back. I expected the `Decoder` to read all 100 bytes, which in this case would take 10 read calls of 10 bytes each. Instead, the `Decoder` returns exactly one successful read and then never reads again. The buffer size doesn't matter: with `buf_size` set to 1 or 99, only 1 byte or 99 bytes are read out, and nothing is read after that.
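For reference, here is a minimal sketch of the read pattern I expected, using only the standard library. `count_reads` is a hypothetical helper written for this illustration, not part of the lz4 crate. Unlike the `while let Ok(n)` loop in the reproduction, it propagates errors, so it can tell a decoder that reports end of stream (`Ok(0)`) apart from one that returns an `Err`.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads `reader` to the end in chunks of at most
/// `buf_size` bytes. Returns (number of non-empty reads, total bytes read),
/// or the first I/O error other than `Interrupted`.
fn count_reads<R: Read>(mut reader: R, buf_size: usize) -> io::Result<(usize, usize)> {
    let mut buf = vec![0u8; buf_size];
    let mut num_reads = 0;
    let mut total = 0;
    loop {
        match reader.read(&mut buf) {
            // Ok(0) signals end of stream
            Ok(0) => return Ok((num_reads, total)),
            Ok(n) => {
                num_reads += 1;
                total += n;
            }
            // retry if the read was interrupted
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            // surface any other error instead of silently stopping
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    // A plain byte slice stands in for a well-behaved reader.
    let data = vec![b'a'; 100];
    let (num_reads, total) = count_reads(&data[..], 10).unwrap();
    println!("{num_reads} reads, {total} bytes"); // prints "10 reads, 100 bytes"
}
```

Passing the `lz4::Decoder` from the reproduction in place of the slice should show whether the second read returns `Ok(0)` or an error.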

I also reproduced this issue with other types of I/O, such as `TcpStream`.

I also noticed that if the total amount of encoded data is small enough (fewer than roughly 30 bytes), the `Decoder` performs multiple reads and the issue does not occur.

When I swapped out your lz4 `Decoder` for a different Rust lz4 decoder implementation, I was able to read out all the bytes, which leads me to believe the issue is in the `Decoder`, not the `Encoder`.