apache/avro-rs

`apache_avro::Writer::flush` does not call `std::io::Write::flush` on the inner writer

Closed · 2 comments

(Moving this over from AVRO-4063)

Issue Overview

The Rust documentation for `apache_avro::Writer::flush` describes the function as follows:

Flush the content appended to a Writer. Call this function to make sure all the content has been written before releasing the Writer.

However, the function does not actually guarantee that all the content has been written out once `flush()` returns, because it does not call `std::io::Write::flush` on the inner writer.

This is a problem when the inner writer keeps its own buffer, as `std::io::BufWriter` does: the data can remain in that buffer instead of reaching the underlying file.

Example

fn main() {
    let buffered_writer = std::io::BufWriter::new(std::fs::File::create("test.avro").unwrap());

    let schema = apache_avro::Schema::parse_str(
        r#"
    {
        "type": "record",
        "name": "example_schema",
        "fields": [
            {"name": "example_field", "type": "string"}
        ]
    }
"#,
    )
    .unwrap();

    let mut writer = apache_avro::Writer::new(&schema, buffered_writer);

    let mut record = apache_avro::types::Record::new(writer.schema()).unwrap();
    record.put("example_field", "value");

    writer.append(record).unwrap();
    writer.flush().unwrap();

    let test_file_contents = std::fs::read("test.avro").unwrap();
    assert_ne!(test_file_contents.len(), 0); // fails: the file is still empty at this point
}

In this example, the `BufWriter` has not yet flushed its internal buffer when `writer.flush().unwrap()` returns. In fact, the buffer is only written out once `writer` is dropped.
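
The buffering behaviour can be reproduced with the standard library alone. The following is a minimal sketch and is not taken from the original report: the function name `sink_len_before_and_after_drop` is hypothetical, and an in-memory `Vec<u8>` stands in for the file.

```rust
use std::io::{BufWriter, Write};

/// Hypothetical helper: returns how many bytes have reached the sink
/// before and after the `BufWriter` wrapping it is dropped.
fn sink_len_before_and_after_drop() -> (usize, usize) {
    // In-memory sink standing in for the file from the example.
    let mut sink: Vec<u8> = Vec::new();

    let mut buffered = BufWriter::new(&mut sink);
    buffered.write_all(b"value").unwrap();

    // The five bytes are still held in the BufWriter's own buffer.
    let before_drop = buffered.get_ref().len();

    // Dropping the BufWriter makes it attempt to flush its buffer.
    drop(buffered);

    (before_drop, sink.len())
}
```

Calling this function returns `(0, 5)`: nothing reaches the sink until the `BufWriter` is dropped. A flush that happens during drop ignores any I/O error, which is one more reason to want an explicit flush that reaches the inner writer.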

Solution

`std::io::Write::flush` should be called on the inner writer at the end of `apache_avro::Writer::flush`.
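
As a sketch of what this change amounts to, consider the following. `BlockWriter` is a hypothetical stand-in and not the actual `apache_avro::Writer` code, and `flush_without_inner` is an invented name for the behaviour reported in this issue.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical stand-in for a block-oriented writer that wraps another writer.
/// This is NOT the real `apache_avro::Writer` implementation.
struct BlockWriter<W: Write> {
    inner: W,
    pending: Vec<u8>, // data appended but not yet handed to `inner`
}

impl<W: Write> BlockWriter<W> {
    fn new(inner: W) -> Self {
        Self { inner, pending: Vec::new() }
    }

    fn append(&mut self, bytes: &[u8]) {
        self.pending.extend_from_slice(bytes);
    }

    /// Reported behaviour: the pending block is written to the inner
    /// writer, but the inner writer itself is never flushed.
    fn flush_without_inner(&mut self) -> io::Result<usize> {
        let written = self.pending.len();
        self.inner.write_all(&self.pending)?;
        self.pending.clear();
        Ok(written)
    }

    /// Proposed behaviour: same as above, then flush the inner writer.
    fn flush(&mut self) -> io::Result<usize> {
        let written = self.flush_without_inner()?;
        self.inner.flush()?;
        Ok(written)
    }

    fn get_ref(&self) -> &W {
        &self.inner
    }
}

/// Returns the number of bytes visible in the underlying sink after the
/// non-forwarding flush and after the forwarding flush.
fn bytes_reaching_sink() -> io::Result<(usize, usize)> {
    let mut writer = BlockWriter::new(BufWriter::new(Vec::<u8>::new()));
    writer.append(b"value");

    writer.flush_without_inner()?;
    let before = writer.get_ref().get_ref().len(); // still buffered

    writer.flush()?;
    let after = writer.get_ref().get_ref().len(); // now in the sink

    Ok((before, after))
}
```

Here `bytes_reaching_sink()` returns `Ok((0, 5))`: the data only reaches the sink once the flush is forwarded to the inner `BufWriter`.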

@martin-g Could you add me as an assignee to this issue? Thanks!

Only members of the Avro team can be assigned.
Your comment is enough!
Thank you!