Bad assumption on read() results
ayende opened this issue · 5 comments
In https://github.com/skerkour/kerkour.com/blob/main/2022/rust_file_encryption_with_password/src/main.rs#L75 (and in many other places), you are assumption that the only scenario where you'll get less than the number of requested bytes is when you are at the end of the file.
This is wrong.
See the docs here:
https://doc.rust-lang.org/std/io/trait.Read.html#tymethod.read
In particular:
It is not an error if the returned value n is smaller than the buffer size, even when the reader is not at the end of the stream yet.
There are many reasons why this may be the case.
A great example is if the read you are doing happened to start on a region that is already in memory, but continues to a page that is not there yet. The file system will return what it has right now. That can lead to really hard to figure out bugs.
Hey @ayende,
It is not an error if the returned value n is smaller than the buffer size, even when the reader is not at the end of the stream yet.
Indeed i'm aware of this warning but in my experience when reading a local file, read
never returned less than BUFF_LEN
,
Because the docs also say:
While for File, it is possible to reach the end of file and get zero as result
This function does not provide any guarantees about whether it blocks waiting for data, but if an object needs to block for a read and cannot, it will typically signal this via an Err return value.
So I believe it should be safe to assume that when reading a local file read
will prefer to block rather than return a misleading result.
But, you also state:
A great example is if the read you are doing happened to start on a region that is already in memory, but continues to a page that is not there yet.
I'm learning this today, Thank you very much. But I also find that very surprising!
Do you have a recommendation to fix this potential issue? Use a BufReader?
You need to keep reading until read()
returns a zero, that is the only scenario where you can be sure that the file is done.
Something like this:
loop {
let read_count = source_file.read(&mut buffer)?;
if read_count == 0{
break;
} else {
let ciphertext = stream_encryptor
.encrypt_last(&buffer[..read_count])
.map_err(|err| anyhow!("Encrypting large file: {}", err))?;
dist_file.write(&ciphertext)?;
}
}
That work for all scenarios.
I noticed that you have encrypt_next
vs. encrypt_last
, you should probably just call encrypt_next
always, and pass encrypt_last
with empty buffer.
Thnaks!
The problem is that it would make the encryption/decryption way more complex: we need to operate on the same chunks when decrypting than when encrypting.
Rust's std File.read
uses directly the read
of the OS's libc (https://doc.rust-lang.org/src/std/fs.rs.html#618, https://github.com/rust-lang/rust/blob/master/library/std/src/sys/unix/fs.rs#L852).
Where can I find information about the behavior you have described above (that File.read
can return less than the requested number of bytes for a local file, so far I can only find information stating that read can return less than the requested number of bytes when reaching end of file)?
If you need to make the distinction between next & last, read to the buffer until it is full, even if you get partial read.
And only stop if you get 0
back.
For reference, see:
It is not an error if this number is smaller than the number of
bytes requested