zip-rs/zip-old

Binary file seems to be corrupted with method Deflated

mass10 opened this issue · 14 comments

Summary

Binary file seems to be corrupted with method zip::CompressionMethod::Deflated.

Code to reproduce

use std::io::{Read, Write};

/// Example of deflate.
fn main() -> Result<(), Box<dyn std::error::Error>> {
	// ARCHIVE
	{
		// START
		let w = std::fs::File::create("notepad.exe.zip")?;
		let mut archiver = zip::ZipWriter::new(w);

		let options = zip::write::FileOptions::default();
		// let options = options.compression_method(zip::CompressionMethod::Stored); // SAFE
		let options = options.compression_method(zip::CompressionMethod::Deflated); // CORRUPT🔥
		archiver.start_file("notepad.exe", options)?;

		// READ
		let mut stream = std::fs::File::open("C:\\Windows\\system32\\notepad.exe")?;
		loop {
			let mut buffer = [0; 1000];
			let bytes_read = stream.read(&mut buffer)?;
			if bytes_read == 0 {
				break;
			}

			// WRITE
			let write_buffer = &buffer[..bytes_read];
			archiver.write(&write_buffer)?;
		}

		// COMPLETE
		archiver.finish().unwrap();
	}

	// UNZIP
	{
		let command = std::process::Command::new("wsl.exe").args(&["unzip", "notepad.exe.zip"]).spawn()?.wait()?;
		println!("Command exited with: {}", command);
	}

	// ORIGINAL FILE SIZE
	{
		let meta = std::fs::metadata("C:\\Windows\\system32\\notepad.exe")?;
		println!("Original NOTEPAD is: {} bytes.", meta.len());
	}

	// EXTRACTED FILE SIZE
	{
		let meta = std::fs::metadata("notepad.exe")?;
		println!("Extracted NOTEPAD is: {} bytes.", meta.len());
	}

	return Ok(());
}

Result is

Original NOTEPAD is: 201216 bytes.
Extracted NOTEPAD is: 200903 bytes.

Thank you!

zip::CompressionMethod::Zstd sounds good.

But I found neither of the following supports ZStandard:

  • 7-zip
  • explorer.exe
  • unzip

Just ran into this also. Have to use Stored for now...

You are using archiver.write (an implementation of std::io::Write::write) and ignoring the return value. You should be calling write in a loop checking the count of bytes written, or use archiver.write_all.

https://doc.rust-lang.org/stable/std/io/trait.Write.html#tymethod.write

This function will attempt to write the entire contents of buf, but the entire write might not succeed, or the write may also generate an error. A call to write represents at most one attempt to write to any wrapped object.

mass10 commented

Oh, It was fixed!

Thank you!

milesj commented

I'm using write_all and it still fails, here's my impl: https://github.com/moonrepo/starbase/blob/master/crates/archive/src/zip.rs

mass10 commented

@milesj I thought I want to use fs::read_file_bytes() instead of fs::read_file() in your code.

milesj commented

When I try that I get this error "stream did not contain valid UTF-8", which is the same as using fs::read_file(file)?.as_bytes().

Both are interesting, since these test cases are very simple.

I also tried this, which failed with the same error.

let mut buffer = Vec::new();
let mut stream = fs::open_file(file)?;
stream.read_to_end(&mut buffer).unwrap();

I feel like something else may be going on.

mass10 commented

I tried this.

/// Succeeds🌔
fn diagnose_1(path: &str) {
    let data = std::fs::read(path).unwrap();
    println!("(1) {} is {} bytes.", path, data.len());
}

/// Succeeds🌔
fn diagnose_2(path: &str) {
    use std::io::Read;
    let mut file = std::fs::File::open(path).unwrap();
    let mut buffer = Vec::new();
    file.read_to_end(&mut buffer).unwrap();
    println!("(2) {} is {} bytes.", path, buffer.len());
}

/// Fails🌒
fn diagnose_3(path: &str) {
    let text = std::fs::read_to_string(path).unwrap(); // 🔥stream did not contain valid UTF-8
    let data = text.as_bytes();
    println!("(3) {} is {} bytes.", path, data.len());
}

pub fn main() {
    diagnose_1(r"C:\Windows\system32\notepad.exe"); //🌔
    diagnose_2(r"C:\Windows\system32\notepad.exe"); //🌔
    diagnose_3(r"C:\Windows\system32\notepad.exe"); //🌒
}

Result was

(1) C:\Windows\system32\notepad.exe is 201216 bytes.
(2) C:\Windows\system32\notepad.exe is 201216 bytes.
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: InvalidData, message: "stream did not contain valid UTF-8" }', src\test2.rs:18:46
milesj commented

It's weird, because even if I do this it fails:

self.archive
    .start_file(name, options)
    .map_err(|error| ZipError::AddFailure {
        source: file.to_path_buf(),
        error,
    })?;

self.archive
    .write_all(&std::fs::read(file).unwrap())
    .map_err(|error| FsError::Write {
        path: file.to_path_buf(),
        error,
    })?;

I wonder if this abstraction is causing issues once compiled.

Example PR: moonrepo/starbase#33

mass10 commented

@milesj What kind of file is broken or fail?

milesj commented

Not sure which file is the problem, but this is the fixture: https://github.com/moonrepo/starbase/tree/master/crates/archive/tests/__fixtures__/archives

They are very simple files.

milesj commented

I figured out the problem, was my mistake. The issue was I was using by_index_raw instead of by_index when unzipping.

I'm using Copilot and I think it generated that 😞

mass10 commented

That's good! Everything has become clear.