b-inary/postflop-solver

Slow time to load saved solution

mhluska opened this issue · 3 comments

It takes around 7.5 seconds to load one of the solutions, and loading doesn't seem to parallelize at all: loading 4 solutions takes ~34 seconds.

I'm doing this to save a file:

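  use std::{fs::File, io::{BufWriter, Write}};
  use bincode::config;
  let config = config::standard();
  let mut file = BufWriter::new(File::create("test.bin").unwrap());
  bincode::encode_into_std_write(&game, &mut file, config).unwrap();
  file.flush().unwrap(); // surface write errors that BufWriter's drop would silently ignore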

This produces a 1.8 GB file after solving. Then I have another executable that loads it:

  use std::fs::File;
  use bincode::config;
  use postflop_solver::PostFlopGame;
  let mut file = File::open("test.bin").unwrap();
  let config = config::standard();
  let mut game: PostFlopGame = bincode::decode_from_std_read(&mut file, config).unwrap();

The bincode::decode_from_std_read step takes 7.5 seconds. Is that expected? I'm using a MacBook Pro M2 Max.

If I make four copies of test.bin and load them concurrently like this, it takes about 4x as long:

./target/release/examples/load test1.bin & 
./target/release/examples/load test2.bin & 
./target/release/examples/load test3.bin & 
./target/release/examples/load test4.bin & 
wait

I'd expect four background processes to load the files concurrently, but I'm not too familiar with Rust.

I just realized that if I use your load_data_from_file function instead, it loads in around 1 second:

  let mut game: PostFlopGame = load_data_from_file("test.bin", None).unwrap().0;

I still have the issue of not being able to load files concurrently though.

load_data_from_file is probably faster than your code because it reads the file through a std::io::BufReader. From here on, I will assume a BufReader is used.
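To illustrate why a BufReader can make such a difference, here is a small std-only sketch. It assumes (not confirmed in this thread) that the decoder pulls data from the reader in many small reads; the CountingReader wrapper and count_reads helper are hypothetical, and each counted read stands in for one system call on an unbuffered File.

```rust
use std::io::{self, BufReader, Read};

/// Hypothetical wrapper that counts how many times `read` hits the source.
/// On a real, unbuffered `File`, each of these would be a system call.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Reads `data` back as a sequence of 4-byte fields, mimicking a decoder that
/// pulls one small value at a time. Returns how often the source was hit.
fn count_reads(data: &[u8], buffered: bool) -> io::Result<usize> {
    let mut source = CountingReader { inner: data, calls: 0 };
    let mut field = [0u8; 4];
    if buffered {
        // Each underlying read refills an 8 KiB buffer; most fields are then
        // copied out of memory without touching the source.
        let mut reader = BufReader::with_capacity(8 * 1024, &mut source);
        for _ in 0..data.len() / 4 {
            reader.read_exact(&mut field)?;
        }
    } else {
        // Every field goes straight to the source.
        for _ in 0..data.len() / 4 {
            source.read_exact(&mut field)?;
        }
    }
    Ok(source.calls)
}

fn main() -> io::Result<()> {
    let data = vec![0u8; 1 << 20]; // 1 MiB of dummy "serialized" bytes
    println!("unbuffered reads: {}", count_reads(&data, false)?);
    println!("buffered reads:   {}", count_reads(&data, true)?);
    Ok(())
}
```

Without buffering there is one source read per 4-byte field; with the BufReader there is roughly one per 8 KiB buffer fill, which is why wrapping the File is usually the first thing to try.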

In short, I don't think loading in parallel will improve performance. I can't say for sure how fast the MacBook's SSD is because that depends on its capacity, but even the fastest configuration (1 TB or larger) reads at most about 6 GB/s. That peak is not achievable in real-world applications, so reading 1.8 GB in about 1 second is already fast enough. Four processes loading at once still share the same SSD, so they cannot read faster than it allows. Parallel execution is not a magic bullet for speed and cannot overcome the SSD bottleneck.
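As a back-of-the-envelope check of that bound, the sketch below uses only the numbers quoted above (1.8 GB per file, an assumed best-case 6 GB/s read speed). The min_read_seconds helper is hypothetical, and it ignores deserialization cost entirely; it only gives a floor on raw disk time.

```rust
/// Lower bound on the time needed to read `total_gb` gigabytes from a disk
/// whose read speed peaks at `peak_gb_per_s`. Real throughput is lower, so
/// actual times will be longer.
fn min_read_seconds(total_gb: f64, peak_gb_per_s: f64) -> f64 {
    total_gb / peak_gb_per_s
}

fn main() {
    let file_gb = 1.8; // size of one saved solution, from the issue
    let peak = 6.0; // assumed best-case SSD read speed in GB/s
    for files in [1u32, 4] {
        // Concurrent loads share one SSD, so their sizes add up.
        let t = min_read_seconds(file_gb * files as f64, peak);
        println!("{files} file(s): at least {t:.1} s of raw disk reading");
    }
}
```

Even in the ideal case, four files need at least four times the disk time of one, regardless of how many processes share the work.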

That makes sense. Thank you 🙏