niklasf/rust-pgn-reader

Rewrite without slice-deque (assertion failed at src\reader.rs:553:13 when running example)

Sopel97 opened this issue · 15 comments

I wanted to benchmark this against my implementation to have a baseline and I hit problems when running the examples.
Running on Windows:
rustc 1.38.0 (625451e37 2019-09-23)
cargo 1.38.0 (23ef9a4ef 2019-08-20)

I compile with cargo build --examples --release
Then I run with cargo run --release --example validate -- "c:/dev/chess_pos_db/data/lichess_db_standard_rated_2014-12.pgn"
And I get an error

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `65536`,
 right: `16384`', src\reader.rs:553:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
error: process didn't exit successfully: `target\release\examples\validate.exe c:/dev/chess_pos_db/data/lichess_db_standard_rated_2014-12.pgn` (exit code: 101)

All examples have the same issue. I also tried running the binary directly, but there was no difference. Looks like the buffer is not being filled properly? My knowledge of Rust is literally 0, so I can't do much here.
I also tried increasing MIN_BUFFER_SIZE fourfold; then the assertion passes, but the program hangs and no disk reads happen at all.

Thanks for reporting. It looks like I am making (and asserting) some assumptions that do not hold on Windows.

It will be a while (maybe a week) until I have a chance to debug this. I'll also make sure to add CI for Windows.

Any progress on this? I would appreciate even just ideas or pointers as to what the issue might be.

Ok, so it looks like we can simply relax the assertion in question.

There is a related remaining soundness issue, because calling tail_head_slice() on a not fully initialized SliceDeque might be instant UB. I'll try to figure that out before making a release.

Thanks. Now it seems to be working, though there is still something wrong. It takes a few minutes for a ~100 MiB file (lichess 2013-01). I'm doing exactly what I did before; stats runs just as slowly. A full CPU core is being used, and looking at disk usage it's 50% reads, 50% nothing, totalling around 300 KiB/s on average.
Can you reproduce it on Windows?

Wow, that's horrible. It'll probably be (at least) another week before I can get my hands on a Windows machine again.

No worries, it's not urgent.

Hi,

I'm running into the same problem. Sorry to nag you about this, but any progress or workaround?

No, sorry. I think a proper fix might require a rewrite with an entirely different approach. I want to do that, but no promises on the timeline.

Hi, I'd like to also mention the issue of using SliceDeque. The current issue I'm facing is getting it to work with WebSockets, because SliceDeque is built to modify memory directly, which is kinda hard to do when your code is essentially getting embedded in a web browser.

The code doesn't look terribly written (thank you), but can you give any insight into why SliceDeque is needed for parsing the PGN? I only found one instance of it being defined, here

/// Internal read ahead buffer.
#[derive(Debug, Clone)]
pub struct Buffer {
    inner: SliceDeque<u8>,
}

in parse.rs

Everything else just uses it through a generic type.

From looking at the usage, it looks like you're just using it to incrementally read larger buffers from the PGN file. Can you go into a bit more detail on why SliceDeque instead of something like a simple Vec, before I start delving into changing stuff (and of course submitting a pull request back once I'm done)? Would really appreciate it!

The idea was that it's very efficient as a ring buffer. Let's say the buffer content is

|_______________ABCDEF|

where _ is some old/invalid/uninitialized content. The tag or whatever is not complete yet, so we have to rotate

|ABCDEF_______________|

and can then continue to read into the buffer

|ABCDEFGH_____________|

With SliceDeque this is O(1); with Vec, moving the contents to the start of the buffer is O(n). But whether or not it is really fast is unfortunately platform specific (which I didn't realize when implementing this), and some assumptions about SliceDeque internals (like the assertion) are also fragile.
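
For illustration, here is a minimal sketch (not the crate's actual Buffer API) of what the Vec-based alternative could look like; the copy_within call is the O(n) move to the front that SliceDeque avoids:

use std::io::Read;

/// Hypothetical Vec-backed read-ahead buffer; `head` marks how many
/// bytes the parser has already consumed.
struct VecBuffer {
    data: Vec<u8>,
    head: usize,
}

impl VecBuffer {
    /// |_____ABCDEF| -> |ABCDEF_____| -> |ABCDEFGH___|
    fn compact_and_refill<R: Read>(&mut self, reader: &mut R) -> std::io::Result<usize> {
        // The O(n) part: shift the unconsumed tail to the front of the Vec.
        self.data.copy_within(self.head.., 0);
        self.data.truncate(self.data.len() - self.head);
        self.head = 0;

        // Refill the freed space at the end of the buffer.
        let old_len = self.data.len();
        self.data.resize(old_len + 16 * 1024, 0); // arbitrary chunk size
        let n = reader.read(&mut self.data[old_len..])?;
        self.data.truncate(old_len + n);
        Ok(n)
    }
}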

So I'm thinking there's an approximate theoretical upper bound to the size of the data we will be reading. If you take 24,000,000 games, for example, I'm guessing the file size won't be anywhere over 5-10 GB. If we consider that modern SSDs have read speeds of 500-1000 MB/s, O(n) instead of O(1) might add at most about 10-30 seconds. Maybe my calculations are completely wrong; I'll run an actual speed test once I finish porting the code.

Thanks for the explanation. In that case, what would be the Vec equivalents of move_head and move_tail? Are they simply rotate_left and rotate_right, or is there anything else that has to be done?

That's the only part I can't figure out at the moment.

Alternatively, why not use read_line from the built-in Rust I/O traits? It doesn't allocate a new String every time, because you pass in your own.
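
For reference, a minimal sketch of that reused-String pattern (the file name and the processing step are placeholders):

use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() -> std::io::Result<()> {
    let mut reader = BufReader::new(File::open("games.pgn")?); // placeholder path
    let mut line = String::new();
    loop {
        line.clear(); // read_line appends, so the same allocation is reused
        if reader.read_line(&mut line)? == 0 {
            break; // EOF
        }
        // process `line` here
    }
    Ok(())
}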

This is a high-performance parser. Every copy is a measurable pessimisation. Reading line by line is not even remotely viable.

Though in my parser I do keep two buffers: one being read into asynchronously from the file, the second being processed. When there is nothing more to be processed (less than one game), the leftover is moved to the beginning, we wait for the async read to complete, and the read data is appended to the processing buffer. While it involves one full copy of the read data, it's OK. At least it's simple and works.
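
A rough sketch of that two-buffer scheme in Rust, assuming a reader thread and a channel (the chunk size, path, and parse_complete_games helper are made up for illustration):

use std::fs::File;
use std::io::Read;
use std::sync::mpsc;
use std::thread;

// Placeholder: a real parser would return how many bytes form complete games.
fn parse_complete_games(buf: &[u8]) -> usize {
    buf.len()
}

fn main() -> std::io::Result<()> {
    let mut file = File::open("games.pgn")?; // placeholder path
    // At most one chunk in flight, so the reader thread stays one step ahead.
    let (tx, rx) = mpsc::sync_channel::<Vec<u8>>(1);

    let reader = thread::spawn(move || loop {
        let mut chunk = vec![0u8; 1 << 20];
        let n = file.read(&mut chunk).expect("read failed");
        if n == 0 {
            break; // EOF; dropping tx closes the channel
        }
        chunk.truncate(n);
        if tx.send(chunk).is_err() {
            break;
        }
    });

    let mut processing: Vec<u8> = Vec::new();
    while let Ok(chunk) = rx.recv() {
        // Append the freshly read data to whatever was left over.
        processing.extend_from_slice(&chunk);
        let consumed = parse_complete_games(&processing);
        // Move the leftover (an incomplete game) to the beginning.
        processing.copy_within(consumed.., 0);
        processing.truncate(processing.len() - consumed);
    }

    reader.join().unwrap();
    Ok(())
}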

That seems a bit inefficient if minimising copies is the idea. Are you not copying from one buffer to another unnecessarily? You can just use one buffer that acts like a queue and block the read call if it needs to be filled more.

For my purposes, I basically copied your parser and the other relevant files and am replacing the SliceDeque with direct reads from a BufReader. I also put the responsibility on the caller to create a seekable object to pass in, so they are free to use Cursor, File, etc. Curious where the inefficiency would be if I did that?
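
A minimal sketch of what that kind of entry point could look like (the function name is hypothetical; only the trait bounds match what's described):

use std::io::{BufReader, Read, Seek};

// Hypothetical entry point: the caller supplies any Read + Seek source
// (File, Cursor<Vec<u8>>, ...) and the parser wraps it in a BufReader.
pub fn read_pgn<R: Read + Seek>(source: R) -> std::io::Result<()> {
    let mut reader = BufReader::new(source);
    let mut buf = Vec::new();
    reader.read_to_end(&mut buf)?; // a real parser would read incrementally
    Ok(())
}

fn main() -> std::io::Result<()> {
    read_pgn(std::io::Cursor::new(b"[Event \"?\"]\n\n*\n".to_vec()))?;
    read_pgn(std::fs::File::open("games.pgn")?) // placeholder path
}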

I am not planning to work with a very large number of games, so overall the performance hit will be negligible for the end user.

Fixed via #30.