Performance regression on Windows
After updating my dependencies, I saw a performance regression from 0.5 s to 1.8 s. I tracked it down to repeated calls to allocation_granularity() every time consume() is called on my buf_redux-created ring buffer (buf_redux updated its slice_deque dependency from 0.1 to 0.2). allocation_granularity() calls GetSystemInfo() on Windows, which is very slow (1.3 seconds of my 1.8-second run time).
Can you tell whether all of the calls to allocation_granularity() are necessary, or whether buf_redux is misusing the API? The stack trace looks like this:
```
5: slice_deque::mirrored::winapi::allocation_granularity
   at C:\Users\icooke\tmp\slice_deque\src\mirrored\winapi.rs:49
6: slice_deque::mirrored::buffer::Buffer<u8>::size_in_bytes<u8>
   at C:\Users\icooke\tmp\slice_deque\src\mirrored\buffer.rs:112
7: slice_deque::SliceDeque<u8>::move_head_unchecked<u8>
   at C:\Users\icooke\tmp\slice_deque\src\lib.rs:683
8: slice_deque::SliceDeque<u8>::move_head<u8>
   at C:\Users\icooke\tmp\slice_deque\src\lib.rs:735
9: buf_redux::buffer::slice_deque_buf::SliceDequeBuf::consume
   at C:\Users\icooke\tmp\buf_redux\src\buffer\slice_deque_buf.rs:81
10: buf_redux::buffer::BufImpl::consume
    at C:\Users\icooke\tmp\buf_redux\src\buffer\mod.rs:42
11: buf_redux::Buffer::consume
    at C:\Users\icooke\tmp\buf_redux\src\lib.rs:1175
12: buf_redux::{{impl}}::consume<std::fs::File,buf_redux::policy::MinBuffered>
    at C:\Users\icooke\tmp\buf_redux\src\lib.rs:413
13: std::io::read_until<buf_redux::BufReader<std::fs::File, buf_redux::policy::MinBuffered>>
    at /rustc/4560ea788cb760f0a34127156c78e2552949f734\src\libstd\io\mod.rs:1625
14: std::io::BufRead::read_line::{{closure}}<buf_redux::BufReader<std::fs::File, buf_redux::policy::MinBuffered>>
    at /rustc/4560ea788cb760f0a34127156c78e2552949f734\src\libstd\io\mod.rs:1866
```
The buf_redux code is here: https://github.com/abonander/buf_redux/blob/4c59a0f6c423f80bcd099f543c547d449d92b1d3/src/buffer/slice_deque_buf.rs#L80
As you can see, every line I read involves a GetSystemInfo() call, which is killing performance.
If the calls to allocation_granularity() are necessary, I recommend caching the value somehow. I made a local copy of slice_deque and used lazy_static to compute it once, and my performance returned to 0.5 s. If you are willing to add a dependency on lazy_static, I can put together a pull request for that.
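For illustration, here is a minimal sketch of that compute-once caching idea. The `query_allocation_granularity` stand-in and the 64 KiB value are assumptions for the example (the real slice_deque code calls GetSystemInfo() on Windows), and std's `OnceLock` is used in place of lazy_static so the sketch has no external dependencies; the effect is the same, the expensive query runs at most once per process:

```rust
use std::sync::OnceLock;

// Hypothetical stand-in for the expensive system query.
// In slice_deque this would be the GetSystemInfo()-backed call;
// 64 KiB is a typical Windows allocation granularity, assumed here.
fn query_allocation_granularity() -> usize {
    65536
}

/// Cached wrapper: the underlying query runs only on the first call;
/// every later call returns the stored value.
fn allocation_granularity() -> usize {
    static GRANULARITY: OnceLock<usize> = OnceLock::new();
    *GRANULARITY.get_or_init(query_allocation_granularity)
}

fn main() {
    let first = allocation_granularity();  // performs the query
    let second = allocation_granularity(); // served from the cache
    assert_eq!(first, second);
    println!("{}", first);
}
```

With lazy_static the wrapper would instead declare `lazy_static! { static ref GRANULARITY: usize = query_allocation_granularity(); }`; either way, callers like consume() stop paying the GetSystemInfo() cost on every invocation.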
Dear @ian-p-cooke, thank you for the report!
I've pushed a PR with a potential fix: #80. Sadly, I don't have a Windows machine to test whether it solves your issue.
Would you be so kind as to patch your slice_deque dependency to use the branch in my PR and try it out?
Thank you!
@gnzlbg that works well for me! Performance is better than my branch with lazy_static, and even slightly better than slice_deque 0.1. All tests pass for me.
Any update on this? I'm also on Windows.