process gets stuck in D state if there is a big read (about 6 MB)
ozgrakkurt opened this issue · 4 comments
This is on kernel version 5.15 (Ubuntu), and the executor is created like this:
    let executor = LocalExecutorBuilder::new(Placement::Unbound)
        .blocking_thread_pool_placement(PoolPlacement::Unbound(512))
        .io_memory(128 * 1024 * 1024)
        .ring_depth(8 * 1024)
        .detect_stalls(Some(Box::new(DefaultStallDetectionHandler {})))
        .make()
        .unwrap();
I am creating an ImmutableFile and then calling read_many on it. When I have an iovec with a size of more than 6 MB, it puts the process into D state and I can't kill it, even with the kill -9 command. It is stuck using 100% CPU until I reboot the system.
I am trying to split my iovecs before calling read_many to see if this fixes the problem (not sure if read_many already splits big ones internally).
EDIT: verified it is fixed if I limit each read to max 8KB and call read_many like this:
    let reads = file
        .read_many(
            iovs,
            MergedBufferLimit::NoMerging,
            ReadAmplificationLimit::NoAmplification,
        )
        .with_concurrency(16)
        .with_memory_limit(None);
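The splitting step of this workaround could look roughly like the sketch below. It assumes the iovecs are plain (position, size) pairs; `split_iovs` is a hypothetical helper name, not part of glommio, and it does not handle any alignment requirements of the underlying device.

```rust
/// Split each (position, size) range into consecutive pieces of at most
/// `max_chunk` bytes. Hypothetical helper, not part of glommio.
/// Zero-sized ranges produce no output entries.
fn split_iovs(iovs: &[(u64, usize)], max_chunk: usize) -> Vec<(u64, usize)> {
    assert!(max_chunk > 0, "max_chunk must be non-zero");
    let mut out = Vec::new();
    for &(pos, size) in iovs {
        let mut offset = 0usize;
        while offset < size {
            // The last piece of a range may be shorter than max_chunk.
            let len = max_chunk.min(size - offset);
            out.push((pos + offset as u64, len));
            offset += len;
        }
    }
    out
}
```

With an 8 KB limit, a single 6 MB range such as `split_iovs(&[(0, 6 * 1024 * 1024)], 8 * 1024)` becomes 768 smaller ranges, which can then be passed on as the iovs.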
Also, this might be unrelated, but I am constantly getting a "File dropped while still active" warning in the logs, even though I tried closing the file with .close in every place I can. I also tried not calling .close, but nothing makes a difference.
Minimal reproducer:
    use glommio::{LocalExecutorBuilder, Placement, io::ImmutableFileBuilder};

    fn main() {
        let executor = LocalExecutorBuilder::new(Placement::Unbound)
            .make()
            .unwrap();
        executor.run(async move {
            let file = ImmutableFileBuilder::new("foo")
                .build_existing()
                .await
                .unwrap();
            let data = file.read_at(0, 6 * 1024 * 1024).await.unwrap();
            println!("{}", data.len());
        });
    }
OK, I found that this happens if the read size is bigger than half of the value I get from this command:
cat /sys/block/.../queue/max_sectors_kb
So this should be caught inside the library (I saw this is what is used for DeviceMaxSingleRequest), or maybe the read should be split into multiple reads internally?
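As a rough illustration of the threshold described above, the sketch below turns the text read from the max_sectors_kb file into a byte limit equal to half of that value. `half_max_request_bytes` is a hypothetical helper, not a glommio API, and the "half" factor is only the observation from this issue on kernel 5.15, not a documented limit.

```rust
/// Parse the contents of a `max_sectors_kb` sysfs file (a decimal number
/// of KiB, usually followed by a newline) and return half of it in bytes.
/// Hypothetical helper; returns None if the text is not a valid number
/// or the multiplication would overflow.
fn half_max_request_bytes(max_sectors_kb: &str) -> Option<usize> {
    let kb: usize = max_sectors_kb.trim().parse().ok()?;
    kb.checked_mul(1024).map(|bytes| bytes / 2)
}
```

The input string could come from `std::fs::read_to_string` on the sysfs path for the device in question. For example, a value of 1280 gives a limit of 655360 bytes, so a 6 MB read would be far above it.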
Closing this as it doesn't happen for me on kernel 6.6, so I will assume it is a kernel bug.