mochi-hpc/mobject

Read errors in IOR on Polaris using Bake file backend


IOR runs successfully on Polaris with Bake's pmem backend, but appears to generate errors on reads with Bake's file backend. I've attached my qsub script and bedrock configuration, as well as the output from running IOR.

ior_mobject.zip

carns commented

Thanks. To fill in some info on reproducing: I was able to trigger the problem on my laptop with the na+sm:// transport, using the provided bedrock JSON file (adjusted to point to /tmp for the backend file) and a single ior process:

ior -a RADOS -t 64k -b 128k --rados.user=foo --rados.pool=bar --rados.conf=baz

carns commented

The abt_io_pread() function is in fact reading the correct number of bytes from disk, and margo_bulk_transfer() is transmitting them back to the client successfully. It looks like the amount transferred just isn't being propagated correctly somewhere.

carns commented

Ah, the bake_file_read_bulk() function simply isn't setting the bytes_read output argument. This is a bug in the mochi-bake repo, but I'll hold this open until I push a fix there since this has the reproducer.
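
For readers unfamiliar with this class of bug, here is a minimal sketch of how an unset output argument produces the symptom described above: the data arrives intact, but the caller sees a stale byte count. Everything in it (demo_store, demo_read_buggy, demo_read_fixed) is a hypothetical stand-in written for illustration. It is not the Bake API and not the actual code from the fix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory "backend"; stands in for the file on disk. */
const char demo_store[] = "0123456789abcdef";

/* Buggy variant: copies the requested data into buf but never writes
 * *bytes_read, so the caller keeps whatever value it passed in. */
int demo_read_buggy(size_t offset, void *buf, size_t len, size_t *bytes_read)
{
    assert(buf != NULL && bytes_read != NULL);
    size_t avail = sizeof(demo_store) - 1; /* exclude trailing NUL */
    if (offset > avail) return -1;
    size_t n = (len < avail - offset) ? len : avail - offset;
    memcpy(buf, demo_store + offset, n);
    (void)bytes_read; /* bug: output argument left untouched */
    return 0;
}

/* Fixed variant: identical, except the byte count is reported back. */
int demo_read_fixed(size_t offset, void *buf, size_t len, size_t *bytes_read)
{
    assert(buf != NULL && bytes_read != NULL);
    size_t avail = sizeof(demo_store) - 1;
    if (offset > avail) return -1;
    size_t n = (len < avail - offset) ? len : avail - offset;
    memcpy(buf, demo_store + offset, n);
    *bytes_read = n; /* propagate the amount actually read */
    return 0;
}
```

With the buggy variant, a caller that initializes its count to 0 and requests 4 bytes gets a success return code and the right data in buf, yet still sees a count of 0. A layer above that trusts the count would then report a short or failed read, much like the errors IOR reported here.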

carns commented

Fixed in mochi-bake by mochi-hpc/mochi-bake#58.

You'll need to rebuild mobject with either mochi-bake@0.6.4 or a fresh mochi-bake@main to pick up the fix.