mercury-hpc/mercury

Need an API to report how much data can be transferred in a single bulk_transfer()

Opened this issue · 1 comment

Is your feature request related to a problem? Please describe.
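I get an HG_PROTOCOL_ERROR when the data chunk passed to a single bulk transfer is too large. The log below shows a local length error from the provider and an RMA write posted with a length of 1679616000 bytes (about 1.6 GB).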

# [1017487.534554] mercury->op: [warning] /work2/07555/bz186/frontera/apps/mercury/src/na/na_ofi.c:5103
 # na_ofi_cq_readerr(): fi_cq_readerr() got err: 5 (Input/output error), prov_errno: 1 (local length error)
# [1019973.528566] mercury->rma: [debug] /work2/07555/bz186/frontera/apps/mercury/src/na/na_ofi.c:4965
 # na_ofi_rma_post(): Posting RMA op (fi_writemsg, context=0x3f3c810), iov_count=1, desc[0]=0x4887770, msg_iov[0].iov_base=0x2b93d0e48010, msg_iov[0].iov_len=1679616000, addr=9, rma_iov_count=1, rma_iov[0].addr=47744078381072, rma_iov[0].len=1679616000, rma_iov[0].key=20992, d

Describe the solution you'd like
I would like an API that reports the maximum size a single bulk_transfer() can handle. I could then split the data into chunks no larger than that size and call bulk_transfer() once per chunk.
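
The chunking I have in mind could look roughly like the sketch below. It is only an illustration, not Mercury code: `chunk_xfer_fn`, `chunked_transfer`, `xfer_stats`, and `count_chunk` are hypothetical names, and `max_chunk` is a value the application has to guess, since there is currently no way to query the real limit. In a real application the callback would wrap HG_Bulk_transfer(), using the chunk offset for both the origin and local offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: transfer `len` bytes starting at byte `offset`.
 * In a Mercury application this would wrap HG_Bulk_transfer().
 * Returns 0 on success, nonzero on failure. */
typedef int (*chunk_xfer_fn)(uint64_t offset, uint64_t len, void *arg);

/* Split the range [0, total) into pieces of at most max_chunk bytes and
 * pass each piece to xfer in order. max_chunk is an application-chosen
 * guess (assumption), not a value reported by Mercury or libfabric.
 * Returns -1 on bad arguments, the first nonzero callback result on
 * failure, and 0 once every piece has been handed off. */
static int
chunked_transfer(uint64_t total, uint64_t max_chunk, chunk_xfer_fn xfer,
    void *arg)
{
    uint64_t offset = 0;

    if (max_chunk == 0 || xfer == NULL)
        return -1;

    while (offset < total) {
        uint64_t remaining = total - offset;
        uint64_t len = remaining < max_chunk ? remaining : max_chunk;
        int rc;

        assert(len > 0 && len <= max_chunk);

        rc = xfer(offset, len, arg);
        if (rc != 0)
            return rc;

        offset += len;
    }

    return 0;
}

/* Stand-in callback for illustration: records what would be sent
 * instead of performing a transfer. */
struct xfer_stats {
    uint64_t chunks;   /* number of pieces issued */
    uint64_t bytes;    /* total bytes across all pieces */
    uint64_t last_len; /* length of the most recent piece */
};

static int
count_chunk(uint64_t offset, uint64_t len, void *arg)
{
    struct xfer_stats *stats = (struct xfer_stats *) arg;

    (void) offset;
    stats->chunks++;
    stats->bytes += len;
    stats->last_len = len;

    return 0;
}
```

With the 1679616000-byte buffer from the log and an arbitrary 1 GiB guess for `max_chunk`, this would issue two pieces. Note that real bulk transfers complete asynchronously, so an actual callback would also need to wait for or track each completion; the sketch leaves that out.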

Describe alternatives you've considered
Alternatively, could Mercury split oversized transfers internally?

Additional context

We might be able to do that once libfabric gives us the ability to query the maximum size of RMA messages.