aws/aws-ofi-nccl

NCCL RDMA expects fi_cq_data_entry, but OPX provider fills CQ with fi_cq_tagged_entry

lsavers opened this issue · 2 comments

Commit rdma: switch to untagged send/recv (12ed337) removed the use of tagged entries for NCCL protocol RDMA. Even though RDMA uses untagged send/recv operations, the CQ format attribute is initialized based on the capability flag from the provider.

if (info->caps & FI_TAGGED) {
cq_attr.format = FI_CQ_FORMAT_TAGGED;
} else {
cq_attr.format = FI_CQ_FORMAT_DATA;
}

OPX has FI_TAGGED set for the provider's capabilities, so the CQ format is being set to FI_CQ_FORMAT_TAGGED and then the CQ is filled with fi_cq_tagged_entry's by OPX. Should the spots that changed fi_cq_tagged_entry to fi_cq_data_entry still be fi_cq_tagged_entry to handle the CQ filling with that type?

static inline int handle_bounce_recv(nccl_net_ofi_rdma_device_t *device, int rail_id, struct fi_cq_data_entry *cq_entry,
static inline int handle_write_comp(struct fi_cq_data_entry *cq_entry, nccl_net_ofi_rdma_device_t *device, int rail_id)
static inline int process_completions(struct fi_cq_data_entry *cq_entry, uint64_t num_cqes, nccl_net_ofi_rdma_device_t *device,
struct fi_cq_data_entry cqe_buffers[cq_read_count];

Or should the cq_attr.format be set based on a different condition?

if (info->caps & FI_TAGGED) {

Hello,

Interestingly enough, this exact problem came up during the review for this code: #361 (comment). The conclusion was that FI_TAGGED is a "primary capability", which means it should not be enabled unless specifically requested by the application.

As Raghu noted, here is the relevant text from the Libfabric spec (https://ofiwg.github.io/libfabric/v1.22.0/man/fi_getinfo.3.html):

Capabilities may be grouped into three general categories: primary, secondary, and primary modifiers. Primary capabilities must explicitly be requested by an application, and a provider must enable support for only those primary capabilities which were selected. Primary modifiers are used to limit a primary capability, such as restricting an endpoint to being send-only. If no modifiers are specified for an applicable capability, all relevant modifiers are assumed. See above definitions for details.

My reading is that, since the plugin does not request FI_TAGGED capability, the OPX provider should not enable it.

Thanks for the prompt response and background information. I will look to update OPX to treat FI_TAGGED as a primary capability.