datenlord/async-rdma

how should the work of safety check be done?

Opened this issue · 3 comments

Like such:

// SAFETY: POD FFI type

// TODO: check safety

And what does the 'SAFETY: POD FFI type' mean?

GTwhy commented

The comment // SAFETY: POD FFI type indicates that the use of std::mem::zeroed to initialize the qp_init_attr variable is safe in this context because the type ibv_qp_init_attr is a Plain Old Data (POD) type, and is intended for use in a Foreign Function Interface (FFI) with another language (presumably C or C++).

The term "POD" means that the type consists of plain data types, such as integers and pointers, and has no non-trivial constructors, destructors, or virtual functions. This makes it safe to use in an FFI context where the layout and memory representation of the type must be compatible with the C ABI.

The unsafe block is used to access the std::mem::zeroed function, which sets all the bytes of the object to zero. This operation is safe for POD types because it does not introduce any unexpected behavior.

GTwhy commented

I think the use of these unsafe blocks is already safe.

The safety check here is different from the regular rust program in that it is necessary to consider the unsafe behavior that may be caused by the operation of the RDMA NIC on memory. Due to limited experience during development, it may be inadequately considered, so these TODO marks are left for subsequent inspection and confirmation. We should check the code does not violate memory safety, ownership rules, or other Rust-specific concerns.

When we have a good understanding of RDMA behavior, we can examine and eliminate these TODOs from beginning to end, and explain in detail why this is sufficiently safe. Do you have any suggestions for this? Or what factors do you think may be unsafe?

yeah, safety check is important for stability. Take a recent problem encountered for example:
Screenshot 2023-04-06 at 22 47 23
and when I use &mut unsafe{ *()}, the pointer points to a new temporary memory but not the returned pointer of 'ibv_reg_mr', and I pass it to unsafe { ibv_dereg_mr(self.inner()) }. And then 'Segmentation fault (core dumped)' hanppened.

About the safety of RDMA behavior, it does require a lot of experience in rdma-core. You can always keep this issue open and we can discuss safety check here in the future.