Missing UCX RMA support for 8/16-bit MPI atomics

The ArgoDSM atomics currently use MPI atomic operations on 8-, 16-, 32- and 64-bit operands. The MPI datatype is chosen by the function below, or by its unsigned or floating-point equivalent.

argodsm/src/backend/mpi/mpi.cpp

Lines 64 to 94 in dc8d789

    
	/**
	 * @brief Returns an MPI integer type that exactly matches in size the argument given
	 *
	 * @param size The size of the datatype to be returned
	 * @return An MPI datatype with MPI_Type_size == size
	 */
	static MPI_Datatype fitting_mpi_int(std::size_t size) {
		MPI_Datatype t_type;
		using namespace argo;

		switch (size) {
		case 1:
			t_type = MPI_INT8_T;
			break;
		case 2:
			t_type = MPI_INT16_T;
			break;
		case 4:
			t_type = MPI_INT32_T;
			break;
		case 8:
			t_type = MPI_INT64_T;
			break;
		default:
			throw std::invalid_argument(
				"Invalid size (must be either 1, 2, 4 or 8)");
			break;
		}

		return t_type;
	}

OpenMPI (and most definitely MPICH, which bundles UCX) nowadays pushes InfiniBand users towards UCX. However, UCX does not provide full RMA support for 8- and 16-bit atomics; it falls back to active messages for these, if they are supported at all.
UCX Documentation
Related issue

Some of the ArgoDSM backend tests (atomicXchgAll, atomicXchgOne) currently fail with "unsupported datatype" when the UCX osc module is forced or when the alternatives are disabled. For both performance (avoiding the active-message fallback) and compatibility reasons, perhaps it would be better to always perform at least a properly aligned 32-bit atomic operation instead of an 8- or 16-bit one?
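To make the suggestion concrete, here is a rough sketch of how an 8- or 16-bit exchange could be emulated with a compare-and-swap on the enclosing aligned 32-bit word. It is not ArgoDSM code: `word_slot`, `locate` and `exchange_narrow` are hypothetical names, `std::atomic<std::uint32_t>` stands in for the remote window memory, and the byte-to-shift mapping assumes a little-endian layout. In the backend, the loop would presumably be built on `MPI_Compare_and_swap` with `MPI_UINT32_T` instead of `compare_exchange_weak`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper type: where a narrow value lives inside its
// enclosing, naturally aligned 32-bit word.
struct word_slot {
	std::size_t word_offset; // byte offset of the aligned 32-bit word
	unsigned shift;          // bit position of the narrow value in that word
	std::uint32_t mask;      // bits of the word occupied by the narrow value
};

// Map a byte offset and an operand size (1 or 2 bytes) to a slot.
// Assumption: little-endian layout, so byte k of a word holds bits 8k..8k+7.
word_slot locate(std::size_t byte_offset, std::size_t size) {
	if (size != 1 && size != 2) {
		throw std::invalid_argument("Invalid size (must be either 1 or 2)");
	}
	// A naturally aligned 1- or 2-byte value never straddles two words.
	assert(byte_offset % size == 0);
	const std::size_t in_word = byte_offset % 4;
	const unsigned shift = static_cast<unsigned>(in_word * 8);
	const std::uint32_t mask = (size == 1 ? 0xFFu : 0xFFFFu) << shift;
	return {byte_offset - in_word, shift, mask};
}

// Atomically replace the narrow value described by `slot` and return the
// previous narrow value. Only 32-bit atomics touch the word; the other
// bytes of the word are preserved by retrying the compare-and-swap.
std::uint32_t exchange_narrow(std::atomic<std::uint32_t>& word,
                              const word_slot& slot,
                              std::uint32_t desired) {
	std::uint32_t old_word = word.load();
	std::uint32_t new_word;
	do {
		new_word = (old_word & ~slot.mask)
		         | ((desired << slot.shift) & slot.mask);
		// On failure, old_word is refreshed with the current contents.
	} while (!word.compare_exchange_weak(old_word, new_word));
	return (old_word & slot.mask) >> slot.shift;
}
```

A caller would compute the slot once with `locate(offset, sizeof(T))`, target the word at `slot.word_offset`, and call `exchange_narrow`. The trade-off is that each retry costs a round trip under contention, so whether this beats the active-message fallback would need to be measured.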
