Fails to build 1.16.0 on ppc64
amckinstry opened this issue · 6 comments
Describe the bug
Hi. Debian UCX maintainer here.
UCX 1.16.0 fails to build on ppc64 / ppc64el:
https://buildd.debian.org/status/fetch.php?pkg=ucx&arch=ppc64el&ver=1.16.0%2Bds-3&stamp=1709458389&raw=0
Steps to Reproduce
Configuration here:
Setup and versions
- OS version (e.g. Linux distro) + CPU architecture (x86_64/aarch64/ppc64le/...): cat /etc/issue or cat /etc/redhat-release + uname -a
- For Nvidia Bluefield SmartNIC include cat /etc/mlnx-release (the string identifies software and firmware setup)
- For RDMA/IB/RoCE related issues:
  - Driver version: rpm -q rdma-core or rpm -q libibverbs
  - or: MLNX_OFED version ofed_info -s
  - HW information from ibstat or ibv_devinfo -vv command
- For GPU related issues:
  - GPU type
  - Cuda:
    - Drivers version
    - Check if peer-direct is loaded: lsmod|grep nv_peer_mem and/or gdrcopy: lsmod|grep gdrdrv
Additional information (depending on the issue)
- OpenMPI version
- Output of ucx_info -d to show transports and devices recognized by UCX
- Configure result - config.log
- Log file - configure UCX with "--enable-logging" - and run with "UCX_LOG_LEVEL=data"
Apologies, submitted too soon.
The problem appears to be that the PPC64 code lacks the new ucm_bistro_lock_t introduced in 1.16.0.
E.g. from x86_64.h:
/* Patching lock for other flows exclusion */
typedef struct ucm_bistro_lock {
    uint8_t jmp[2]; /* jmp self or immediate next instruction */
} UCS_S_PACKED ucm_bistro_lock_t;
/**
* Helper functions to improve atomicity of function patching
*/
void ucm_bistro_patch_lock(void *dst);
There is no equivalent for PPC64.
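For illustration only, here is a minimal sketch of what a PPC64 counterpart to the x86_64 lock type might look like. It is not taken from the UCX source or from the eventual fix. The names ppc64_patch_lock_sketch_t, ppc64_encode_branch and ppc64_patch_lock_sketch_init are hypothetical, and __attribute__((packed)) stands in for UCS_S_PACKED on the assumption of a GCC/Clang toolchain. The one fact it relies on is the PowerPC I-form branch encoding: opcode 18 in the top six bits, a 4-byte-aligned relative offset, and AA = LK = 0, so "b ." (branch to self) encodes as 0x48000000.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PPC64 counterpart of the x86_64 lock type (not UCX code).
 * PowerPC instructions are a fixed 4 bytes, so the "lock" is one word. */
typedef struct ppc64_patch_lock_sketch {
    uint32_t b; /* branch to self or to the immediate next instruction */
} __attribute__((packed)) ppc64_patch_lock_sketch_t;

/* Encode an unconditional relative branch (I-form, AA = 0, LK = 0).
 * offset is in bytes, must be 4-byte aligned and fit in a signed 26-bit field. */
static uint32_t ppc64_encode_branch(int32_t offset)
{
    assert((offset & 3) == 0);
    assert((offset >= -(1 << 25)) && (offset < (1 << 25)));
    return (18u << 26) | ((uint32_t)offset & 0x03fffffcu);
}

/* Fill the lock with "b ." so a thread entering the patched function
 * would spin at the first instruction (offset 0 means branch to self). */
static void ppc64_patch_lock_sketch_init(ppc64_patch_lock_sketch_t *lock)
{
    lock->b = ppc64_encode_branch(0);
}
```

A real implementation would also need to write the word atomically into the target code and flush the instruction cache, which this sketch does not attempt.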
Looks good. Testing with overnight build
It could also be interesting to further confirm by checking that ucx_info -d runs properly, if possible.
I don't have a login to our CI/CD machines, but will add a ucx_info -d test to the pipeline.
It all builds fine. Some existing MPI tests running at the moment.
OK, the MPI tests that are running are fine, so no need to check ucx_info -d then.