[Bug]: Compiling unit tests on aarch64 fails
penguinpee opened this issue · 9 comments
What Operating System(s) are you seeing this problem on?
Linux (aarch64)
dlib version
19.24.4
Python version
3.12
Compiler
GCC 14
Expected Behavior
Unit tests should compile without failure on `aarch64`. The library compiles fine with exactly the same settings as used for compiling the unit tests. On `x86_64` both library and unit tests successfully compile.
Current Behavior
Compilation fails with:
gmake[2]: Entering directory '/builddir/build/BUILD/dlib-19.24.4/dlib/test/redhat-linux-build'
[ 50%] Building CXX object examples/examples_build/CMakeFiles/logger_ex_2.dir/logger_ex_2.cpp.o
cd /builddir/build/BUILD/dlib-19.24.4/dlib/test/redhat-linux-build/examples/examples_build && /usr/bin/g++ -I/builddir/build/BUILD/dlib-19.24.4/dlib/.. -I/usr/include/ffmpeg -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -DNDEBUG -Wno-unused-but-set-variable -Wno-comment -Wno-unused-parameter -W -Wall -Wextra -Wpedantic -Werror -fdiagnostics-color=always -Wno-unused-function -Wno-strict-overflow -Wno-maybe-uninitialized -I/usr/include -DHWY_SHARED_DEFINE -I/usr/include/ffmpeg -DDLIB_JPEG_SUPPORT -DDLIB_USE_BLAS -DDLIB_USE_LAPACK -DDLIB_PNG_SUPPORT -DDLIB_WEBP_SUPPORT -DDLIB_JXL_SUPPORT -DDLIB_USE_FFMPEG -Wreturn-type -MD -MT examples/examples_build/CMakeFiles/logger_ex_2.dir/logger_ex_2.cpp.o -MF CMakeFiles/logger_ex_2.dir/logger_ex_2.cpp.o.d -o CMakeFiles/logger_ex_2.dir/logger_ex_2.cpp.o -c /builddir/build/BUILD/dlib-19.24.4/examples/logger_ex_2.cpp
In file included from /builddir/build/BUILD/dlib-19.24.4/dlib/../dlib/optimization/../matrix/../algs.h:122,
from /builddir/build/BUILD/dlib-19.24.4/dlib/../dlib/optimization/../matrix/matrix_exp.h:6,
from /builddir/build/BUILD/dlib-19.24.4/dlib/../dlib/optimization/../matrix/matrix.h:6,
from /builddir/build/BUILD/dlib-19.24.4/dlib/../dlib/optimization/../matrix.h:6,
from /builddir/build/BUILD/dlib-19.24.4/dlib/../dlib/optimization/optimization_search_strategies.h:8,
from /builddir/build/BUILD/dlib-19.24.4/dlib/../dlib/optimization/optimization.h:9,
from /builddir/build/BUILD/dlib-19.24.4/dlib/../dlib/optimization.h:6,
from /builddir/build/BUILD/dlib-19.24.4/examples/least_squares_ex.cpp:13:
In member function ‘dlib::memory_manager_stateless_kernel_1<double>::deallocate_array(double*)’,
inlined from ‘dlib::row_major_layout::layout<double, 0l, 0l, dlib::memory_manager_stateless_kernel_1<char>, 5>::~layout()’ at /builddir/build/BUILD/dlib-19.24.4/dlib/../dlib/optimization/../matrix/matrix_data_layout.h:475:42,
inlined from ‘dlib::matrix<double, 0l, 0l, dlib::memory_manager_stateless_kernel_1<char>, dlib::row_major_layout>::~matrix()’ at /builddir/build/BUILD/dlib-19.24.4/dlib/../dlib/optimization/../matrix/matrix.h:1013:11,
inlined from ‘main’ at /builddir/build/BUILD/dlib-19.24.4/examples/least_squares_ex.cpp:94:49:
/builddir/build/BUILD/dlib-19.24.4/dlib/../dlib/optimization/../matrix/../memory_manager_stateless/memory_manager_stateless_kernel_1.h:61:17: error: ‘operator delete[](void*)’ called on unallocated object ‘params’ [-Werror=free-nonheap-object]
61 | delete [] item;
| ^~~~~~~~~~~~~~
/builddir/build/BUILD/dlib-19.24.4/examples/least_squares_ex.cpp: In function ‘main’:
/builddir/build/BUILD/dlib-19.24.4/examples/least_squares_ex.cpp:94:32: note: declared here
94 | const parameter_vector params = 10*randm(3,1);
| ^~~~~~
cc1plus: all warnings being treated as errors
gmake[2]: *** [examples/examples_build/CMakeFiles/least_squares_ex.dir/build.make:79: examples/examples_build/CMakeFiles/least_squares_ex.dir/least_squares_ex.cpp.o] Error 1
gmake[2]: Leaving directory '/builddir/build/BUILD/dlib-19.24.4/dlib/test/redhat-linux-build'
gmake[1]: *** [CMakeFiles/Makefile2:2104: examples/examples_build/CMakeFiles/least_squares_ex.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
In member function ‘allocate_array’,
inlined from ‘set_max_size’ at /builddir/build/BUILD/dlib-19.24.4/dlib/../dlib/svm/../matrix/../array/array_kernel.h:438:59,
inlined from ‘push_back.constprop’ at /builddir/build/BUILD/dlib-19.24.4/dlib/../dlib/svm/../matrix/../array/array_kernel.h:769:30:
/builddir/build/BUILD/dlib-19.24.4/dlib/../dlib/svm/../memory_manager_stateless/memory_manager_stateless_kernel_1.h:54:24: warning: argument 1 value ‘18446744073709551615’ exceeds maximum object size 9223372036854775807 [-Walloc-size-larger-than=]
54 | return new T[size];
| ^
/usr/include/c++/14/new: In member function ‘push_back.constprop’:
/usr/include/c++/14/new:133:26: note: in a call to allocation function ‘operator new []’ declared here
133 | _GLIBCXX_NODISCARD void* operator new[](std::size_t) _GLIBCXX_THROW (std::bad_alloc)
| ^
gmake[2]: Leaving directory '/builddir/build/BUILD/dlib-19.24.4/dlib/test/redhat-linux-build'
Steps to Reproduce
pushd dlib/test
CFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer '
export CFLAGS
CXXFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -mbranch-protection=standard -fasynchronous-unwind-tables -fstack-clash-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer '
export CXXFLAGS
LDFLAGS='-Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld-errors -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes '
export LDFLAGS
/usr/bin/cmake -S . -B redhat-linux-build -DCMAKE_C_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_CXX_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_Fortran_FLAGS_RELEASE:STRING=-DNDEBUG -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON -DCMAKE_INSTALL_DO_STRIP:BOOL=OFF -DCMAKE_INSTALL_PREFIX:PATH=/usr -DINCLUDE_INSTALL_DIR:PATH=/usr/include -DLIB_INSTALL_DIR:PATH=/usr/lib64 -DSYSCONF_INSTALL_DIR:PATH=/etc -DSHARE_INSTALL_PREFIX:PATH=/usr/share -DLIB_SUFFIX=64 -DBUILD_SHARED_LIBS:BOOL=ON -DDLIB_WEBP_SUPPORT:BOOL=ON
### Anything else?
Output from `cmake` (configuration):
```make
-- The C compiler identification is GNU 14.0.1
-- The CXX compiler identification is GNU 14.0.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Using CMake version: 3.28.3
-- Compiling dlib version: 19.24.4
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found X11: /usr/include
-- Looking for XOpenDisplay in /usr/lib64/libX11.so
-- Looking for XOpenDisplay in /usr/lib64/libX11.so - found
-- Looking for gethostbyname
-- Looking for gethostbyname - found
-- Looking for connect
-- Looking for connect - found
-- Looking for remove
-- Looking for remove - found
-- Looking for shmat
-- Looking for shmat - found
-- Found system copy of libpng: /usr/lib64/libpng.so;/usr/lib64/libz.so
-- Found system copy of libjpeg: /usr/lib64/libjpeg.so
-- Found WebP: /usr/lib64/libwebp.so
-- Searching for JPEG XL
-- Found PkgConfig: /usr/bin/pkg-config (found version "2.1.0")
-- Checking for modules 'libjxl;libjxl_cms;libjxl_threads'
-- Found libjxl, version 0.10.2
-- Found libjxl_cms, version 0.10.2
-- Found libjxl_threads, version 0.10.2
-- Found libjxl via pkg-config in `/usr/lib64`
-- Searching for BLAS and LAPACK
-- Searching for BLAS and LAPACK
-- Checking for module 'cblas'
-- Found cblas, version 3.12.0
-- Checking for module 'lapack'
-- Found lapack, version 3.12.0
-- Looking for cblas_ddot
-- Looking for cblas_ddot - found
-- Found BLAS and LAPACK via pkg-config
CMake Warning (dev) at /builddir/build/BUILD/dlib-19.24.4/dlib/CMakeLists.txt:652 (find_package):
Policy CMP0146 is not set: The FindCUDA module is removed. Run "cmake
--help-policy CMP0146" for policy details. Use the cmake_policy command to
set the policy and suppress this warning.
This warning is for project developers. Use -Wno-dev to suppress it.
CUDA_TOOLKIT_ROOT_DIR not found or specified
-- Could NOT find CUDA (missing: CUDA_TOOLKIT_ROOT_DIR CUDA_NVCC_EXECUTABLE CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY) (Required is at least version "7.5")
-- Found CUDA, but CMake was unable to find the cuBLAS libraries that should be part of every basic CUDA install. Your CUDA install is somehow broken or incomplete. Since cuBLAS is required for dlib to use CUDA we won't use CUDA.
-- DID NOT FIND CUDA
-- Disabling CUDA support for dlib. DLIB WILL NOT USE CUDA
-- Searching for FFMPEG/LIBAV
-- Checking for modules 'libavdevice;libavfilter;libavformat;libavcodec;libswresample;libswscale;libavutil'
-- Found libavdevice, version 60.3.100
-- Found libavfilter, version 9.12.100
-- Found libavformat, version 60.16.100
-- Found libavcodec, version 60.31.102
-- Found libswresample, version 4.12.100
-- Found libswscale, version 7.5.100
-- Found libavutil, version 58.29.100
-- Found FFMPEG/LIBAV via pkg-config in `/usr/lib64`
OpenCV not found, so we won't build the webcam_face_pose_ex example.
-- Configuring done (13.6s)
-- Generating done (1.2s)
CMake Warning:
Manually-specified variables were not used by the project:
CMAKE_C_FLAGS_RELEASE
CMAKE_Fortran_FLAGS_RELEASE
CMAKE_INSTALL_DO_STRIP
INCLUDE_INSTALL_DIR
LIB_INSTALL_DIR
LIB_SUFFIX
SHARE_INSTALL_PREFIX
SYSCONF_INSTALL_DIR
-- Build files have been written to: /builddir/build/BUILD/dlib-19.24.4/dlib/test/redhat-linux-build
```
If above snippets are insufficient, I can provide the full log or a link to it.
At pretty much the same point the compilation also fails on `s390x`.
Thanks. That's definitely a compiler bug, that code is not going to try and allocate some huge block of memory. I just pushed a cmake change to suppress it so should be good for you now. Let me know if it's not.
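For readers following along, the pattern GCC 14 flags in the log above can be reduced to a small sketch. The `stateless_array_allocator` type and the `sum_via_heap_array` helper are hypothetical stand-ins, not dlib's actual `memory_manager_stateless_kernel_1`. The pragma-based suppression is only one possible local workaround and is not necessarily what the pushed CMake change does. Such pragmas may also fail to take effect when the diagnostic is issued after inlining or under LTO.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a stateless array allocator (NOT dlib's class).
// It mirrors the `new T[size]` / `delete [] item` pair shown in the log.
template <typename T>
struct stateless_array_allocator {
    T* allocate_array(std::size_t size) {
        return new T[size];
    }

    void deallocate_array(T* item) {
// Assumption: silencing the diagnostic locally is acceptable because the
// pointer always comes from allocate_array(), i.e. from the heap.
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wfree-nonheap-object"
#endif
        delete[] item;
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic pop
#endif
    }
};

// Allocates n longs on the heap, fills them with 1..n, sums them, and
// releases the array through the same allocator that created it.
inline long sum_via_heap_array(std::size_t n) {
    stateless_array_allocator<long> pool;
    long* data = pool.allocate_array(n);
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = static_cast<long>(i) + 1;
        total += data[i];
    }
    pool.deallocate_array(data);
    return total;
}
```

The array is always obtained from `new[]` before it reaches `delete[]`, so a "called on unallocated object" report for code of this shape would be a false positive, which is consistent with the assessment above.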
Yep
I applied both commits on top of 19.24.4. That fixes the test compilation for `aarch64` and `s390x`. However, on `s390x` I see three failing tests: `test_active_learning`, `test_cca` and `test_correlation_tracker`, and lots of "Parameter N to routine FOO was incorrect" messages in the log.
Is `s390x` supported / tested at all? I know we had it built before. But that was without running tests. Not sure about the reasons, though.
It ought to work anywhere. But I don't know if anyone is using it on s390x. Maybe the tests are just overly tight numerically? I can't say given the information I've got.
You can find the full log of the first scratch build after applying the two commits fixing the build at https://kojipkgs.fedoraproject.org//work/tasks/1493/116961493/build.log
Since this is a scratch build it will be cleaned up after some time. But I can always trigger another build if needed.
After that I tried disabling the failing tests. But that resulted in more failing tests and segfaults. I can provide you with the logs of those as well. If you prefer tracking this issue separately, let me know and I'll open another one.
Yeah IDK. I'm not going to be able to debug this. You should look into it and send us a PR if there really is a bug in dlib here :)
I understand. Should I find the time and the means for debugging and fixing the build issue, I certainly will provide a PR. For now we keep the `s390x` arch disabled at our end.