Multiple definition issue due to explicit function template specialization

Question

Multiple definition issue due to explicit function template specialization

Closed this issue 2 years ago · 4 comments

On AVX512 builds there are multiple definitions of some functions, e.g. avx512_qsort, see numpy/numpy#25274

Reason is that the explicit specialization of the function template is not declared inline. See cppreference:

Whether an explicit specialization of a function [...] template is inline/constexpr(since C++11) [...] is determined by the explicit specialization itself, regardless of whether the primary template is declared with that specifier

And e.g. template <> void avx512_qsort(_Float16 *arr, arrsize_t arrsize, bool hasnan) has no inline and hence could be define by multiple translation units, see the source: https://github.com/intel/x86-simd-sort/blob/d9c97379049cd549d5e386849d744c31f21a2f83/src/avx512fp16-16bit-qsort.hpp#L180

The issue seems to also apply to pretty much every explicit function specialization, so it looks like a major fixup is required searching for "template <>" and marking all those functions defined in headers inline.

Answer 1 · 2023-12-11T17:37:46.000Z

Hello @Flamefire Are you seeing this multiple definition problem outside of NumPy? The build problem reported in numpy/numpy#25274 requires a fix in NumPy and not here (making those template initializations inline does not fix it).

Answer 2 · 2023-12-12T07:51:10.000Z

@r-devulap Thanks for your reply and indeed I was referring to that numpy issue which is blocking a deployment on our new HPC partition. Although there might be an additional bug in numpy I think my argument is still valid and this is a bug in this library as I tried to show with the cppreference quote. So the issue I described causing multiple definitions happens whenever linking together 2 translation units that included x86-simd-sort/src/avx512fp16-16bit-qsort.hpp

See this reduced example:

$ cat a.cpp 
#include "foo.hpp"
$ cat b.cpp 
#include "foo.hpp"
$ cat foo.hpp 
template<typename T>
int foo(T x){ return x; }

template<>
int foo(unsigned x){ return x + 1; }
$ g++ -c -fPIC a.cpp -o a.o && g++ -c -fPIC b.cpp -o b.o && g++ -shared -o lib.so a.o b.o
/usr/bin/ld: b.o: in function `int foo<unsigned int>(unsigned int)':
b.cpp:(.text+0x0): multiple definition of `int foo<unsigned int>(unsigned int)'; a.o:a.cpp:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status

So merely including the header not even using the function causes the double definition.

Answer 3 · 2023-12-12T08:49:53.000Z

I just tested the numpy bug and marking those 2 functions inline avoided the double definition issue, i.e. with this patch:

diff --git a/src/avx512fp16-16bit-qsort.hpp b/src/avx512fp16-16bit-qsort.hpp
index 8a9a49e..1206f82 100644
--- a/src/avx512fp16-16bit-qsort.hpp
+++ b/src/avx512fp16-16bit-qsort.hpp
@@ -145,7 +145,7 @@ replace_inf_with_nan(_Float16 *arr, int64_t arrsize, int64_t nan_count)
 }
 
 template <>
-void avx512_qselect(_Float16 *arr, int64_t k, int64_t arrsize)
+inline void avx512_qselect(_Float16 *arr, int64_t k, int64_t arrsize)
 {
     if (arrsize > 1) {
         int64_t nan_count = replace_nan_with_inf(arr, arrsize);
@@ -156,7 +156,7 @@ void avx512_qselect(_Float16 *arr, int64_t k, int64_t arrsize)
 }
 
 template <>
-void avx512_qsort(_Float16 *arr, int64_t arrsize)
+inline void avx512_qsort(_Float16 *arr, int64_t arrsize)
 {
     if (arrsize > 1) {
         int64_t nan_count = replace_nan_with_inf(arr, arrsize);

However this could lead to ODR violations in the numpy case as the functions are (there) compiled in different TUs with different AVX-flags enabled and hence are not "the same under ODR" anymore.

In the commit 85fbe7d used by numpy I see many functions marked with X86_SIMD_SORT_INLINE i.e. static inline. That yields a unique definition per TU and hence the multiple differently compiled functions are not merged and hence are valid when linked together. However that doesn't work for the explicit instantiation as storage specifiers aren't allowed there. So IMO an overload would be more appropriate here instead.

Answer 4 · 2023-12-15T21:03:48.000Z

I assume I can close this now.