WojciechMula/sse-popcount

Cannot compile popcnt-avx2-harley-seal.cpp using MSVC 2015

kimwalisch opened this issue · 6 comments

Hi Wojciech,

I use your popcnt-avx2-harley-seal algorithm in my libpopcnt.h. Unfortunately it fails to compile on Windows using a recent MSVC 2015 compiler version:

C:\Users\kim\Desktop\libpopcnt-master>nmake -f Makefile.msvc

Microsoft (R) Program Maintenance Utility Version 14.00.24210.0
Copyright (C) Microsoft Corporation.  All rights reserved.

        cl /nologo /W3 /O2 /EHsc /D HAVE_POPCNT /arch:AVX2 /D HAVE_AVX2 test.cpp /Fotest.obj /Fetest.exe
test.cpp
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(356): error C2676: binary '&': '__m256i' does not define this operator or a conversion to a type acceptable to the predefined operator
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(356): error C2660: '_mm256_sub_epi8': function does not take 1 arguments
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(357): error C2676: binary '&': '__m256i' does not define this operator or a conversion to a type acceptable to the predefined operator
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(357): error C2088: '&': illegal for union
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(357): error C2660: '_mm256_add_epi8': function does not take 1 arguments
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(358): error C2676: binary '&': '__m256i' does not define this operator or a conversion to a type acceptable to the predefined operator
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(369): error C2676: binary '^': '__m256i' does not define this operator or a conversion to a type acceptable to the predefined operator
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(370): error C2676: binary '&': '__m256i' does not define this operator or a conversion to a type acceptable to the predefined operator
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(370): error C2088: '&': illegal for union
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(371): error C2676: binary '^': '__m256i' does not define this operator or a conversion to a type acceptable to the predefined operator
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(423): error C3861: '_mm256_extract_epi64': identifier not found
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(424): error C3861: '_mm256_extract_epi64': identifier not found
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(425): error C3861: '_mm256_extract_epi64': identifier not found
c:\users\kim\desktop\libpopcnt-master\libpopcnt.h(426): error C3861: '_mm256_extract_epi64': identifier not found
NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\cl.EXE"' : return code '0x2'
Stop.

It seems like the MSVC compiler has poor AVX2 support (e.g. the _mm256_extract_epi64 intrinsic seems to be missing)!? Have you ever tried compiling your sse-popcount project using MSVC?

Here is a link to my libpopcnt.h

Thanks,
Kim

Kim, although I never compiled this code with MSVC, I managed to write AVX2 code at work, where we use MSVC.

It seems that MSVC has no predefined operators &, ^ and | for AVX2 types. Try replacing them with the intrinsics _mm256_{and,xor,or}_si256; that should help.
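As an illustration of that suggestion, here is a minimal sketch of bitwise logic written with intrinsics only, so it does not depend on compiler-provided operators for __m256i. The helper name `and_xor_or_example` and the macro `EXAMPLE_TARGET_AVX2` are made up for this example and are not part of sse-popcount or libpopcnt. The macro is an assumption that only matters on GCC/Clang, where it lets the function build without -mavx2.

```cpp
#include <immintrin.h>
#include <cstdint>

#if defined(__GNUC__) || defined(__clang__)
  // Assumption: on GCC/Clang this enables AVX2 for one function only.
  #define EXAMPLE_TARGET_AVX2 __attribute__((target("avx2")))
#else
  // MSVC exposes the intrinsics without a per-function attribute.
  #define EXAMPLE_TARGET_AVX2
#endif

// Hypothetical helper: computes out = (a & b) ^ (a | b) on four 64-bit lanes,
// using intrinsics in place of the &, ^ and | operators.
EXAMPLE_TARGET_AVX2
inline void and_xor_or_example(const uint64_t a[4],
                               const uint64_t b[4],
                               uint64_t out[4])
{
  const __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a));
  const __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b));

  const __m256i both   = _mm256_and_si256(va, vb);        // va & vb
  const __m256i either = _mm256_or_si256(va, vb);         // va | vb
  const __m256i result = _mm256_xor_si256(both, either);  // both ^ either

  _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), result);
}
```

The result equals a ^ b lane by lane, which makes the sketch easy to check by hand. The code must run on a CPU with AVX2.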

Thanks, do you also know a workaround for the missing _mm256_extract_epi64 intrinsic on MSVC?

Best regards,
Kim

Kim, unfortunately I don't know, but I will check it for you on Monday, at work.

For now, I would try two _mm256_extractf128_si256 calls (if it's supported...), each followed by _mm_extract_epi64. There is a chance that the impact on performance will be negligible.
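A rough sketch of that idea, assuming a 64-bit x86 target (where _mm_extract_epi64 is available) and a CPU with AVX2. The function name `sum_lanes_extract` and the macro `EXAMPLE_TARGET_AVX2` are hypothetical and exist only for this illustration. The input is taken as a plain array so the example stays self-contained.

```cpp
#include <immintrin.h>
#include <cstdint>

#if defined(__GNUC__) || defined(__clang__)
  // Assumption: on GCC/Clang this enables AVX2 (and SSE4.1) for one function.
  #define EXAMPLE_TARGET_AVX2 __attribute__((target("avx2")))
#else
  #define EXAMPLE_TARGET_AVX2
#endif

// Hypothetical helper: sums the four 64-bit lanes of a 256-bit vector by
// splitting it into two 128-bit halves and extracting two lanes from each.
EXAMPLE_TARGET_AVX2
inline uint64_t sum_lanes_extract(const uint64_t lanes[4])
{
  const __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(lanes));

  const __m128i lo = _mm256_extractf128_si256(v, 0);  // lanes 0 and 1
  const __m128i hi = _mm256_extractf128_si256(v, 1);  // lanes 2 and 3

  return static_cast<uint64_t>(_mm_extract_epi64(lo, 0)) +
         static_cast<uint64_t>(_mm_extract_epi64(lo, 1)) +
         static_cast<uint64_t>(_mm_extract_epi64(hi, 0)) +
         static_cast<uint64_t>(_mm_extract_epi64(hi, 1));
}
```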

I have found a pure C++ solution:

  uint64_t* total64 = (uint64_t*) &total;

  return total64[0] +
         total64[1] +
         total64[2] +
         total64[3];

The performance should be the same as your original code.
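A more conservative variant of the same idea is to store the vector into a plain array with _mm256_storeu_si256 and sum the array, which avoids reading the __m256i object through a uint64_t pointer. This is only a sketch: `horizontal_sum_u64`, `horizontal_sum_demo` and `EXAMPLE_TARGET_AVX2` are names invented for the example, and whether it performs the same as the pointer cast has not been measured here.

```cpp
#include <immintrin.h>
#include <cstdint>

#if defined(__GNUC__) || defined(__clang__)
  // Assumption: on GCC/Clang this enables AVX2 for one function only.
  #define EXAMPLE_TARGET_AVX2 __attribute__((target("avx2")))
#else
  #define EXAMPLE_TARGET_AVX2
#endif

// Hypothetical helper: copies the vector into an array, then adds the lanes.
EXAMPLE_TARGET_AVX2
inline uint64_t horizontal_sum_u64(__m256i total)
{
  uint64_t lanes[4];
  _mm256_storeu_si256(reinterpret_cast<__m256i*>(lanes), total);
  return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

// Hypothetical driver: builds a vector from four scalars and sums it.
EXAMPLE_TARGET_AVX2
inline uint64_t horizontal_sum_demo(uint64_t a, uint64_t b,
                                    uint64_t c, uint64_t d)
{
  // _mm256_set_epi64x takes its arguments from the highest lane to the lowest.
  const __m256i v = _mm256_set_epi64x(static_cast<long long>(d),
                                      static_cast<long long>(c),
                                      static_cast<long long>(b),
                                      static_cast<long long>(a));
  return horizontal_sum_u64(v);
}
```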

@kimwalisch

Yes. Actually, the extract intrinsic is not all that useful. I think it should only be used when you seek to extract one value from the register.

@WojciechMula I was able to compile your popcnt-avx2-harley-seal algorithm using MSVC by adding the following code:

#if defined(_MSC_VER)

/// Define missing & operator overload for __m256i type on MSVC compiler
inline __m256i operator&(const __m256i a, const __m256i b)
{
  return _mm256_and_si256(a, b);
}

/// Define missing | operator overload for __m256i type on MSVC compiler
inline __m256i operator|(const __m256i a, const __m256i b)
{
  return _mm256_or_si256(a, b);
}

/// Define missing ^ operator overload for __m256i type on MSVC compiler
inline __m256i operator^(const __m256i a, const __m256i b)
{
  return _mm256_xor_si256(a, b);
}

#endif /* _MSC_VER */