AuburnSounds/intel-intrinsics

Find compile-time way to detect instruction sets in GDC

p0nce opened this issue · 22 comments

p0nce commented

Still no way to do it? Discussion was going on in Slack and in #24.

ibuclaw commented

Hi, I'm currently testing a change, as discussed back at DConf: vector types unsupported in hardware will soon be rejected by gdc. This means that built-ins for unsupported types won't be exposed either. This should allow detection using __traits(compiles, __vector(T)) or __traits(compiles, __builtin_vector_fun).
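A minimal sketch of the vector-type probe described above, assuming a compiler that rejects vector types the target cannot support (gdc-11 and later, per the comment above). The enum names are illustrative only, not part of intel-intrinsics or any other library.

```d
// Sketch: compile-time probe for SIMD vector type support.
// __traits(compiles, T) is false when the compiler rejects the type,
// so each enum records whether that vector width is usable on this target.
enum bool hasFloat4Vector = __traits(compiles, __vector(float[4])); // 128-bit
enum bool hasFloat8Vector = __traits(compiles, __vector(float[8])); // 256-bit

void main()
{
    import std.stdio : writeln;

    static if (hasFloat8Vector)
        writeln("256-bit float vectors are available");
    else static if (hasFloat4Vector)
        writeln("only 128-bit float vectors are available");
    else
        writeln("no float vector types available");
}
```

As the rest of the thread shows, this only tells you which vector *types* exist; it cannot distinguish instruction sets such as SSE3 or SSSE3 that operate on vector types older hardware already supports.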

p0nce commented

Thanks Iain. This should allow a more precise detection indeed. :)

p0nce commented

Hello @ibuclaw
Trying to use this for the upcoming SSSE3 support (and the current SSE3 support), but with no success.
I also tried a few __traits, but the built-ins seem to be defined in every case.

ibuclaw commented

Hi @p0nce, which version are you using? Only gdc-11 has this change in.

p0nce commented

Example where it doesn't work: https://d.godbolt.org/z/qrPs3PWTe

It does work for -mavx (https://d.godbolt.org/z/5saTvK3P7), but SSSE3 detection doesn't.

ibuclaw commented

Ah right, so the change I pointed out only started rejecting unsupported vector types. That obviously doesn't cover new intrinsics operating on vectors already supported by earlier hardware.

I've been doing some digging, and it looks like, by default, the x86 backend pushes out all built-ins regardless of whether the target supports them, unless a particular language hook is defined: https://github.com/gcc-mirror/gcc/blob/83faf7eacd2081a373afb6069fd923c2dc497271/gcc/langhooks.h#L552-L558

Defining that hook is enough to get correct compile-time reflection working using the examples you provided, but the knock-on effect is that this starts breaking:

import core.simd;      // short8
import gcc.attributes; // @target
import gcc.builtins;   // __builtin_ia32_pmulhrsw128

short8 truc(short8 a, short8 b) @target("ssse3")
{
    return __builtin_ia32_pmulhrsw128(a, b);
}

Edit:
And of course, this hook is only called once, lazily, for each kind of ISA, so __traits-based detection might stop working the moment you use @target for the first time (assuming @target functions the way you'd expect).

static if (__traits(compiles, __builtin_ia32_pmulhrsw128))  // false
    pragma(msg, "Yes SSSE3 support");
else
{
    short8 truc(short8 a, short8 b) @target("ssse3") // adds __builtin_ia32_pmulhrsw128 and other SSSE3 built-ins.
    {
        return __builtin_ia32_pmulhrsw128(a, b);
    }
}

static if (__traits(compiles, __builtin_ia32_pmaddubsw128))  // could be true, depending on order of evaluation.
    pragma(msg, "Yes SSSE3 support");

p0nce commented

OK, but do you see any way to detect -msse3 and -mssse3 at compile time with GDC, then?

ibuclaw commented

Thinking about it, I'm probably OK with the example I gave starting to break, and with adding documentation saying that which built-ins gcc.builtins exposes is controlled by the codegen and target command-line arguments passed.

@kinke perhaps we could have a chat about adding __traits(getTargetInfo) keys for x86 features?

kinke commented

LDC has __traits(targetHasFeature) (https://wiki.dlang.org/LDC-specific_language_changes#targetHasFeature), which predates getTargetInfo. See https://run.dlang.io/is/VSeOOn for the ~140 x86 features (!).
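For comparison, a minimal sketch of querying LDC's __traits(targetHasFeature). The feature strings follow LLVM naming ("sse3", "ssse3", "sse4.1"); the enum names are hypothetical, and non-LDC compilers simply get false here.

```d
// LDC-only: __traits(targetHasFeature, "...") asks LLVM whether a feature
// is enabled for the current target (e.g. through -mcpu or -mattr).
// Other compilers never semantically analyze this branch.
version (LDC)
{
    enum bool hasSSE3  = __traits(targetHasFeature, "sse3");
    enum bool hasSSSE3 = __traits(targetHasFeature, "ssse3");
    enum bool hasSSE41 = __traits(targetHasFeature, "sse4.1");
}
else
{
    // Conservative fallback when the trait is unavailable.
    enum bool hasSSE3  = false;
    enum bool hasSSSE3 = false;
    enum bool hasSSE41 = false;
}

void main()
{
    import std.stdio : writeln;
    writeln("SSE3: ", hasSSE3, ", SSSE3: ", hasSSSE3, ", SSE4.1: ", hasSSE41);
}
```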

ibuclaw commented

Maybe you could make that an alias to a getTargetInfo key, then? I err on the side of not altering the dmd front-end.

p0nce commented

I'd be more than content with a __traits(targetHasFeature) equivalent in GDC. 👍

Better upstream it to dmd then. :-)

p0nce commented

Mmm... actually, detection doesn't need to be unified across compilers for intel-intrinsics (it's there to be the unification API, after all).

So how about version identifiers akin to C++ macros:

__SSE__
__SSE2__
__SSE3__
__SSSE3__
__SSE4_1__
__SSE4_2__
__AVX__
__AVX2__

? Without DMDFE modification.

kinke commented

There are a couple of those (D_AVX and D_AVX2) for the few extra instruction sets supported by DMD; we set those for LDC as well. But as they're extremely coarse and limited to x86, I think it'd be very bad to extend these predefined versions (AVX-512 is a total mess, for example).
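A minimal sketch of the coarse predefined versions mentioned above. It assumes a compiler that defines D_AVX and D_AVX2 when AVX/AVX2 code generation is enabled (DMD with -mcpu=avx or -mcpu=avx2, and LDC as noted above). It also illustrates the coarseness: nothing below AVX is distinguished, so SSE3, SSSE3 and SSE4.1 all fall into the last branch.

```d
// Coarse detection through predefined version identifiers.
// AVX2 is checked first because AVX2-capable targets also support AVX.
version (D_AVX2)
    enum string simdLevel = "AVX2";
else version (D_AVX)
    enum string simdLevel = "AVX";
else
    enum string simdLevel = "no D_AVX/D_AVX2 identifier (SSE levels not distinguishable)";

void main()
{
    import std.stdio : writeln;
    writeln("Detected SIMD level: ", simdLevel);
}
```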

p0nce commented

Not sure what to do next. Right now, SSE3 and SSSE3 support is implemented for GDC in intel-intrinsics (and SSE4.1 support will be soon), but left disabled because of this issue.

The closest I can get is detecting AVX or AVX2 support in GDC, but unfortunately AVX support is less prevalent than plain SSE3, SSSE3 or SSE4.1 (https://store.steampowered.com/hwsurvey/Steam-Hardware-Software-Survey-Welcome-to-Steam), so it's a sort of pessimization.

ibuclaw commented

gdc-trunk now prints "No SSSE3 support". I think the 11.2 release went out just before the change landed, so 11.3 will be the first release to get it.

https://d.godbolt.org/z/3cqT6qrsY

p0nce commented

Thank you this is great news!

p0nce commented
  • Add this:
import gcc.builtins; // GDC-specific module declaring the __builtin_ia32_* functions

static if (__traits(compiles, __builtin_ia32_comieq))
    pragma(msg, "Yes SSE support");
else
    pragma(msg, "No SSE support");

static if (__traits(compiles, __builtin_ia32_addpd))
    pragma(msg, "Yes SSE2 support");
else
    pragma(msg, "No SSE2 support");

static if (__traits(compiles, __builtin_ia32_haddps))
    pragma(msg, "Yes SSE3 support");
else
    pragma(msg, "No SSE3 support");

static if (__traits(compiles, __builtin_ia32_pmulhrsw128))
    pragma(msg, "Yes SSSE3 support");
else
    pragma(msg, "No SSSE3 support");

static if (__traits(compiles, __builtin_ia32_dpps))
    pragma(msg, "Yes SSE4.1 support");
else
    pragma(msg, "No SSE4.1 support");
  • Wait for the 11.3 release notes. Use a cherry-picked bug fix as a version marker to enable this; otherwise it will yield true for all sets.
    For example, issue https://issues.dlang.org/show_bug.cgi?id=21742 can be used to distinguish GCC 11.1 from 11.2, and there will likely be something similar in 11.3 (the DMD front-end version is the same across these releases).

p0nce commented

11.3 is too difficult to detect, so what we can do is:

  • if __VERSION__ >= 2100 (the front-end version shipped with GDC 12), test instruction sets
  • otherwise, assume SSE3, SSSE3, SSE4.1, SSE4.2 and the like are not available
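The strategy above can be sketched roughly as follows. This assumes, as the list says, that __VERSION__ >= 2100 identifies a GDC whose gcc.builtins only exposes supported built-ins (GDC 12 ships the 2.100 front-end; GDC 11.3 has the fix too but cannot be told apart, so it is treated conservatively). The GDC_with_* names are illustrative, and the probed built-ins are the ones from the earlier snippet.

```d
// Sketch: GDC instruction-set detection gated on the front-end version.
version (GNU)
{
    static if (__VERSION__ >= 2100)
    {
        // Recent enough: only built-ins the target supports are exposed.
        import gcc.builtins;
        enum bool GDC_with_SSE3  = __traits(compiles, __builtin_ia32_haddps);
        enum bool GDC_with_SSSE3 = __traits(compiles, __builtin_ia32_pmulhrsw128);
        enum bool GDC_with_SSE41 = __traits(compiles, __builtin_ia32_dpps);
    }
    else
    {
        // Older GDC exposes every x86 built-in, so the probe can't be
        // trusted: assume these instruction sets are unavailable.
        enum bool GDC_with_SSE3  = false;
        enum bool GDC_with_SSSE3 = false;
        enum bool GDC_with_SSE41 = false;
    }
}
else
{
    // Not GDC: this detection path doesn't apply.
    enum bool GDC_with_SSE3  = false;
    enum bool GDC_with_SSSE3 = false;
    enum bool GDC_with_SSE41 = false;
}

void main()
{
    import std.stdio : writeln;
    writeln("SSE3: ", GDC_with_SSE3, ", SSSE3: ", GDC_with_SSSE3,
            ", SSE4.1: ", GDC_with_SSE41);
}
```

The fallback errs toward "not available", which only costs performance (a slower generic path), whereas a false positive would emit instructions the target may not support.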

p0nce commented

Done! We now detect GDC instruction sets. Thanks to all involved.