nemequ/hedley

Macro for `__declspec(empty_bases)` to enable empty base class optimization

mbeutel opened this issue · 4 comments

In C++ it is very common to rely on the empty base optimization (EBO) so that object size and layout aren't unnecessarily impacted by the use of inheritance-based composition. However, Visual C++ doesn't do the EBO by default. The compiler became EBO-capable in VS 2015 Update 2, but EBO is currently opt-in in order to retain binary compatibility.

For more information, cf. https://devblogs.microsoft.com/cppblog/optimizing-the-layout-of-empty-base-classes-in-vs2015-update-2-3/ .

The following macro permits enabling EBO for a given class:

#if defined(_MSC_VER) && _MSC_VER >= 1900 && _MSC_FULL_VER >= 190023918 && _MSC_VER < 2000
    // Selectively enable the empty base optimization for a given type.
    // __declspec(empty_bases) was added in VC++ 2015 Update 2 and is expected
    // to become unnecessary in the next ABI-breaking release.
    #define EMPTY_BASES __declspec(empty_bases)
#else
    #define EMPTY_BASES
#endif

Usage:

struct EMPTY_BASES StillEmpty : EmptyBase1, EmptyBase2
{
};
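The effect can be checked with sizeof. The following is a minimal sketch, not part of the proposal itself: it repeats the macro from above, and the names Empty1, Empty2 and Holder are made up for illustration. It assumes a compiler that either applies EBO by default (GCC, Clang) or honors `__declspec(empty_bases)`.

```cpp
#include <type_traits>

#if defined(_MSC_VER) && _MSC_VER >= 1900 && _MSC_FULL_VER >= 190023918 && _MSC_VER < 2000
    #define EMPTY_BASES __declspec(empty_bases)
#else
    #define EMPTY_BASES
#endif

// Hypothetical empty policy/tag types.
struct Empty1 { };
struct Empty2 { };

// Holder derives from two empty bases and adds a single int member.
struct EMPTY_BASES Holder : Empty1, Empty2
{
    int value;
};

static_assert(std::is_empty<Empty1>::value && std::is_empty<Empty2>::value,
              "both bases are empty classes");

// With EBO applied to both bases, they take no space of their own.
// This is expected to fail on VC++ toolsets that predate empty_bases.
static_assert(sizeof(Holder) == sizeof(int), "empty bases should not add to the size");
```

Without the annotation, VC++ would be expected to report a larger size for Holder, which is exactly the layout difference that makes the attribute opt-in.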

Would this be a suitable addition to Hedley?

Interesting. Yes, this seems like something that could possibly go into Hedley… compilers which strive for MSVC compatibility (clang and possibly icc) may support it, too. It is just an optimization; it shouldn't cause code to fail if it's not emitted (unless there is an assert or something), right?

I'm a bit hesitant because it seems like the official way forward on this is to use the no_unique_address attribute, which behaves differently (it annotates variables instead of types). I guess there isn't anything stopping us from adding both, though.

Is this something you'd be interested in submitting a PR for, or would you like me to throw something together?

Edit: also, it seems like if we do this we may also need something for __declspec(layout_version(19)).

It's an optimization and can be omitted without changing the semantics of the code. But of course it alters the class layout, so it may cause binary incompatibility. However, a problem with binary incompatibility could arise only in situations which are not supported anyway, i.e. when mixing different VC++ toolsets which are not ABI-compatible.

[[no_unique_address]] is a different matter: it applies to data members, not base classes. It is correct that EBO is the traditional workaround for the lack of [[no_unique_address]], but EBO is also useful on its own. For example, if you were to implement a tuple class, one of the easiest approaches is to do the following (https://gcc.godbolt.org/z/n2EE4m):

#include <type_traits>
#include <utility>

#if defined(_MSC_VER) && _MSC_VER >= 1900 && _MSC_FULL_VER >= 190023918 && _MSC_VER < 2000
    // Selectively enable the empty base optimization for a given type.
    // __declspec(empty_bases) was added in VC++ 2015 Update 2 and is expected
    // to become unnecessary in the next ABI-breaking release.
    #define EMPTY_BASES __declspec(empty_bases)
#else
    #define EMPTY_BASES
#endif

template <std::size_t I, typename T, bool Inherit>
    struct tuple_leaf_base;
template <std::size_t I, typename T>
    struct tuple_leaf_base<I, T, true> : T
{
    using T::T;
    friend const T& select_index(const tuple_leaf_base& self, std::integral_constant<std::size_t, I>) { return self; }
};
template <std::size_t I, typename T>
    struct tuple_leaf_base<I, T, false>
{
    T value;
    friend const T& select_index(const tuple_leaf_base& self, std::integral_constant<std::size_t, I>) { return self.value; }
};
template <std::size_t I, typename T>
    struct EMPTY_BASES tuple_leaf : tuple_leaf_base<I, T, std::is_class_v<T> && !std::is_final_v<T>>
{
};
template <typename Is, typename... Ts>
    struct tuple_base;
template <std::size_t... Is, typename... Ts>
    struct tuple_base<std::index_sequence<Is...>, Ts...> : tuple_leaf<Is, Ts>...
{
};
template <typename... Ts>
    struct tuple : tuple_base<std::index_sequence_for<Ts...>, Ts...>
{
};

template <std::size_t I, typename... Ts>
    decltype(auto) get(const tuple<Ts...>& t)
{
    return select_index(t, std::integral_constant<std::size_t, I>{ });
}

struct empty { };

int main(void)
{
    tuple<int, empty, empty, float> t = { 3, { }, { }, 1.41 };
    int i = get<0>(t);
    static_assert(sizeof(t) == sizeof(int) + sizeof(float)); // holds only if EBO is applied and no padding is inserted
}

It's a lot of code because I wanted to give a complete example, but you can skim over most parts and just note that tuple_leaf_base<> conditionally inherits from its element type and tuple_base<> inherits from tuple_leaf<>, and that this results in the two empty tuple elements occupying no space in the tuple thanks to EBO. With [[no_unique_address]] we could simplify the implementation of tuple_leaf<>, but we would still need EBO to obtain optimal layout.
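To illustrate that last point, here is a rough sketch of what a leaf based on [[no_unique_address]] might look like. The names leaf and two_leaves are hypothetical and this is not the implementation linked above. It assumes a compiler that honors the standard attribute (recent GCC or Clang); to my knowledge VC++ accepts the standard spelling but does not act on it.

```cpp
#include <cstddef>

struct empty { };

// Hypothetical simplified leaf: a single template, no "Inherit" switch.
// The element is always a data member; the attribute lets an empty element
// share its address with other subobjects.
template <std::size_t I, typename T>
struct leaf
{
    [[no_unique_address]] T value;
};

// The leaf holding an empty element is itself an empty-looking class,
// but it is still a *base* of the tuple. Getting rid of its storage
// therefore still depends on the compiler applying EBO to the bases.
struct two_leaves : leaf<0, int>, leaf<1, empty>
{
};

inline int first(const two_leaves& t)
{
    // Qualify the member because both bases declare a member named value.
    return t.leaf<0, int>::value;
}
```

So the attribute removes the need for the conditional inheritance in tuple_leaf_base<>, while the layout of the enclosing tuple still relies on the empty bases being optimized away.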

I can submit a PR for HEDLEY_EMPTY_BASES. I'm not sure about __declspec(layout_version(19)); empty_bases is generally useful, but I'd be hard pressed to come up with a use case for layout_version(19) that isn't highly entangled with VC++ anyway. If you can recompile your code, then you don't need it; if you cannot, your ABI shouldn't rely on binary compatibility between different major versions of the toolset. Also, almost any code relying on universal binary compatibility will restrict itself to a C interface anyway.

On a slightly related note, do you have an opinion on macros for other VC++-centric attributes such as __declspec(novtable) and __declspec(selectany) and for calling conventions such as __vectorcall?

Okay, I think a HEDLEY_EMPTY_BASES would be appropriate, and at some point we can add a HEDLEY_NO_UNIQUE_ADDRESS for [[no_unique_address]].

> On a slightly related note, do you have an opinion on macros for other VC++-centric attributes such as __declspec(novtable) and __declspec(selectany) and for calling conventions such as __vectorcall?

It's complicated. I'd feel a lot better about it if the macro could also be implemented for another compiler (e.g., HEDLEY_NO_THROW), but my main concern is that code should still work without it. Those are actually really great examples, so I'll just go through them one by one:

  • __declspec(novtable) is pretty much purely an optimization; the code will still work if compiled somewhere without support for that declspec.
  • __declspec(selectany) is a bit more complicated. It looks like a less powerful version of GCC's weak attribute, so at least maybe we could have multiple implementations. The trouble is that code that uses the weak attribute is likely to not work without it. I'd put it in the same category as TLS (_Thread_local, __declspec(thread), etc.), and I don't want to include stuff like that in Hedley.
  • __vectorcall is more interesting. AFAIK it's processor-specific, and I'm pretty uneasy about the idea of going down that particular rabbit-hole. I would be a lot more comfortable adding something more like _Pragma("omp declare simd"), which works with standard arguments instead of requiring vector types, and the function is still callable from other compilers. A separate header, or a separate project, would probably be more appropriate for processor-specific stuff like that. I'd be willing to add something to portable-snippets.
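As a rough illustration of the "code should still work without it" criterion, here is how a novtable wrapper might look. EXAMPLE_NO_VTABLE is a made-up name, not an existing or planned Hedley macro, and the Shape/Triangle types are placeholders. The sketch assumes the usual novtable restriction is respected: the annotated class is only used as a base and never calls its own virtual functions during construction or destruction.

```cpp
#include <cassert>

// EXAMPLE_NO_VTABLE is a hypothetical name used for illustration only.
#if defined(_MSC_VER)
    #define EXAMPLE_NO_VTABLE __declspec(novtable)
#else
    #define EXAMPLE_NO_VTABLE
#endif

// Abstract interface that is never instantiated on its own.
struct EXAMPLE_NO_VTABLE Shape
{
    virtual int sides() const = 0;
protected:
    ~Shape() = default; // not deleted through a Shape pointer in this sketch
};

struct Triangle final : Shape
{
    int sides() const override { return 3; }
};

// Dispatches through the interface; behaves the same whether or not
// the macro expands to anything.
inline int count_sides(const Shape& s) { return s.sides(); }

inline void check_triangle()
{
    Triangle t;
    assert(count_sides(t) == 3);
}
```

When the macro expands to nothing the program behaves identically, which is what sets this case apart from selectany or TLS, where removing the annotation changes whether the code links or works at all.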

Fixed by PR #22