2 questions, on aligned memory allocations and AVX/AVX2

Question

OhSoGood opened this issue 7 years ago · 1 comments

Hi,

Thanks for your very interesting project.

I found it as I'm trying to create 16-byte memory aligned arrays in .NET - arrays of floats that I have to pass to an unmanaged lib which uses AVX/AVX2 processing and that I wish to reuse afterwards as native arrays without copying data. Their size would be between 1024 and 4096 floats, so not so big. I have not started looking into your code, but is there a good/easy way to do this? Could I use your lib or parts of it for this?

My 2nd question is close but likely a little off-topic with your (excellent) memory allocator. Learning SIMD in .NET (Vector and co), I was surprised to see that very few AVX/AVX2 operations are available in Vector. Is it me who missed AVX-equivalent methods such as: sin, cos, ln, multiply vector by scalar?

What's your opinion on this?

Answer 1 · 2018-04-23T16:35:03.000Z

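Hi, thanks for trying it out. On 64-bit machines jemalloc aligns allocations on a minimum 16-byte boundary. This is in line with the C11 standard and is what jemalloc does unless you manually compile it with different options. So the elements of a FixedBuffer<float> would already have the 16-byte alignment you are asking for. Keep in mind that the aligned 256-bit AVX loads and stores need 32-byte alignment, so with only a 16-byte guarantee the unmanaged lib has to use unaligned loads for its 256-bit operations.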
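If a specific alignment (say 32 bytes) has to be guaranteed for a native buffer, one generic technique is to over-allocate and round the address up. The following is only a minimal sketch of that technique using Marshal.AllocHGlobal; it is not how jemalloc.NET or FixedBuffer<T> works internally, and the AlignedNative class and its method names are made up for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of jemalloc.NET.
public static class AlignedNative
{
    // Allocates byteCount bytes of native memory and returns a pointer inside
    // the block that is aligned to 'alignment' (which must be a power of two).
    // 'rawBlock' is the pointer that must later be passed to Marshal.FreeHGlobal.
    public static IntPtr Allocate(int byteCount, int alignment, out IntPtr rawBlock)
    {
        if (alignment <= 0 || (alignment & (alignment - 1)) != 0)
            throw new ArgumentException("alignment must be a power of two", nameof(alignment));

        // Over-allocate so an aligned address is guaranteed to exist in the block.
        rawBlock = Marshal.AllocHGlobal(byteCount + alignment - 1);

        // Round the raw address up to the next multiple of 'alignment'.
        long address = rawBlock.ToInt64();
        long aligned = (address + alignment - 1) & ~((long)alignment - 1);
        return new IntPtr(aligned);
    }

    // True if the address is a multiple of 'alignment' (a power of two).
    public static bool IsAligned(IntPtr pointer, int alignment)
    {
        return (pointer.ToInt64() & (alignment - 1)) == 0;
    }
}
```

For example, AlignedNative.Allocate(4096 * sizeof(float), 32, out raw) gives a pointer that can be handed to the unmanaged lib; free it afterwards with Marshal.FreeHGlobal(raw), not with the aligned pointer.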

AFAIK there's no way to specify the alignment of a managed array of floats or other primitive structs in .NET. The CLR apparently doesn't support this. Objects on the Large Object Heap should be aligned to 16 bytes, but this isn't specified behavior and may change. With user-defined structs you can use the StructLayout attribute, e.g.

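using System.Runtime.InteropServices;

// Sequential layout with an explicit packing size for the struct's fields.
[StructLayout(LayoutKind.Sequential, Pack = 16)]
struct MyFloat
{
    public float f1;
}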

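Be aware, though, that Pack only caps the alignment of the fields inside the struct. It does not pad MyFloat out to 16 bytes and it does not guarantee that an array of MyFloats starts on a 16-byte boundary, so check the actual addresses before relying on it.

For interoperating with unmanaged code, Span<T> can wrap both managed and unmanaged memory, so you should probably write your interop API to accept Span<T> parameters for maximum flexibility.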
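As a rough illustration of the Span<T> suggestion, the sketch below shows one method accepting Span<float> and being called with both a managed array and memory that is not on the managed heap (stack memory from stackalloc, which avoids needing unsafe code here). The SpanInterop class and the Scale method are hypothetical stand-ins for a real interop wrapper.

```csharp
using System;

// Hypothetical example; Scale stands in for a wrapper around a native call.
public static class SpanInterop
{
    // Works on whatever memory the span wraps, without copying it.
    public static void Scale(Span<float> data, float factor)
    {
        for (int i = 0; i < data.Length; i++)
            data[i] *= factor;
    }

    public static float Demo()
    {
        // Managed array: converts implicitly to Span<float>.
        float[] managed = { 1f, 2f, 3f, 4f };
        Scale(managed, 2f);

        // Stack memory: same API, no managed heap allocation.
        Span<float> onStack = stackalloc float[4];
        onStack.Fill(1f);
        Scale(onStack, 3f);

        return managed[3] + onStack[0];
    }
}
```

A span over native memory (for instance from a FixedBuffer<float> or Marshal.AllocHGlobal) can be passed to the same method; creating one from a raw pointer needs an unsafe context.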

There are actually 2 types of Vectors in .NET: fixed-size and machine-sized. Vector<T> refers to machine-sized vectors whose size depends on the type (T) and on what the underlying hardware supports. The list of Vector<T> operations is here: https://msdn.microsoft.com/en-us/library/dn858385(v=vs.111).aspx?f=255&mspperror=-2147217396#Anchor_4
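Multiplying by a scalar, one of the operations asked about, is available on Vector<T> through the * operator. Here is a minimal sketch that scales a float array in chunks of Vector<float>.Count elements; the VectorScale class and method name are made up for the example.

```csharp
using System;
using System.Numerics;

// Hypothetical example of a machine-sized Vector<T> loop.
public static class VectorScale
{
    public static void MultiplyByScalar(float[] values, float scalar)
    {
        // Number of floats per hardware vector (depends on the machine).
        int width = Vector<float>.Count;
        int i = 0;

        // Process full vectors.
        for (; i <= values.Length - width; i += width)
        {
            var v = new Vector<float>(values, i);
            (v * scalar).CopyTo(values, i);
        }

        // Process the remaining elements one by one.
        for (; i < values.Length; i++)
            values[i] *= scalar;
    }
}
```

The scalar tail loop is needed because the array length is usually not a multiple of Vector<float>.Count.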

There are also the Vector2, Vector3 and Vector4 types (and Matrix, Plane and others), which are fixed-size, SIMD-accelerated float vectors of 2, 3 and 4 elements. These types have more math operations defined:
https://docs.microsoft.com/en-us/dotnet/api/system.numerics.vector2?view=netcore-2.1#methods-
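To give an idea of what the fixed-size types offer, here is a small sketch using Vector4 with scalar multiplication, an element-wise square root and a dot product. The FixedVectors class is only an illustrative wrapper.

```csharp
using System;
using System.Numerics;

// Hypothetical example of the fixed-size System.Numerics vector types.
public static class FixedVectors
{
    public static float Demo()
    {
        var a = new Vector4(1f, 2f, 3f, 4f);
        var b = new Vector4(4f, 3f, 2f, 1f);

        // Multiply every component by a scalar.
        Vector4 scaled = a * 2f;

        // Element-wise square root.
        Vector4 roots = Vector4.SquareRoot(new Vector4(4f, 9f, 16f, 25f));

        // Dot product of the two vectors.
        float dot = Vector4.Dot(a, b);

        return dot + scaled.W + roots.X;
    }
}
```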

SIMD support is under active development in .NET and is constantly being improved, e.g. https://github.com/dotnet/corefx/issues/22940 tracks the ongoing discussion on adding intrinsics and other features to .NET Vectors.

If you decide to try out FixedBuffer<float> or other types, let me know how it works.