
DR3 enables users to write vectorised code using generic lambdas and filters. You switch the instruction set just by changing the enclosing namespace.

Primary language: C++. License: Apache License 2.0 (Apache-2.0).

DR3

To get full use of the repo, you need a modern processor that supports AVX512 or AVX2 instructions. If your processor supports only AVX2, change the target instruction set in the projects to AVX2 and do not generate AVX512 code, because your machine won't run it.

The projects build with GCC, clang, IC2022 and VS2019. In Visual C++, select the x64 platform and the solution configuration you want: IC2022, release, debug or clang.

The getting started project shows some example use cases for vectors, filters and views, together with an experimental vectorised forward AAD example that computes option sensitivities.
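The idea behind forward-mode differentiation can be shown with dual numbers: every value carries its derivative along with it, and each arithmetic operation applies the chain rule. The sketch below is scalar, not vectorised, and it is not DR3's AAD code. The names Dual, logD, normCdf and bsCall are hypothetical. It prices a Black-Scholes call and gets delta in the same pass by seeding the spot with derivative 1.

```cpp
#include <cmath>

// Hypothetical dual number (not the DR3 AAD type): val is the value,
// der is the derivative with respect to whichever input was seeded with der = 1.
struct Dual {
    double val;
    double der;
};

Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.der + b.der}; }
Dual operator-(Dual a, Dual b) { return {a.val - b.val, a.der - b.der}; }
Dual operator*(Dual a, Dual b) { return {a.val * b.val, a.der * b.val + a.val * b.der}; }
Dual operator/(Dual a, Dual b) {
    return {a.val / b.val, (a.der * b.val - a.val * b.der) / (b.val * b.val)};
}
Dual logD(Dual a) { return {std::log(a.val), a.der / a.val}; }

// Standard normal CDF; its derivative is the normal pdf, applied via the chain rule.
Dual normCdf(Dual a) {
    const double pdf = 0.3989422804014327 * std::exp(-0.5 * a.val * a.val);  // 1/sqrt(2*pi)
    return {0.5 * std::erfc(-a.val / std::sqrt(2.0)), pdf * a.der};
}

// Black-Scholes call price. Constants carry der = 0, so one forward pass with the
// spot seeded as {S, 1} returns the price in val and delta (dC/dS) in der.
Dual bsCall(Dual spot, double strike, double rate, double vol, double t) {
    const Dual k{strike, 0.0};
    const Dual sd{vol * std::sqrt(t), 0.0};
    const Dual drift{(rate + 0.5 * vol * vol) * t, 0.0};
    const Dual discount{std::exp(-rate * t), 0.0};
    const Dual d1 = (logD(spot / k) + drift) / sd;
    const Dual d2 = d1 - sd;
    return spot * normCdf(d1) - k * discount * normCdf(d2);
}
```

For example, bsCall({100.0, 1.0}, 100.0, 0.05, 0.2, 1.0) gives the price in .val and delta, N(d1), in .der. To get vega the same way, vol would also have to be a Dual, seeded with der = 1 in place of the spot.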

The accumulate example shows some of the use cases given in the CppCon 2022 talk. It also gives an example of error correction using Kahan summation.
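Kahan (compensated) summation keeps a running estimate of the rounding error lost by each addition and subtracts it from the next term. The minimal scalar sketch below shows the technique itself, not the repo's vectorised accumulate code. The function names are hypothetical.

```cpp
#include <vector>

// Compensated (Kahan) summation: c holds the rounding error of the previous
// addition (computed sum minus exact sum) and is removed from the next term.
double kahanSum(const std::vector<double>& xs) {
    double sum = 0.0;
    double c = 0.0;
    for (double x : xs) {
        const double y = x - c;    // next term, corrected by the carried error
        const double t = sum + y;  // low-order bits of y can be lost here...
        c = (t - sum) - y;         // ...and are recovered algebraically
        sum = t;
    }
    return sum;
}

// Plain left-to-right summation, for comparison.
double naiveSum(const std::vector<double>& xs) {
    double sum = 0.0;
    for (double x : xs) sum += x;
    return sum;
}
```

Adding 1000 copies of 1e-16 to 1.0 leaves naiveSum at exactly 1.0, while kahanSum returns about 1.0000000000001. Value-unsafe floating-point modes such as -ffast-math allow the compiler to simplify (t - sum) - y to zero, which removes the compensation. Check the floating-point model flags of each compiler you build with.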

The examples build and run with VS2019, clang and the Intel compilers. You change the target instruction set generated by the framework by changing the namespace. The namespaces provide double and float types; VecDb is a pair of doubles. Uncomment the namespace you want and build the example.

//using namespace DRC::VecDb;
//using namespace DRC::VecD2D;  // SSE2 double
using namespace DRC::VecD4D;    // AVX2 double
//using namespace DRC::VecF8F;  // AVX2 float
//using namespace DRC::VecD8D;  // AVX512 double
//using namespace DRC::VecF16F; // AVX512 float

On a machine that supports AVX512, make sure every Visual Studio project uses the enhanced instruction set: under Configuration Properties > C/C++ > Code Generation, set Enable Enhanced Instruction Set to /arch:AVX512. If your machine doesn't support AVX512, lower this setting to AVX2 or SSE2, and don't select a namespace in the code that requires a more advanced instruction set.
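You can confirm which setting a build actually picked up by checking the compiler's predefined macros. MSVC defines __AVX512F__ under /arch:AVX512 and __AVX2__ under /arch:AVX2. GCC and clang define the same macros under -mavx512f or -mavx2, or a matching -march. The function name below is hypothetical. This check tells you what the binary was compiled for, not what the machine running it supports.

```cpp
// Reports the widest x86 SIMD extension the compiler was told to target.
// This is a property of the build, not of the CPU the binary later runs on.
// MSVC always defines _M_X64 on x64, where SSE2 is the baseline; it does not define __SSE2__.
constexpr const char* compiledInstructionSet() {
#if defined(__AVX512F__)
    return "AVX512";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__SSE2__) || defined(_M_X64)
    return "SSE2";
#else
    return "unknown";
#endif
}
```

Printing this at startup, next to the namespace you selected, makes it easy to spot a project whose instruction set setting was not changed.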

Uncomment one of the "using namespace" lines to select the instruction set you wish to run.
Namespaces ending in F use float as the underlying type; those ending in D use double.

The project is set to compile using the AVX512 enhanced instruction set. The namespace you select determines which intrinsic types are used to instantiate the lambdas.
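The sketch below shows the general pattern of selecting a register type through a namespace and instantiating one generic lambda with it. It is a self-contained illustration under stated assumptions, not DR3's implementation. Lanes emulates a SIMD register with a plain array, where DR3 uses VCL2 intrinsic types. The Demo namespaces, transformLanes and squarePlusSelf are hypothetical names that only echo the DRC naming.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Emulated SIMD register: N lanes of T. A hypothetical stand-in for intrinsic
// wrapper types; only + and * are provided here.
template <typename T, std::size_t N>
struct Lanes {
    using value_type = T;
    static constexpr std::size_t width = N;
    std::array<T, N> v{};
};

template <typename T, std::size_t N>
Lanes<T, N> operator+(const Lanes<T, N>& a, const Lanes<T, N>& b) {
    Lanes<T, N> r;
    for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

template <typename T, std::size_t N>
Lanes<T, N> operator*(const Lanes<T, N>& a, const Lanes<T, N>& b) {
    Lanes<T, N> r;
    for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
}

// Hypothetical namespaces echoing the DRC naming; each one only picks a register type.
namespace Demo {
namespace VecD2D { using Reg = Lanes<double, 2>; using Scalar = double; }
namespace VecD4D { using Reg = Lanes<double, 4>; using Scalar = double; }
namespace VecF8F { using Reg = Lanes<float, 8>; using Scalar = float; }
}  // namespace Demo

// Runs a generic lambda over whole registers, then over the leftover elements one at a time.
template <typename Reg, typename F>
std::vector<typename Reg::value_type> transformLanes(
        const std::vector<typename Reg::value_type>& in, F f) {
    constexpr std::size_t N = Reg::width;
    std::vector<typename Reg::value_type> out(in.size());
    std::size_t i = 0;
    for (; i + N <= in.size(); i += N) {
        Reg r;
        for (std::size_t l = 0; l < N; ++l) r.v[l] = in[i + l];     // "load"
        const Reg res = f(r);
        for (std::size_t l = 0; l < N; ++l) out[i + l] = res.v[l];  // "store"
    }
    for (; i < in.size(); ++i) out[i] = f(in[i]);                   // scalar tail
    return out;
}

using namespace Demo::VecD4D;  // edit this one line to change the register type

std::vector<Scalar> squarePlusSelf(const std::vector<Scalar>& xs) {
    // The same generic lambda is instantiated for Reg and for Scalar.
    return transformLanes<Reg>(xs, [](auto x) { return x * x + x; });
}
```

Switching the directive to Demo::VecF8F changes both the lane count and the element type without touching the lambda. In a real library the register operations compile to intrinsics, so the namespace also determines which instructions are emitted.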

If your hardware does not support AVX512, choose the next level down, AVX2, and avoid the namespaces DRC::VecD8D and DRC::VecF16F. These generate instructions that your computer doesn't support.

Check Device Manager > Processors to find out which processor you have, then look up its supported instruction set extensions on Intel's website, for example https://ark.intel.com/content/www/us/en/ark/products/123550/intel-xeon-silver-4114-processor-13-75m-cache-2-20-ghz.html (the Xeon Silver 4114) or https://www.intel.com/content/www/us/en/products/details/processors/xeon/scalable.html.
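As a programmatic alternative, GCC and clang provide __builtin_cpu_init and __builtin_cpu_supports for querying the running x86 CPU. The helper name below is hypothetical. On MSVC, a similar check can be written with __cpuidex and _xgetbv. Note that "avx512f" covers only the AVX512 foundation subset. This README does not state which further AVX512 subsets the VecD8D and VecF16F code needs.

```cpp
#include <string>

// Asks the running CPU which SIMD extensions it supports (hypothetical helper name).
// Uses GCC/Clang x86 builtins; other compilers fall through to "unknown".
std::string bestSupportedInstructionSet() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return "AVX512";
    if (__builtin_cpu_supports("avx2")) return "AVX2";
    if (__builtin_cpu_supports("sse2")) return "SSE2";
    return "pre-SSE2";
#else
    return "unknown";
#endif
}
```

If this reports AVX2, stay with DRC::VecD4D or DRC::VecF8F.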

The getting started project shows the usage of vectors, lambdas and filters.
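This scalar sketch illustrates two filter-style operations driven by lambdas. The first keeps the elements that satisfy a predicate. The second is a branch-free selection, which evaluates both candidate lambdas and picks one result per element, the way SIMD code combines a mask with a blend. The helpers filterWhere and selectWhere are hypothetical and are not the DR3 filter API.

```cpp
#include <cstddef>
#include <vector>

// Keeps the elements for which the predicate lambda returns true (hypothetical helper).
template <typename T, typename Pred>
std::vector<T> filterWhere(const std::vector<T>& in, Pred pred) {
    std::vector<T> out;
    out.reserve(in.size());
    for (const T& x : in)
        if (pred(x)) out.push_back(x);
    return out;
}

// Branch-free selection (hypothetical helper): both lambdas are evaluated for every
// element and the condition only chooses between the two results.
template <typename T, typename Cond, typename OnTrue, typename OnFalse>
std::vector<T> selectWhere(const std::vector<T>& in, Cond cond, OnTrue onTrue, OnFalse onFalse) {
    std::vector<T> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const T a = onTrue(in[i]);
        const T b = onFalse(in[i]);
        out[i] = cond(in[i]) ? a : b;
    }
    return out;
}
```

For example, selectWhere(xs, [](auto x) { return x < 0.0; }, [](auto x) { return -x; }, [](auto x) { return x; }) computes absolute values without a data-dependent branch in the lambdas.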

The accumulateExample project builds the performance examples covered in the CppCon 2022 talk. It lets you switch between ICC, clang and VS2019 builds, and change the instruction set through the using directive.

The inverseCumNormalExample gives the performance example shown at CppCon 2022, although one or two of the examples may show slight performance regressions. It's instructive to run the examples after building with the different compilers and choosing different instruction sets for the lambdas (via the namespace).
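When you compare builds, it helps to check accuracy as well as speed. The sketch below is a slow scalar reference for the inverse cumulative normal, computed with Newton's method on Phi(x) - p using std::erfc. This technique is chosen here for simplicity. The algorithm used inside inverseCumNormalExample is not described in this README, and the name invCumNormalRef is hypothetical.

```cpp
#include <cmath>
#include <limits>

// Reference inverse cumulative normal, x = Phi^{-1}(p), by Newton's method on
// f(x) = Phi(x) - p, with Phi(x) = erfc(-x / sqrt(2)) / 2 and f'(x) = phi(x).
// Starting at x = 0 the iterates approach the root monotonically, because Phi is
// concave for x > 0 and convex for x < 0.
double invCumNormalRef(double p) {
    if (!(p > 0.0 && p < 1.0)) return std::numeric_limits<double>::quiet_NaN();
    const double invSqrt2Pi = 0.3989422804014327;
    double x = 0.0;
    for (int iter = 0; iter < 1000; ++iter) {  // deep tails can need several hundred steps
        const double cdf = 0.5 * std::erfc(-x / std::sqrt(2.0));
        const double pdf = invSqrt2Pi * std::exp(-0.5 * x * x);
        const double step = (cdf - p) / pdf;
        x -= step;
        if (std::fabs(step) < 1e-14 * (1.0 + std::fabs(x))) break;
    }
    return x;
}
```

Applying it element by element, for example with std::transform, to the same inputs as a vectorised run gives a baseline for measuring the maximum absolute error of each compiler and namespace combination.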

The AVX512Dance function runs a routine that finds the maximum value in an array, using AVX2 and AVX512. If you monitor power usage with a tool such as Open Hardware Monitor, you can see that the AVX512 instructions use less energy for the computation than the AVX2 ones (measured on a Xeon Silver 4114).
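The structure of such a max search can be sketched portably. Keep W independent running maxima, one per lane of a register: 4 for AVX2 doubles, 8 for AVX512 doubles. Then reduce them horizontally at the end. This is plain C++, not the AVX512Dance code, and blockedMax is a hypothetical name. Whether a compiler turns the inner loop into packed max instructions depends on the compiler and the target flags.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <limits>
#include <vector>

// Max of an array using W lane-wise running maxima, mimicking one SIMD register.
// Returns -infinity for an empty input.
template <std::size_t W>
double blockedMax(const std::vector<double>& xs) {
    std::array<double, W> acc;
    acc.fill(-std::numeric_limits<double>::infinity());
    std::size_t i = 0;
    for (; i + W <= xs.size(); i += W)
        for (std::size_t l = 0; l < W; ++l)
            acc[l] = std::max(acc[l], xs[i + l]);          // one "vector max" per block
    double m = *std::max_element(acc.begin(), acc.end());   // horizontal reduction
    for (; i < xs.size(); ++i) m = std::max(m, xs[i]);      // scalar tail
    return m;
}
```

blockedMax<4> mirrors the AVX2 double layout and blockedMax<8> the AVX512 one. With the wider register, each block processes twice as many elements per iteration.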

VectorTest is a selection of tests using GoogleTest.
The main library is Vectorisation. It references a local copy of the VCL2 library, with a slight change that enables VCL2 to be used with the Intel IC2022 compiler.

Building DR3

See docs/Build.md for instructions on how to build DR3 from source and a list of supported platforms.