nvortexVc

Direct Biot-Savart solver for 2D and 3D vortex blobs accelerated with Vc

Build and run

This should be easy on a Linux host, but first you will need to build and install Vc.

git clone https://github.com/VcDevel/Vc.git
cd Vc
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=/opt/Vc -DBUILD_TESTING=OFF ..
make -j 4
sudo make install

Now, back in this directory you can make all four binaries (2D and 3D, with and without OpenMP).

make

Run the code with one argument: the number of particles to use.

nvort2d -n=10000
nvort2domp -n=10000
nvort3d -n=10000
nvort3domp -n=10000

Performance

Performance of this algorithm accelerated with Vc is slightly better than from my SimdNBodyKernels code which uses Intel's ispc compiler, and the code here is easier to create and understand. A little care must be taken to pad arrays with proper values (padded particle radius must be nonzero), lest the inner kernel try to divide by zero when using the last vector register set of particles.

On a 16-core Intel i9-7960X Skylake CPU, with a large-enough problem, the nvortex3domp code exceeded 1 TFlop/s - the first time I've ever seen that happen on a single CPU.

The table below gives peak performance for the 3D case (with no grads) for combinations of x86 (possibly auto-vectorized) vs. Vc, and single-thread vs. OpenMP multithreaded. Each was run on a Fedora or Ubuntu machine and compiled with GCC 7.2 or 7.3. The numbers are all GFlop/s and the problem size was 30000.

GFlop/s	x86 serial	x86 OMP	Vc serial	Vc OMP	CPU peak	% of theoretical
Intel i3-8109U	5.2	11.5	69.6	145	230.4	63%
Intel i7-7500U	5.1	11.0	62.2	121	204.8	59%
Intel i7-5960X	4.3	34.0	44.0	369	768	48%
Intel i9-7960X	4.3	64.5	58.5	813	1434	57%
AMD Ryzen 7 2700X	4.5	47.3	42.9	289	499.2	58%

Budhyant/nvortexVc

nvortexVc

Build and run

Performance