/SimulationEngine

GPU accelerated Atoms/Molecilus Simulation Engine/Game

Primary LanguageC#

Discontinued. My attempt at PhD with collaboration with one researcher friend. Ended due to not having time and motivation due to work. Learned things that became useful later.

System requirements

Unity 2021.3.18f1 LTS

Current state, how it works

Each atom can collide with all other atoms, but we leverage the limited interaction radius of atoms (the voxel cell edge length corresponds to the maximum interaction radius). So after bitonic sort and hascode, the complexity is only $O(nlog_2(n))$ instead of $O(n^2)$

Relevant most useful algorithms/concepts

Building the Grid using Sorting

NVIDIA Particle Simulation using CUDA, 2010

An alternative approach which does not require atomic operations is to use sorting. The algorithm consists of several kernels. The first kernel “calcHash” calculates a hash value for each particle based on its cell id. In this example we simply use the linear cell id as the hash, but it may be beneficial to use other functions such the Z-order curve [8] to improve the coherence of memory accesses. The kernel stores the results to the “particleHash” array in global memory as a uint2 pair (cell hash, particle id). We then sort the particles based on their hash values. The sorting is performed using the fast radix sort provided by the CUDPP library, which uses the algorithm described in [12]. This creates a list of particle ids in cell order. In order for this sorted list to useful, we need to be able to find the start of any given cell in the sorted list. This is achieved by running another kernel “findCellStart”, which uses a thread per particle and compares the cell index of the current particle with the cell index of the previous particle in the sorted list. If the index is different, this indicates the start of a new cell, and the start address is written to another array using a scattered write. The current code also finds the index of the end of each cell in a similar way.

Bitonic sort

Bitonic sort has $O(nlog_2(n))$ complexity in all cases, it does not leverage cases where array is already close to sorted order, that is why radix sort might have better performance. Bitonic sort is however very simple to implement:

void BitonicSort(uint3 id : SV_DispatchThreadID)
{
    int index0 = id.x & ~ComparisonOffset; // index top
	int index1 = id.x | ComparisonOffset; // index bottom
    if (index1 == id.x) return;    
    uint hashcode0 = SortedAtomIndexes[index0].y;
    uint hashcode1 = SortedAtomIndexes[index1].y;
    bool shouldSwap = (hashcode0 <= hashcode1) == (bool)(DirectionChangeStride & index0);
    if (shouldSwap)
    {
        uint2 temp = SortedAtomIndexes[index0];
        SortedAtomIndexes[index0] = SortedAtomIndexes[index1];
        SortedAtomIndexes[index1] = temp;
    }
}

Prokop Hapala's RARFF

What to try next

Electron Force Field

An Electron Force Field for Simulating Large Scale Excited Electron Dynamics, 2007

zkusit implementovat electron-forcefield (eFF) ... ten je uz trochu kvantovy ... a nikdo ho na GPU nema, bylo by to dost cool jako nejaka edukativni hra... ale tam je trochu komplikace:

  1. ty elektrony se nafukuji (jsou to jakoby natlakovane oblacky/balonky, cim jsou vetsi tim maji mensi energii)
  2. jsou tam silne daleko-dosahove interakce (elekstrostatika), takze ta akcelerace kratko-dosahovych interaci (=kolizi) az tolik nepomuze

Can optimize using Fast multipole method

Optimizations

What helped

  • Major
    • For N x N body simulation
      • Limitng max interaction radius, which allows us to sort particle by cell hashcode, where cell size is max interaction radius, then we can check interactions only with neighbouring 27 cells NVIDIA Particle Simulation using CUDA, 2010
      • Increasing max hashcode, which decreased hashcode collisions
      • Distributing neighbouring 27 cells calculations into individual compute shader threads, 36-39ms vs 27ms, higher number of simpler kernels seems better
      • Decreasing hashcode collisions by increasing possible hashcode space
    • Switching to Vulkan from OpenGLCore on Linux increased Compute Shader performance
  • Noticable

What didnt help

  • Ordering actual data (not just indexes) according to hashcode didn't help. There was no performance improvemenet in interaction forces evaluation kernel, even thought in theory it should improve coherence of memory. Only additional cost of the reordering kernel. NVIDIA Particle Simulation using CUDA, 2010 Maybe poorly implemented ? There seems to be some issue with it.
  • Hashing with Z Curve did not help maybe poorly implemented ?

To try

Resources

Papers

NVIDIA Particle Simulation using CUDA, 2010

Fast 4-way parallel radix sorting on GPUs, 2009

An Electron Force Field for Simulating Large Scale Excited Electron Dynamics, 2007

Realtime Fluid Simulation on the GPU with Unity3D

Efficient Spatial Binning on the GPU, AMD Technical Report, 2009

A More Efficient Parallel Method For Neighbour Search Using CUDA, 2015

Wikipedia

Bitonic sort Wikipedia

Example/reference code

Bitonic sort example Unity compute shader

Using OpenCL in Unity

Prokop Hapala's RARFF Solver

Prokop Hapala's RARFF Solver Test

GPU Radix Sort

Other

18 - How to write a FLIP water / fluid simulation running in your browser