Project1-CUDA-Flocking: A CUDA repository from LaurelinTheGold

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 1 - Flocking

Richard Chen
- LinkedIn
Tested on: Windows 11, i7-10875H @ 2.3GHz 16GB, RTX 2060 MAX-Q 6GB (Personal Computer)

Overview

This project computes and renders flocks of boids entirely on the GPU. The first implementation is a naive O(n^2) approach that checks every boid against every other boid. The second uses a uniform grid data structure so that each boid is checked only against boids in nearby cells, which reduces the amount of math needed. The third, coherent implementation improves on the uniform grid by rearranging the boid data buffers on the GPU into grid order instead of reading them through an extra layer of index indirection. Because neighboring boids then sit next to each other in memory, this should improve memory access times.
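The uniform grid step described above can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not the project's actual kernels: the names `Grid`, `GridBuffers`, `cellIndex`, and `buildGrid` are hypothetical, the grid is assumed to be a cube with the same resolution on every axis, and the sort that a GPU version would run in parallel is done here with `std::stable_sort`.

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical grid description: a cube with `resolution` cells per axis,
// starting at `minCorner`, each cell `cellWidth` wide.
struct Grid {
    Vec3 minCorner;
    float cellWidth;
    int resolution;
};

// Clamp a per-axis cell coordinate so out-of-range positions land in a border cell.
int clampCell(int v, int resolution) {
    return std::max(0, std::min(v, resolution - 1));
}

// Map a position to a flattened 1D cell index (x fastest, then y, then z).
int cellIndex(const Grid& g, const Vec3& p) {
    int ix = clampCell(static_cast<int>(std::floor((p.x - g.minCorner.x) / g.cellWidth)), g.resolution);
    int iy = clampCell(static_cast<int>(std::floor((p.y - g.minCorner.y) / g.cellWidth)), g.resolution);
    int iz = clampCell(static_cast<int>(std::floor((p.z - g.minCorner.z) / g.cellWidth)), g.resolution);
    return ix + iy * g.resolution + iz * g.resolution * g.resolution;
}

struct GridBuffers {
    std::vector<int> boidIndex;   // boid ids, sorted by the cell they occupy
    std::vector<int> cellOfBoid;  // cell index of each sorted entry
    std::vector<int> cellStart;   // first sorted slot of each cell, -1 if empty
    std::vector<int> cellEnd;     // last sorted slot of each cell (inclusive), -1 if empty
};

GridBuffers buildGrid(const Grid& g, const std::vector<Vec3>& pos) {
    const int n = static_cast<int>(pos.size());
    const int cellCount = g.resolution * g.resolution * g.resolution;

    // Label every boid with its cell.
    std::vector<int> cell(n);
    for (int i = 0; i < n; ++i) cell[i] = cellIndex(g, pos[i]);

    // Sort boid ids by cell so that boids sharing a cell become contiguous.
    GridBuffers b;
    b.boidIndex.resize(n);
    std::iota(b.boidIndex.begin(), b.boidIndex.end(), 0);
    std::stable_sort(b.boidIndex.begin(), b.boidIndex.end(),
                     [&](int lhs, int rhs) { return cell[lhs] < cell[rhs]; });

    b.cellOfBoid.resize(n);
    for (int s = 0; s < n; ++s) b.cellOfBoid[s] = cell[b.boidIndex[s]];

    // Record where each cell's run begins and ends in the sorted order.
    b.cellStart.assign(cellCount, -1);
    b.cellEnd.assign(cellCount, -1);
    for (int s = 0; s < n; ++s) {
        const int c = b.cellOfBoid[s];
        if (s == 0 || b.cellOfBoid[s - 1] != c) b.cellStart[c] = s;
        if (s == n - 1 || b.cellOfBoid[s + 1] != c) b.cellEnd[c] = s;
    }
    return b;
}
```

With these buffers, a neighbor search for one boid only has to visit the sorted slots `cellStart[c]` through `cellEnd[c]` for the handful of cells `c` that overlap its neighborhood, rather than every boid in the simulation.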

Videos and Images

100,000 Boids

10,000 Boids

Naive Implementation

Naive barely handles 50k Boids

Uniform Grid

Uniform Grid handles 50k Boids just fine

Performance

Visualize On

Visualize Off

As the number of boids increases, the naive approach does not scale, because its pairwise checks grow quadratically with the boid count.
At lower boid counts, the coherent approach incurs overhead from reshuffling arrays, but with more boids the time saved by avoiding memory indirection outweighs that overhead.
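The trade-off between the scattered and coherent grids can be illustrated with a small CPU-side sketch. It is an assumption-laden illustration rather than the project's code: `sumXScattered`, `gather`, and `sumXCoherent` are hypothetical names, `boidIndex` stands for an array of boid ids already sorted by grid cell, and summing x coordinates stands in for the real neighbor computation.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Scattered access: every read goes through the sorted index array first,
// then jumps to wherever that boid lives in the original buffer.
float sumXScattered(const std::vector<Vec3>& pos, const std::vector<int>& boidIndex,
                    int start, int end) {
    float sum = 0.0f;
    for (int s = start; s <= end; ++s) {
        sum += pos[boidIndex[s]].x;  // two dependent reads per boid
    }
    return sum;
}

// Reshuffle step: copy the data into cell-sorted order once per simulation step.
// This is the extra cost the coherent approach pays up front.
std::vector<Vec3> gather(const std::vector<Vec3>& data, const std::vector<int>& boidIndex) {
    std::vector<Vec3> sorted(data.size());
    for (std::size_t s = 0; s < boidIndex.size(); ++s) {
        sorted[s] = data[boidIndex[s]];
    }
    return sorted;
}

// Coherent access: boids in the same cell are adjacent in memory,
// so the inner loop reads one contiguous range with no indirection.
float sumXCoherent(const std::vector<Vec3>& sortedPos, int start, int end) {
    float sum = 0.0f;
    for (int s = start; s <= end; ++s) {
        sum += sortedPos[s].x;
    }
    return sum;
}
```

Both access patterns produce the same result. The reshuffle is paid once per step, while the indirection is paid on every neighbor read, which is consistent with the coherent version pulling ahead only once the boid count is large.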

Block Size

Questions