/G-DBSCAN

CUDA implementation of DBSCAN. Inspired from paper https://www.sciencedirect.com/science/article/pii/S1877050913003438

Primary LanguageCuda

G-DBSCAN

CUDA implementation of DBSCAN. Inspired from paper https://www.sciencedirect.com/science/article/pii/S1877050913003438

Project structure

## Source code

  1. In gdbscan_paper.cu you can find the implementation most similar to the one proposed in the original paper.
  2. In gdbscan.cu you can find my naive implementation, which improves 1 removing some sequential loops from the host in the BFS code.
  3. In gdbscan_shifted.cu there's an attempt to change the memory access pattern in compute_degrees and compute_adjacency_list. It edits 2
  4. In gdbscan_shared.cu there's an attempt to exploit shared memory in compute_degrees and compute_adjacency_list. It edits 2

G-DBSCAN.ipynb is the code run on Google Colab to compare sklearn DBSCAN against my implementations.

Profiling

Profiling has been conducted with 200 000 10-dimensional points generated by sklearn make_blobs. The data is stored in profiling/data.txt.

  1. In profiling/paper there are execution times and profiling foreach kernel related to gdbscan_paper.cu source code
  2. In profiling/standard there are execution times and profiling foreach kernel related to gdbscan.cu source code
  3. In profiling/shifted there are execution times and profiling foreach kernel related to gdbscan_shifted.cu source code
  4. In profiling/shared there are execution times and profiling foreach kernel related to gdbscan_shared.cu source code

There are also the clustering output files, to verify clustering quality against sklearn's implementation.