COMP 633 project: k-means in CUDA
- write cuda version~~done
- read clusters from file~~done
- the choice of initial clusters~~done
- the ending criteria
- data maker (maybe using python)~~done
- visualization (maybe using python)~~done
- performance profile (running time)~~done
- replace atomicAdd with parallel reduction~~done
- optimize the program using parallel reduction techniques
- the compute capability of Titan V is 7.0 (on phaedra)
- the compute capability of GTX 1080 Ti is 6.1 (my desktop)
- the CUDA version I'm using is 9.2 (on phaedra)
- Maximum x-dimension of a grid of thread blocks: 2^31-1, starting from cc3.0
- Maximum number of threads per block: 1024, starting from cc2.0
- Maximum x- or y-dimension of a block: 1024, starting from cc2.0
- atomicAdd_system (atomic across all CPUs and GPUs in the system, may not need it), starting from cc6.0
- 32-bit floating-point version of atomicAdd(), starting from cc2.0
- 64-bit floating-point version of atomicAdd(), starting from cc6.0
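
For reference, a minimal sketch of the "replace atomicAdd with parallel reduction" item above. This is not the project's actual kernel: it assumes 1-D points, a fixed block size of 256, and unified memory, and the names clusterSums, computeClusterSums and BLOCK are made up for illustration. Each block reduces its points in shared memory, one cluster at a time, and then issues a single atomicAdd per cluster per block instead of one per point. It uses only the 32-bit float and int atomicAdd, so it should run on both cc6.1 and cc7.0.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK 256  // threads per block: a power of two, at most 1024

// Accumulate per-cluster sums and counts for 1-D points.
// Must be launched with blockDim.x == BLOCK.
__global__ void clusterSums(const float *x, const int *label, int n, int k,
                            float *sum, int *count) {
    __shared__ float sSum[BLOCK];
    __shared__ int sCnt[BLOCK];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    for (int c = 0; c < k; ++c) {
        // Each thread contributes its point only if it belongs to cluster c.
        bool mine = (i < n) && (label[i] == c);
        sSum[tid] = mine ? x[i] : 0.0f;
        sCnt[tid] = mine ? 1 : 0;
        __syncthreads();

        // Tree reduction in shared memory: halve the active threads each step.
        for (int s = blockDim.x / 2; s > 0; s >>= 1) {
            if (tid < s) {
                sSum[tid] += sSum[tid + s];
                sCnt[tid] += sCnt[tid + s];
            }
            __syncthreads();
        }

        // One atomicAdd per block per cluster, instead of one per point.
        if (tid == 0) {
            atomicAdd(&sum[c], sSum[0]);
            atomicAdd(&count[c], sCnt[0]);
        }
        __syncthreads();  // finish with shared memory before the next cluster
    }
}

// Hypothetical host wrapper: launch the kernel and wait for it to finish.
void computeClusterSums(const float *x, const int *label, int n, int k,
                        float *sum, int *count) {
    int grid = (n + BLOCK - 1) / BLOCK;
    clusterSums<<<grid, BLOCK>>>(x, label, n, k, sum, count);
    cudaDeviceSynchronize();
}

int main() {
    const int n = 1000, k = 2;
    float *x, *sum;
    int *label, *count;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&label, n * sizeof(int));
    cudaMallocManaged(&sum, k * sizeof(float));
    cudaMallocManaged(&count, k * sizeof(int));

    // Toy data: even-indexed points are 1.0 in cluster 0, odd are 3.0 in cluster 1.
    for (int i = 0; i < n; ++i) {
        label[i] = i % 2;
        x[i] = (i % 2 == 0) ? 1.0f : 3.0f;
    }
    for (int c = 0; c < k; ++c) {
        sum[c] = 0.0f;
        count[c] = 0;
    }

    computeClusterSums(x, label, n, k, sum, count);

    // New centroid of each cluster = sum / count.
    for (int c = 0; c < k; ++c)
        printf("cluster %d: count = %d, mean = %f\n", c, count[c], sum[c] / count[c]);

    cudaFree(x);
    cudaFree(label);
    cudaFree(sum);
    cudaFree(count);
    return 0;
}
```

Looping over the k clusters inside the kernel keeps the sketch short but costs k reductions per block. For larger k, a per-block shared-memory array of k partial sums would likely be the next thing to try.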