Task: Implement BFS parallelized on CPU with OpenMP
Solution: Implementation uses the idea of direction-optimized BFS. After each frontier expansion, the program decides which approach (top-down or bottom-up) would perform faster for the next round and proceeds until BFS is finished. Task: Impelement an algorithm, providing accurate distance-1 coloring of the supplied graph that runs on CPU in parallel with OpenMP.
Solution: Moving with the ideas introduced by Deveci et al., the implementation I provide performs three step iterations until each vertex is colored. Each iteration has three steps:
- Assign Colors
- Detect Conflicts
- Forbid Colors
Solution: The implementation assigns each CUDA core a vertex. Each thread is responsible for performing BFS having its assigned vertex as source node. However, memory requirement of my implementation is very high; therefore, the implementation works only for small graphs. Task: Provided a set of training and testing points on a 16-dimensional space, find the nearest neighbor among training points of each testing point.
Solution: The implementation assigns each CUDA core a testing point. Afterwards, each core iterates over the training points to find the training point with smallest euclidean distance. CUDA streams were used to overlap computation with communication. Task: Impelement CPU, GPU and Heterogeneous algorithms solving the distance-2 graph coloring problem.
Solution: We have extended the implementation of distance-1 graph coloring to distance-2 with improvements. OpenMP algorithm uses thread-private forbidden arrays and CUDA implementation uses shared block-cache to store bit vectors as forbidden arrays.