REFERENCES:
https://ericdarve.github.io/cme213-spring-2021/ (Stanford CME 213, parallel computing)
https://www.youtube.com/watch?v=jw-Cx3F0r0E&list=UUNFEyAFwqVREN0ljfjoYNkA&index=8&ab_channel=EricDarve
http://15418.courses.cs.cmu.edu/spring2016/lectures (CMU 15-418, parallel computing lectures)
https://www.youtube.com/watch?v=F620ommtjqk&list=PLAwxTw4SYaPnFKojVQrmyOGFCqHTxfdv2&ab_channel=Udacity (UC Davis, Udacity)
https://github.com/coffeebeforearch (performance engineer with solid GPU knowledge)
https://www.youtube.com/watch?v=3xfyiWhtvZw&list=PLxNPSjHT5qvtYRVdNN1yDcdSl39uHV_sU&index=4&ab_channel=CoffeeBeforeArch (CUDA tutorial)
https://github.com/CoffeeBeforeArch/cuda_programming (CUDA)
https://www.youtube.com/watch?v=a0V8KpLu7EY&list=PLxNPSjHT5qvugVNYwtQwnvSQyvlbzAML3&index=7&ab_channel=CoffeeBeforeArch (MPI tutorial)
https://github.com/CoffeeBeforeArch/practical_parallelism_in_cpp (MPI tutorial)
https://github.com/mpitutorial/mpitutorial
https://mpitutorial.com/tutorials/mpi-scatter-gather-and-allgather/
https://mpitutorial.com/tutorials/mpi-reduce-and-allreduce/
https://github.com/NVIDIA/cuda-samples
https://github.com/NVIDIA/thrust (very important CUDA library, many code examples)
https://github.com/PacktPublishing/Learn-CUDA-Programming (many deep learning examples)