ddoddii/Multicore-GPU-Programming

[CSI4119] Multicore GPU Programming

C++

Multicore GPU Programming

This repository collects what I studied in the course, from multithreaded CPU programming to CUDA, with each topic aimed at improving efficiency and performance in multicore and GPU programming. Here’s a detailed look at what I learned:

Theory

Theme	Post
Basic Parallel Architectures	Let's Learn About Basic Parallel Architectures
Thread Programming	Thread Programming Explored Through C++
Thread Management	How Do We Divide Work Evenly Among Threads in a Multithreaded Program?
Matrix Multiplication (multi-threaded)	Ways to Improve Matrix Multiplication (matmul) Performance with Multithreading
OpenMP	How to Use OpenMP to Make Multithreading Easier
Graph Processing	Ways to Store Graph Structures More Efficiently
Prefix Sum	Prefix Sum: A Guide to Efficient Computation
CUDA Programming Intro	CUDA Programming Basics
CPU-GPU communication and thread indexing	CPU-GPU Communication and Image Processing Techniques Using CUDA
CUDA thread hierarchy, memory hierarchy, GPU cache structure	CUDA and NVIDIA GPU Architecture: Understanding the Thread Hierarchy, Memory Hierarchy, and GPU Cache Structure
CUDA memories: registers, shared memory, global memory	CUDA Memories: Registers, Shared Memory, Global Memory

Hands-on Assignments

Assignment	Description	Link
Assignment #1	A Simple Filter on 1D Array	link
Assignment #2	Hash table locking	link
Assignment #3	Matrix Multiplication	link
Assignment #4	Matrix Multiplication using CUDA	link
Assignment #5	Sum Reduction	link
Assignment #6	CUDA Application of DNN	link

Content Breakdown

Basic Parallel Architectures

Post: Let's Learn About Basic Parallel Architectures
Description: This section introduces the fundamental concepts of parallel architectures, laying the groundwork for more advanced topics.

Thread Programming

Post: Thread Programming Explored Through C++
Description: Dive into thread programming with C++, understanding how to create and manage threads effectively.
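
As a minimal sketch of creating and joining threads with `std::thread`, the example below sums a vector in parallel. The function name `parallel_sum` and the chunking scheme are my own illustrative assumptions, not code taken from the post.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sums `data` using `num_threads` worker threads.
// Each worker writes only to its own slot in `partial`, so no lock is needed.
long long parallel_sum(const std::vector<int>& data, std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    // Ceiling division so that every element falls into some chunk.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(t * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        workers.emplace_back([&data, &partial, t, begin, end] {
            partial[t] = std::accumulate(data.data() + begin, data.data() + end, 0LL);
        });
    }
    // join() blocks until each worker has finished writing its partial sum.
    for (std::thread& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Calling `parallel_sum(v, 4)` on a vector holding 1 through 1000 should give the same result as a sequential sum. On some toolchains the program may need to be linked with `-pthread`.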

Thread Management

Post: How Do We Divide Work Evenly Among Threads in a Multithreaded Program?
Description: Learn strategies for evenly distributing tasks among threads in a multithreaded environment to maximize performance.
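
One common static scheme, which I am assuming here for illustration rather than quoting from the post, gives each thread a contiguous block and hands the leftover items to the first few threads. The helper name `block_range` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper: returns the half-open range [begin, end) of items
// assigned to thread `tid` when `n` items are split across `num_threads`.
// The first (n % num_threads) threads each take one extra item, so the
// sizes of any two ranges differ by at most one.
std::pair<std::size_t, std::size_t> block_range(std::size_t n,
                                                std::size_t num_threads,
                                                std::size_t tid) {
    assert(num_threads > 0 && tid < num_threads);
    const std::size_t base = n / num_threads;   // items every thread gets
    const std::size_t extra = n % num_threads;  // leftover items to spread out
    const std::size_t begin = tid * base + (tid < extra ? tid : extra);
    const std::size_t end = begin + base + (tid < extra ? 1 : 0);
    return {begin, end};
}
```

With 10 items and 4 threads this yields ranges of sizes 3, 3, 2, and 2. A static split like this works best when every item costs about the same; uneven workloads usually call for a dynamic scheme instead.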

Matrix Multiplication (multi-threaded)

Post: Ways to Improve Matrix Multiplication (matmul) Performance with Multithreading
Description: Explore methods to optimize matrix multiplication operations using multithreading techniques.
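
The sketch below shows two ideas that often come up in this setting: splitting the rows of the output across threads, and using an i-k-j loop order for better cache behavior. Both choices, and the name `matmul_threaded`, are assumptions made for illustration; the post may use different techniques.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: C = A * B for square n x n row-major matrices.
// Rows of C are split into contiguous blocks, one block per thread, so
// no two threads ever write to the same element. The i-k-j loop order
// walks B and C row by row, which tends to be friendlier to the cache
// than the textbook i-j-k order.
std::vector<double> matmul_threaded(const std::vector<double>& A,
                                    const std::vector<double>& B,
                                    std::size_t n, std::size_t num_threads) {
    assert(num_threads > 0 && A.size() == n * n && B.size() == n * n);
    std::vector<double> C(n * n, 0.0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    const std::size_t rows_per_thread = (n + num_threads - 1) / num_threads;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t row_begin = std::min(t * rows_per_thread, n);
        const std::size_t row_end = std::min(row_begin + rows_per_thread, n);
        workers.emplace_back([&A, &B, &C, n, row_begin, row_end] {
            for (std::size_t i = row_begin; i < row_end; ++i) {
                for (std::size_t k = 0; k < n; ++k) {
                    const double a = A[i * n + k];
                    for (std::size_t j = 0; j < n; ++j) {
                        C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return C;
}
```

Because each thread owns whole rows of `C`, the kernel needs no synchronization beyond the final `join()`.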

OpenMP

Post: How to Use OpenMP to Make Multithreading Easier
Description: Get acquainted with OpenMP, a powerful tool that simplifies multithreading and parallel programming.
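
To give a flavor of how little code OpenMP asks for, here is a small sketch of a parallel dot product using the `parallel for` directive with a `reduction` clause. The function name `dot` is hypothetical, and the example is mine rather than one from the post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: dot product parallelized with an OpenMP reduction.
// Each thread accumulates into a private copy of `sum`, and OpenMP
// combines the private copies when the loop ends.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    const long long n = static_cast<long long>(x.size());
    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum)
    for (long long i = 0; i < n; ++i) {
        const std::size_t idx = static_cast<std::size_t>(i);
        sum += x[idx] * y[idx];
    }
    return sum;
}
```

With GCC or Clang, OpenMP is enabled by compiling with `-fopenmp`. Without that flag the pragma is ignored and the loop simply runs serially, so the function stays correct either way.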

Graph Processing

Post: Ways to Store Graph Structures More Efficiently
Description: Discover efficient ways to store and process graph structures, crucial for handling complex data relationships.
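
Compressed sparse row (CSR) is one widely used compact graph layout. I am assuming it here as a representative example; the post may cover other formats as well. The names `CsrGraph` and `build_csr` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// CSR storage for a directed graph: the neighbors of vertex v are
// neighbors[row_offsets[v]] .. neighbors[row_offsets[v + 1] - 1].
struct CsrGraph {
    std::vector<std::size_t> row_offsets;  // size: num_vertices + 1
    std::vector<std::size_t> neighbors;    // size: number of edges
};

// Hypothetical helper: builds a CSR graph from a list of (source, target) edges.
CsrGraph build_csr(std::size_t num_vertices,
                   const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
    CsrGraph g;
    g.row_offsets.assign(num_vertices + 1, 0);

    // Count the out-degree of each source vertex, shifted by one slot.
    for (const auto& e : edges) {
        assert(e.first < num_vertices && e.second < num_vertices);
        ++g.row_offsets[e.first + 1];
    }
    // A running sum turns the counts into start offsets.
    for (std::size_t v = 0; v < num_vertices; ++v) {
        g.row_offsets[v + 1] += g.row_offsets[v];
    }
    // Place each edge into the next free slot of its source vertex.
    g.neighbors.resize(edges.size());
    std::vector<std::size_t> next(g.row_offsets.begin(), g.row_offsets.end() - 1);
    for (const auto& e : edges) {
        g.neighbors[next[e.first]++] = e.second;
    }
    return g;
}
```

Compared with an adjacency matrix, CSR needs space proportional to the number of vertices plus edges, and each vertex's neighbors sit contiguously in memory.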

Prefix Sum

Post: Prefix Sum: A Guide to Efficient Computation
Description: Gain a comprehensive understanding of the prefix sum algorithm and its applications in efficient computation.
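
To make the parallel structure of a scan concrete, the sketch below runs the Hillis-Steele inclusive scan on the host. Every iteration of the inner loop is independent, so on a GPU each one could be handled by its own thread. Choosing Hillis-Steele is my assumption; the post may present a different variant, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: inclusive prefix sum using the Hillis-Steele scheme.
// In step d, element i adds the value that sits d positions to its left.
// After about log2(n) steps, element i holds in[0] + ... + in[i].
std::vector<int> inclusive_scan_hillis_steele(std::vector<int> in) {
    std::vector<int> out(in.size());
    for (std::size_t d = 1; d < in.size(); d *= 2) {
        // All iterations of this loop are independent of each other:
        // they read only from `in` and write only to `out`.
        for (std::size_t i = 0; i < in.size(); ++i) {
            out[i] = (i >= d) ? in[i] + in[i - d] : in[i];
        }
        in.swap(out);  // the freshly written buffer becomes the next input
    }
    return in;
}
```

The two buffers avoid reading a value that the same step has already overwritten, which is the same reason GPU versions of this scan use double buffering.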

CUDA 101

Post: CUDA Programming Basics
Description: This section provides an introduction to CUDA programming, designed for those new to GPU programming. The post covers the basics of CUDA: how to set up your development environment and how to write and compile your first CUDA program.

CPU-GPU communication and thread indexing

Post: CPU-GPU Communication and Image Processing Techniques Using CUDA
Description: This section gives a detailed explanation of the hierarchical structure of CUDA threads: grids, blocks, and threads. The post shows how to calculate a global thread index through thread indexing and includes example code for image processing.
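
The index arithmetic can be tried out on the host without a GPU. The sketch below mirrors the usual in-kernel expression `blockIdx.x * blockDim.x + threadIdx.x` in plain C++. The struct `Idx2` and the function names are hypothetical stand-ins for CUDA's built-in variables, not part of the CUDA API.

```cpp
#include <cassert>

// Host-side stand-in for CUDA's built-in 2D index values (illustrative only).
struct Idx2 {
    unsigned x;
    unsigned y;
};

// Number of blocks needed to cover `n` elements with `block` threads per block.
unsigned blocks_needed(unsigned n, unsigned block) {
    assert(block > 0);
    return (n + block - 1) / block;  // ceiling division
}

// Mirrors the in-kernel expression blockIdx.x * blockDim.x + threadIdx.x.
unsigned global_index_1d(unsigned block_idx, unsigned block_dim, unsigned thread_idx) {
    return block_idx * block_dim + thread_idx;
}

// Maps a 2D thread coordinate to a row-major pixel offset. Returns -1 when
// the thread falls outside the image, matching the bounds check a kernel
// needs when the grid is larger than the image.
long long pixel_offset(Idx2 block_idx, Idx2 block_dim, Idx2 thread_idx,
                       unsigned width, unsigned height) {
    const unsigned col = block_idx.x * block_dim.x + thread_idx.x;
    const unsigned row = block_idx.y * block_dim.y + thread_idx.y;
    if (col >= width || row >= height) return -1;
    return static_cast<long long>(row) * width + col;
}
```

For example, 1000 elements with 256 threads per block need 4 blocks, and the last block contains threads whose global index is past the end of the data, which is why the bounds check matters.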

CUDA thread hierarchy, memory hierarchy, GPU cache structure

Post: CUDA and NVIDIA GPU Architecture: Understanding the Thread Hierarchy, Memory Hierarchy, and GPU Cache Structure
Description: This section goes deeper into CUDA and NVIDIA GPU architecture, including the hierarchical organization of threads, the different levels of memory, and the structure of GPU caches.

CUDA memories: registers, shared memory, global memory

Post: CUDA Memories: Registers, Shared Memory, Global Memory
Description: This section explores the different types of memory in CUDA, focusing on registers, shared memory, and global memory. The post delves into the characteristics of each memory type and provides strategies for using them effectively to improve the efficiency of CUDA kernels.