Pinned Repositories
alpa
Training and serving large-scale neural networks
ase_riscv_gem5_sim
RISC-V gem5 simulator flow for Architetture dei Sistemi di Elaborazione
awesome-distributed-ml
A curated list of awesome projects and papers for distributed training or inference
Awesome-Efficient-Training
A collection of research papers on efficient training of DNNs
awesome-emdl
Embedded and mobile deep learning research resources
awesome-machine-learning-in-compilers
Must-read research papers and links to tools and datasets related to using machine learning for compilers and systems optimisation
awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
coconet
code-samples
Source code examples from the Parallel Forall Blog
cuda-unified-memory-test
Yu-gyoung-Yun's Repositories
Yu-gyoung-Yun/alpa
Training and serving large-scale neural networks
Yu-gyoung-Yun/ase_riscv_gem5_sim
RISC-V gem5 simulator flow for Architetture dei Sistemi di Elaborazione
Yu-gyoung-Yun/awesome-distributed-ml
A curated list of awesome projects and papers for distributed training or inference
Yu-gyoung-Yun/awesome-emdl
Embedded and mobile deep learning research resources
Yu-gyoung-Yun/awesome-machine-learning-in-compilers
Must-read research papers and links to tools and datasets related to using machine learning for compilers and systems optimisation
Yu-gyoung-Yun/awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
Yu-gyoung-Yun/code-samples
Source code examples from the Parallel Forall Blog
Yu-gyoung-Yun/cuptisamples
NVIDIA CUPTI samples mirror.
Yu-gyoung-Yun/cutlass
CUDA Templates for Linear Algebra Subroutines
Yu-gyoung-Yun/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Yu-gyoung-Yun/DeepSpeedExamples
Example models using DeepSpeed
Yu-gyoung-Yun/DL_Compiler_and_Hardware
Yu-gyoung-Yun/FasterTransformer
Transformer-related optimization, including BERT and GPT
Yu-gyoung-Yun/Hands-On-GPU-Programming-with-Python-and-CUDA
Hands-On GPU Programming with Python and CUDA, published by Packt
Yu-gyoung-Yun/iree
A retargetable MLIR-based machine learning compiler and runtime toolkit.
Yu-gyoung-Yun/jax
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Yu-gyoung-Yun/LLMSys-PaperList
LLM Systems Paper List
Yu-gyoung-Yun/lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
Yu-gyoung-Yun/ML-Hardware-Collections
News and Paper Collections for Machine Learning Hardware
Yu-gyoung-Yun/ml4se
A curated list of papers, theses, datasets, and tools related to the application of Machine Learning for Software Engineering
Yu-gyoung-Yun/scale-sim-v2
Repository to host and maintain scale-sim-v2 code
Yu-gyoung-Yun/tensor_parallel
Automatically split your PyTorch models across multiple GPUs for training & inference
Yu-gyoung-Yun/tensorflow
An Open Source Machine Learning Framework for Everyone
Yu-gyoung-Yun/tensorflow-alpa
Yu-gyoung-Yun/TensorNVMe
A Python library that transfers PyTorch tensors between CPU and NVMe
Yu-gyoung-Yun/torch-ccl
oneCCL Bindings for PyTorch*
Yu-gyoung-Yun/tuning_playbook
A playbook for systematically maximizing the performance of deep learning models.
Yu-gyoung-Yun/tutorial-multi-gpu
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
Yu-gyoung-Yun/xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
Yu-gyoung-Yun/Yu-gyoung-Yun.github.io