Pinned Repositories
CUDALibrarySamples
CUDA Library Samples
tensorflow
An Open Source Machine Learning Framework for Everyone
adaptdl
Resource-adaptive cluster scheduler for deep learning training.
sys_metric
Young768's Repositories
Young768/bigbird
Transformers for Longer Sequences
Young768/CUDALibrarySamples
CUDA Library Samples
Young768/DeepLearningExamples
Deep Learning Examples
Young768/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
Young768/demo
Young768/dyang
Young768/google-research
Google Research
Young768/iree
A retargetable MLIR-based machine learning compiler and runtime toolkit.
Young768/iree_script
Young768/jax
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Young768/jax-test
Young768/jax_custom_ops_and_custom_partitioning
Young768/Megatron-LM
Ongoing research training transformer models at scale
Young768/mlir-tutorial
Young768/openshmem-examples
Some miscellaneous OpenSHMEM examples
Young768/paxml
Pax is a JAX-based machine learning framework for training large-scale models. Pax allows for advanced and fully configurable experimentation and parallelization, and has demonstrated industry-leading model FLOP utilization rates.
Young768/PipeTransformer
Young768/profiling
Some experiment logs.
Young768/profiling_
Young768/SHARK-dev
SHARK - High Performance Machine Learning for CPUs, GPUs, Accelerators and Heterogeneous Clusters
Young768/tensorflow
An Open Source Machine Learning Framework for Everyone
Young768/test-dtensor
Young768/test-tf
Young768/training
Reference implementations of MLPerf™ training benchmarks
Young768/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
Young768/triton
Development repository for the Triton language and compiler
Young768/TurboTransformers
A fast and user-friendly runtime for Transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
Young768/tutorials
PyTorch tutorials.
Young768/vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single Transformer encoder, in PyTorch
Young768/xla
A machine learning compiler for GPUs, CPUs, and ML accelerators