Pinned Repositories
CUDALibrarySamples
CUDA Library Samples
tensorflow
An Open Source Machine Learning Framework for Everyone
adaptdl
Resource-adaptive cluster scheduler for deep learning training.
sys_metric
Young768's Repositories
Young768/bigbird
Transformers for Longer Sequences
Young768/CUDALibrarySamples
CUDA Library Samples
Young768/DeepLearningExamples
Deep Learning Examples
Young768/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
Young768/demo
Young768/dyang
Young768/google-research
Google Research
Young768/iree
A retargetable MLIR-based machine learning compiler and runtime toolkit.
Young768/iree_script
Young768/jax
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Young768/jax-test
Young768/jax_custom_ops_and_custom_partitioning
Young768/Megatron-LM
Ongoing research training transformer models at scale
Young768/mlir-tutorial
Young768/openshmem-examples
Some miscellaneous OpenSHMEM examples
Young768/paxml
Pax is a JAX-based machine learning framework for training large-scale models. Pax allows for advanced and fully configurable experimentation and parallelization, and has demonstrated industry-leading model FLOP utilization rates.
Young768/PipeTransformer
Young768/profiling
Some experiment logs.
Young768/profiling_
Young768/SHARK-dev
SHARK - High Performance Machine Learning for CPUs, GPUs, Accelerators and Heterogeneous Clusters
Young768/tensorflow
An Open Source Machine Learning Framework for Everyone
Young768/test-dtensor
Young768/test-tf
Young768/training
Reference implementations of MLPerf™ training benchmarks
Young768/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
Young768/triton
Development repository for the Triton language and compiler
Young768/TurboTransformers
A fast and user-friendly runtime for Transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
Young768/tutorials
PyTorch tutorials.
Young768/vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single Transformer encoder, in PyTorch
Young768/xla
A machine learning compiler for GPUs, CPUs, and ML accelerators