Pinned Repositories
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
g2
internlm-20b
llm-continuous-batching-benchmarks
Mixtral-8x7B
MXNet2Caffe
Convert MXNet model to Caffe model
optimum-habana
Easy and lightning-fast training of 🤗 Transformers on the Habana Gaudi processor (HPU)
paddle
PaddleCustomDevice
PaddlePaddle custom device implementation (custom hardware integration for PaddlePaddle)
PaddleFleetX
Paddle Distributed Training Examples: ResNet, BERT, GPT, MoE, DataParallel, ModelParallel, PipelineParallel, HybridParallel, AutoParallel, ZeRO Sharding, Recompute, GradientMerge, Offload, AMP, DGC, LocalSGD, Wide&Deep
yangulei's Repositories
yangulei/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
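A minimal sketch of how DeepSpeed is typically adopted: `deepspeed.initialize` wraps a plain PyTorch model in an engine that handles data parallelism, ZeRO partitioning, and mixed precision. The toy model, batch size, optimizer, and ZeRO stage below are illustrative assumptions, not values taken from this repository.

```python
import torch
import deepspeed

# Placeholder PyTorch model; any nn.Module works here.
model = torch.nn.Linear(128, 10)

# Illustrative DeepSpeed config: Adam optimizer, fp16, ZeRO stage 1.
ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 1},
}

# Returns a DeepSpeedEngine that replaces the usual model/optimizer objects.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

Training then calls `model_engine.backward(loss)` and `model_engine.step()` in place of the usual PyTorch backward/step calls.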
yangulei/g2
yangulei/internlm-20b
yangulei/llm-continuous-batching-benchmarks
yangulei/Mixtral-8x7B
yangulei/MXNet2Caffe
Convert MXNet model to Caffe model
yangulei/optimum-habana
Easy and lightning-fast training of 🤗 Transformers on the Habana Gaudi processor (HPU)
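A minimal sketch of the drop-in Trainer replacement that optimum-habana provides; it assumes a Gaudi (HPU) machine, and the `bert-base-uncased` checkpoint plus the `Habana/bert-base-uncased` Gaudi config are purely illustrative choices.

```python
from transformers import AutoModelForSequenceClassification
from optimum.habana import GaudiTrainer, GaudiTrainingArguments

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# GaudiTrainingArguments mirrors transformers.TrainingArguments,
# with extra switches that target Gaudi HPUs.
args = GaudiTrainingArguments(
    output_dir="./out",
    use_habana=True,
    use_lazy_mode=True,
    gaudi_config_name="Habana/bert-base-uncased",
)

# Drop-in replacement for transformers.Trainer; pass train/eval datasets
# as usual before calling trainer.train() on a Gaudi machine.
trainer = GaudiTrainer(model=model, args=args)
```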
yangulei/paddle
yangulei/PaddleCustomDevice
PaddlePaddle custom device implementation (custom hardware integration for PaddlePaddle)
yangulei/PaddleFleetX
Paddle Distributed Training Examples: ResNet, BERT, GPT, MoE, DataParallel, ModelParallel, PipelineParallel, HybridParallel, AutoParallel, ZeRO Sharding, Recompute, GradientMerge, Offload, AMP, DGC, LocalSGD, Wide&Deep
yangulei/TensorRT
TensorRT is a C++ library for high-performance inference on NVIDIA GPUs and deep learning accelerators.
yangulei/tgi-gaudi
Large Language Model Text Generation Inference on Habana Gaudi
yangulei/TLCBench
Benchmark scripts for TVM
yangulei/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
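A minimal sketch of the 🤗 Transformers pipeline API, which bundles the tokenizer, model, and pre/post-processing behind one call; `gpt2` and the prompt text are just illustrative choices.

```python
from transformers import pipeline

# One call downloads the checkpoint and wires tokenizer + model together.
generator = pipeline("text-generation", model="gpt2")
print(generator("Distributed training is", max_new_tokens=20)[0]["generated_text"])
```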
yangulei/tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
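A minimal sketch of the classic TVM Relay flow: import a model, compile it for a target, and run it with the graph executor. The ONNX file name, input tensor name, and shape are assumptions made for illustration.

```python
import numpy as np
import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Hypothetical ONNX model and input signature.
onnx_model = onnx.load("model.onnx")
shape_dict = {"input": (1, 3, 224, 224)}

# Import into Relay IR, then compile for a CPU (LLVM) target.
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)

# Run the compiled module with the graph executor.
dev = tvm.cpu()
module = graph_executor.GraphModule(lib["default"](dev))
module.set_input("input", np.random.rand(1, 3, 224, 224).astype("float32"))
module.run()
output = module.get_output(0).numpy()
```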