Pinned Repositories
apex
A PyTorch extension: tools for easy mixed-precision and distributed training in PyTorch
wm901115nwpu's Repositories
wm901115nwpu/apex
A PyTorch extension: tools for easy mixed-precision and distributed training in PyTorch
wm901115nwpu/Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
wm901115nwpu/DeepLearningSystem
An introduction to the core principles of deep learning systems.
wm901115nwpu/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
wm901115nwpu/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
wm901115nwpu/DeepSpeedExamples
Example models using DeepSpeed
wm901115nwpu/depyf
depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.
wm901115nwpu/intel-extension-for-transformers
⚡ Build a chatbot on your favorite device within minutes; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡
wm901115nwpu/learn_pytorch2.0
wm901115nwpu/LLaMA-Factory
Unify Efficient Fine-tuning of 100+ LLMs
wm901115nwpu/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
wm901115nwpu/mamba
wm901115nwpu/Megatron-LM
Ongoing research training transformer models at scale
wm901115nwpu/mmcv
OpenMMLab Computer Vision Foundation
wm901115nwpu/mmdeploy
OpenMMLab Model Deployment Framework
wm901115nwpu/mmengine
OpenMMLab Foundational Library for Training Deep Learning Models
wm901115nwpu/NeMo
NeMo: a framework for generative AI
wm901115nwpu/neural-compressor
Provide unified APIs for SOTA model compression techniques, such as low precision (INT8/INT4/FP4/NF4) quantization, sparsity, pruning, and knowledge distillation on mainstream AI frameworks such as TensorFlow, PyTorch, and ONNX Runtime.
wm901115nwpu/OnnxSlim
A toolkit to help optimize large ONNX models
wm901115nwpu/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
wm901115nwpu/pytorch-image-models
PyTorch image models, scripts, and pretrained weights -- ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT), MobileNet-V3/V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
wm901115nwpu/TensorRT
TensorRT is a C++ library for high-performance inference on NVIDIA GPUs and deep learning accelerators.
wm901115nwpu/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper GPUs, providing better performance with lower memory utilization in both training and inference.
wm901115nwpu/triton
Development repository for the Triton language and compiler
wm901115nwpu/tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
wm901115nwpu/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
wm901115nwpu/wenet
A production-first and production-ready end-to-end speech recognition toolkit
wm901115nwpu/workshops
A repository for all workshop-related materials.
wm901115nwpu/xtuner
An efficient, flexible, and full-featured toolkit for fine-tuning large models (InternLM, Llama, Baichuan, Qwen, ChatGLM)
wm901115nwpu/zero_nlp
Chinese NLP solutions (large models, data, models, training, inference)