wenyawei's Stars
facebookresearch/segment-anything
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
pybind/pybind11
Seamless operability between C++11 and Python
QwenLM/Qwen
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
state-spaces/mamba
Mamba SSM architecture
chenzomi12/AISystem
AISystem 主要是指AI系统,包括AI芯片、AI编译器、AI推理和训练框架等AI全栈底层技术
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
FMInference/FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
NVIDIA/apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
huggingface/accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
facebookresearch/SlowFast
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
InternLM/InternLM
Official release of InternLM2.5 base and chat models. 1M context support
baichuan-inc/Baichuan-7B
A large-scale 7B pretraining language model developed by BaiChuan-Inc.
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
fundamentalvision/BEVFormer
[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
mit-han-lab/bevfusion
[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
andikleen/pmu-tools
Intel PMU profiling tools
QwenLM/Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
databricks/megablocks
NVIDIA/MatX
An efficient C++17 GPU numerical computing library with Python-like syntax
deepseek-ai/DeepSeek-MoE
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
NVIDIA-Merlin/Merlin
NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.
NVIDIA/nvbench
CUDA Kernel Benchmarking Library
bytedance/ByteTransformer
optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052
MegEngine/mperf
mperf是一个面向移动/嵌入式平台的算子性能调优工具箱
sunlex0717/DissectingTensorCores