muse-coder's Stars
meta-llama/llama
Inference code for Llama models
tatsu-lab/stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
karpathy/llm.c
LLM training in simple, raw C/CUDA
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
QwenLM/Qwen
The official repo of Qwen (通义千问), the chat and pretrained large language models proposed by Alibaba Cloud.
liguodongiot/llm-action
This project shares the technical principles behind large language models along with hands-on practical experience.
jadore801120/attention-is-all-you-need-pytorch
A PyTorch implementation of the Transformer model in "Attention is All You Need".
adam-maj/tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
bitsandbytes-foundation/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
NVIDIA-AI-IOT/torch2trt
An easy-to-use PyTorch-to-TensorRT converter
AutoGPTQ/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
turboderp/exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
hyunwoongko/transformer
Transformer: PyTorch Implementation of "Attention Is All You Need"
mit-han-lab/llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
IST-DASLab/marlin
FP16×INT4 LLM inference kernel that achieves near-ideal ~4x speedups at batch sizes up to 16-32 tokens.
Tlntin/Qwen-TensorRT-LLM
LeiWang1999/ZYNQ-NVDLA
NVDLA (An Opensource DL Accelerator Framework) implementation on FPGA.
accel-sim/accel-sim-framework
This is the top-level repository for the Accel-Sim framework.
Yinghan-Li/YHs_Sample
Yinghan's Code Sample
pigirons/sgemm_hsw
An implementation of an SGEMM kernel tuned for the L1d cache.
hsharma35/dnnweaver2
Open Source Specialized Computing Stack for Accelerating Deep Neural Networks.
Guangxuan-Xiao/torch-int
This repository contains integer operators on GPUs for PyTorch.
TRT2022/MST-plus-plus-TensorRT
TensorRT 2022 finals solution: TensorRT inference optimization for MST++, the first Transformer-based image-restoration model.
dreamgonfly/transformer-pytorch
A PyTorch implementation of Transformer in "Attention is All You Need"
zeasa/nvdla-compiler
Sanskar777/QRS-peak-detection-in-ECG-signals-using-verilog
riple/dnnweaver2.drone