Jin-Chuan's Stars
karpathy/llama2.c
Inference Llama 2 in one file of pure C
karpathy/llm.c
LLM training in simple, raw C/CUDA
karpathy/micrograd
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
open-mmlab/mmengine
OpenMMLab Foundational Library for Training Deep Learning Models
huggingface/accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
karpathy/minGPT
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
mit-han-lab/inter-operator-scheduler
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
linnanwang/BLASX
A heterogeneous multi-GPU level-3 BLAS library
sony/nnabla
Neural Network Libraries
Salensoft/thu-cst-cracker
Course guide for the Tsinghua University Department of Computer Science and Technology
linnanwang/superneurons-release
The release repository of superneurons
arbitor-project/artifact
NVIDIA/apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
inducer/loopy
A code generator for array-based code on CPUs and GPUs
proger/accelerated-scan
Accelerated First Order Parallel Associative Scan
HazyResearch/flash-fft-conv
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Bruce-Lee-LY/cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.
Meinersbur/pet
Polyhedral Extraction Tool (source repository: http://repo.or.cz/w/pet.git)
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications
run-llama/llama_index
LlamaIndex is a data framework for your LLM applications
DefTruth/CUDA-Learn-Notes
🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
IST-DASLab/gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
bitsandbytes-foundation/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
kungfu-origin/kungfu
Kungfu Trader
LMAX-Exchange/disruptor
High Performance Inter-Thread Messaging Library
vnpy/vnpy
A Python-based open-source framework for developing quantitative trading platforms
predibase/lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
tonyzhao-jt/LLM-PQ
Official Repo for "LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization"
NVlabs/tiny-cuda-nn
Lightning fast C++/CUDA neural network framework