Jin-Chuan's Stars

karpathy/llama2.c
Inference Llama 2 in one file of pure C
Language: C, Stars: 17.4k, Forks: 2.1k
karpathy/llm.c
LLM training in simple, raw C/CUDA
Language: Cuda, Stars: 24.2k, Forks: 2.7k
karpathy/micrograd
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
Language: Jupyter Notebook, Stars: 10.3k, Forks: 1.5k
open-mmlab/mmengine
OpenMMLab Foundational Library for Training Deep Learning Models
Language: Python, Stars: 1.2k, Forks: 351
huggingface/accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
Language: Python, Stars: 7.9k, Forks: 959
karpathy/minGPT
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Language: Python, Stars: 20.1k, Forks: 2.5k
mit-han-lab/inter-operator-scheduler
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
Language:C++19331
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Language: Cuda, Stars: 1.4k, Forks: 123
linnanwang/BLASX
a heterogeneous multiGPU level-3 BLAS library
Language:C4511
sony/nnabla
Neural Network Libraries
Language: Python, Stars: 2.7k, Forks: 334
Salensoft/thu-cst-cracker
Course guide for the Department of Computer Science and Technology, Tsinghua University
Language: C++, Stars: 2.4k, Forks: 1.1k
linnanwang/superneurons-release
this is the release repository of superneurons
Language:C++5215
arbitor-project/artifact
Language:Python2
NVIDIA/apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
Language: Python, Stars: 8.4k, Forks: 1.4k
inducer/loopy
A code generator for array-based code on CPUs and GPUs
Language:Python58572
proger/accelerated-scan
Accelerated First Order Parallel Associative Scan
Language:Python1598
HazyResearch/flash-fft-conv
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Language:C++27627
Bruce-Lee-LY/cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.
Language:Cuda28765
Meinersbur/pet
Polyhedral Extraction Tool (source repository: http://repo.or.cz/w/pet.git)
Language:C389
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications
Language: Jupyter Notebook, Stars: 94.1k, Forks: 15.2k
run-llama/llama_index
LlamaIndex is a data framework for your LLM applications
Language: Python, Stars: 36.4k, Forks: 5.2k
DefTruth/CUDA-Learn-Notes
🎉 Modern CUDA Learn Notes with PyTorch: fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
Language: Cuda, Stars: 1.3k, Forks: 149
IST-DASLab/gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
Language: Python, Stars: 1.9k, Forks: 152
bitsandbytes-foundation/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
Language: Python, Stars: 6.2k, Forks: 623
kungfu-origin/kungfu
Kungfu Trader
Language: C++, Stars: 3.4k, Forks: 1.1k
LMAX-Exchange/disruptor
High Performance Inter-Thread Messaging Library
Language: Java, Stars: 17.4k, Forks: 3.9k
vnpy/vnpy
A Python-based open-source framework for developing quantitative trading platforms
Language: Python, Stars: 25.5k, Forks: 8.8k
predibase/lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Language: Python, Stars: 2.2k, Forks: 142
tonyzhao-jt/LLM-PQ
Official Repo for "LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization"
Language:Jupyter Notebook261
NVlabs/tiny-cuda-nn
Lightning fast C++/CUDA neural network framework
Language: C++, Stars: 3.7k, Forks: 453