chillingche's Stars
All-Hands-AI/OpenHands
🙌 OpenHands: Code Less, Make More
ShishirPatil/gorilla
Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
srush/GPU-Puzzles
Solve puzzles. Learn CUDA.
OpenBMB/MiniCPM
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
EleutherAI/gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
HaujetZhao/CapsWriter-Offline
The offline version of CapsWriter, a handy voice-input tool for PC
ARM-software/ComputeLibrary
The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
huggingface/nanotron
Minimalistic large language model 3D-parallelism training
facebookresearch/MobileLLM
MobileLLM: Optimizing Sub-billion-Parameter Language Models for On-Device Use Cases (ICML 2024).
mlfoundations/dclm
DataComp for Language Models
kvcache-ai/ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
IST-DASLab/marlin
FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups at small-to-medium batch sizes of 16-32 tokens.
RUC-GSAI/YuLan-Chat
YuLan: An Open-Source Large Language Model
allenai/OLMoE
OLMoE: Open Mixture-of-Experts Language Models
Cornell-RelaxML/quip-sharp
huggingface/cosmopedia
pcg-mlp/KsanaLLM
MegEngine/MegPeak
bytedance/ABQ-LLM
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
OpenGVLab/EfficientQAT
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
quic/qidk
n-o-o-n/idp_hexagon
Hexagon processor module for IDA Pro disassembler
tlc-pack/libflash_attn
Standalone Flash Attention v2 kernel without libtorch dependency
Cornell-RelaxML/qtip
HandH1998/QQQ
QQQ is a hardware-optimized W4A8 quantization solution for LLMs.
LLM360/amber-data-prep
Data preparation code for Amber 7B LLM
quic/efficient-transformers
This library lets users seamlessly port pretrained models and checkpoints from the Hugging Face (HF) Hub (built with the HF transformers library) into inference-ready formats that run efficiently on Qualcomm Cloud AI 100 accelerators.
ise-uiuc/xft
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts