Hongbosherlock's Stars
rasbt/LLMs-from-scratch
Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
wdndev/llm_interview_note
Notes on knowledge and interview questions for large language model (LLM) algorithm/application engineers
mlabonne/llm-course
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
HuaizhengZhang/AI-System-School
🚀 AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.
FMInference/H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
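H2O's core idea is that a small set of "heavy hitter" tokens accumulates most of the attention mass, so the KV cache can be shrunk by keeping only those tokens plus a recent window. A minimal NumPy sketch of that selection policy (the function name and the simple top-k rule are illustrative assumptions, not the paper's exact algorithm):

```python
import numpy as np

def h2o_keep_indices(attn_weights: np.ndarray, budget: int, recent: int) -> np.ndarray:
    """Choose which KV-cache positions to keep, H2O-style:
    the `recent` most recent tokens plus the highest
    accumulated-attention ("heavy hitter") keys, up to `budget` total.
    attn_weights: (num_queries, seq_len) attention matrix."""
    seq_len = attn_weights.shape[1]
    if budget >= seq_len:
        return np.arange(seq_len)
    scores = attn_weights.sum(axis=0)            # accumulated attention per key
    scores[seq_len - recent:] = np.inf           # always keep the recent window
    keep = np.argsort(scores)[-budget:]          # top-`budget` keys overall
    return np.sort(keep)
```

Everything outside the kept indices is evicted from the cache, bounding memory regardless of context length.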
miss-mumu/developer2gwy
A civil service exam guide, from beginner to passing: a practical tutorial for programmers taking the exam
vllm-project/llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
pytorch/FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
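The pattern FBGEMM accelerates is low-precision GEMM: multiply int8 operands, accumulate in int32, then dequantize with the operands' scales. A NumPy sketch of that pattern under simple per-tensor-scale assumptions (the function name and scale layout are illustrative, not FBGEMM's API):

```python
import numpy as np

def int8_gemm(A_q: np.ndarray, B_q: np.ndarray,
              A_scale: float, B_scale: float) -> np.ndarray:
    """int8 x int8 matrix multiply with int32 accumulation,
    dequantized back to float32 via per-tensor scales."""
    acc = A_q.astype(np.int32) @ B_q.astype(np.int32)   # int32 accumulation avoids overflow
    return acc.astype(np.float32) * (A_scale * B_scale)
```

Real kernels fuse the dequantization (and often per-channel scales and zero points) into the epilogue rather than materializing the int32 result.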
RussWong/CUDATutorial
A CUDA tutorial to help people learn CUDA programming from scratch
mit-han-lab/qserve
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
microsoft/BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
bytedance/decoupleQ
A quantization algorithm for LLM
SqueezeAILab/KVQuant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
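The basic operation behind KV-cache quantization is mapping the cached float tensors to low-bit integers with per-channel scales, then dequantizing on read. A simplified symmetric-quantization sketch of that idea (KVQuant's actual method is more involved, e.g. per-channel keys, per-token values, and non-uniform levels; this is only the generic pattern):

```python
import numpy as np

def quantize_kv(cache: np.ndarray, n_bits: int = 4):
    """Per-channel symmetric quantization of a (seq_len, n_channels)
    KV-cache tensor to n_bits integers."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(cache).max(axis=0, keepdims=True) / qmax  # one scale per channel
    scale = np.where(scale == 0, 1.0, scale)                 # avoid divide-by-zero
    q = np.clip(np.round(cache / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize_kv(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale
```

At 4 bits this cuts KV-cache memory roughly 4x versus fp16, which is what makes multi-million-token contexts feasible.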
karpathy/llm.c
LLM training in simple, raw C/CUDA
ray-project/ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
AIoT-MLSys-Lab/Efficient-LLMs-Survey
[TMLR 2024] Efficient Large Language Models: A Survey
huggingface/optimum-quanto
A pytorch quantization backend for optimum
ModelTC/llmc
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
ollama/ollama
Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
keith2018/TinyGPT
Tiny C++11 GPT-2 inference implementation from scratch
DefTruth/Awesome-LLM-Inference
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
AniZpZ/AutoSmoothQuant
An easy-to-use package for implementing SmoothQuant for LLMs
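SmoothQuant's trick is to divide each activation channel by a scale s and fold s into the corresponding weight rows, leaving X @ W mathematically unchanged while shrinking activation outliers so both sides quantize well. A NumPy sketch of the smoothing step (function name is illustrative; the scale formula follows the paper's s_j = max|X_j|^α / max|W_j|^(1-α)):

```python
import numpy as np

def smooth(X: np.ndarray, W: np.ndarray, alpha: float = 0.5):
    """Migrate quantization difficulty from activations to weights:
    return (X / s, s * W) so that (X / s) @ (s * W) == X @ W.
    X: (tokens, in_features), W: (in_features, out_features)."""
    act_max = np.abs(X).max(axis=0)                    # per input channel
    w_max = np.abs(W).max(axis=1)                      # per weight row
    s = act_max ** alpha / np.maximum(w_max, 1e-8) ** (1 - alpha)
    s = np.maximum(s, 1e-8)
    return X / s, W * s[:, None]
```

In practice act_max is calibrated offline on sample data, and the folded weights are then quantized once, so smoothing adds no inference-time cost.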
yihong0618/bilingual_book_maker
Make bilingual EPUB books using AI translation
microsoft/TransformerCompression
For releasing code related to compression methods for transformers, accompanying our publications
gpu-mode/resource-stream
GPU programming related news and material links
all-in-aigc/aicover
AI cover generator
MetaCubeX/mihomo
A rule-based network tunnel written in Go, formerly known as Clash Meta
ml-explore/mlx-examples
Examples in the MLX framework
dvmazur/mixtral-offloading
Run Mixtral-8x7B models in Colab or consumer desktops