inkinworld's Stars
All-Hands-AI/OpenHands
🙌 OpenHands: Code Less, Make More
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
iovisor/bcc
BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
samber/lo
💥 A Lodash-style Go library based on Go 1.18+ Generics (map, filter, contains, find...)
karpathy/minbpe
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
huggingface/text-generation-inference
Large Language Model Text Generation Inference
THUDM/CodeGeeX2
CodeGeeX2: A More Powerful Multilingual Code Generation Model
harvardnlp/annotated-transformer
An annotated implementation of the Transformer paper.
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
kf-liu/The-Art-of-Linear-Algebra-zh-CN
Graphic notes on Gilbert Strang's "Linear Algebra for Everyone", Chinese edition of The Art of Linear Algebra, PRs welcome.
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
bbycroft/llm-viz
3D Visualization of a GPT-style LLM
Tencent/cherry-markdown
✨ A Markdown Editor
huggingface/text-embeddings-inference
A blazing fast inference solution for text embeddings models
DefTruth/Awesome-LLM-Inference
📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
mengjian-github/copilot-analysis
ZiyaoGeng/RecLearn
Recommender Learning with TensorFlow 2.x
BBuf/how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
HeKun-NVIDIA/CUDA-Programming-Guide-in-Chinese
This is a Chinese translation of the CUDA programming guide
mingrammer/flog
🎩 A fake log generator for common log formats
triton-inference-server/pytriton
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
zhaozhiyong19890102/Recommender-System
A survey of recommender systems
zc911/MatrixSlow
A simple deep learning framework in pure Python, for the purpose of learning DL
hellotransformers/Natural_Language_Processing_with_Transformers
Chinese translation of Natural Language Processing with Transformers, the most authoritative Transformers tutorial
charlotteLive/pybind11-Chinese-docs
pybind11 Chinese documentation (personal translation)
jy-yuan/KIVI
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Nanbeige/Nanbeige
KyleBing/map
Route book, route planning, AMap (Gaode Maps) API examples, map information; vue3, ts, vite
run-ai/llmperf