Pinned Repositories
lmql
A language for constraint-guided and efficient LLM programming.
sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
tokenizers-cpp
Universal cross-platform tokenizer bindings for HF and sentencepiece.
exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs.
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs.
hello-world
to begin with
MultiAttentionTrainer
wordOctopus
zhangyuhanjc.github.io