MARD1NO's Stars
QwenLM/qwen.cpp
C++ implementation of Qwen-LM
apple/ml-ferret
RulinShao/LightSeq
Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers
pytorch/torchdistx
Torch Distributed Experimental
Oneflow-Inc/faster-chatglm-6b
mit-han-lab/streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
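The core idea behind attention sinks can be sketched as a KV-cache eviction policy: always keep the first few "sink" tokens plus a sliding window of the most recent tokens. A minimal illustrative sketch (function name and default sizes are assumptions, not the repo's API):

```python
def attention_sink_keep(seq_len, n_sink=4, window=8):
    """Return the token positions kept in the KV cache under a toy
    attention-sink policy: the first n_sink tokens are always retained
    (they act as attention sinks), plus a sliding window of the most
    recent `window` tokens. Illustrative only; streaming-llm's real
    cache management is more involved."""
    if seq_len <= n_sink + window:
        return list(range(seq_len))
    recent = list(range(seq_len - window, seq_len))
    return list(range(n_sink)) + recent
```

With the defaults, cache size stays bounded at 12 entries no matter how long generation runs.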
SkunkworksAI/hydra-moe
dlsyscourse/hw2
AlibabaResearch/flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
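The kernel family Flash-LLM optimizes can be illustrated with a plain CSR sparse matrix-vector product, the dense-compute bottleneck that unstructured sparsity replaces. A toy reference version (not the repo's CUDA implementation):

```python
import numpy as np

def sparse_matvec(values, cols, row_ptr, x):
    """Toy CSR sparse matrix-vector product: only nonzero weights are
    stored (values), with their column indices (cols) and per-row
    offsets (row_ptr). Illustrative of what an unstructured-sparsity
    inference kernel computes, not how Flash-LLM implements it."""
    y = np.zeros(len(row_ptr) - 1)
    for r in range(len(y)):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            y[r] += values[k] * x[cols[k]]
    return y
```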
facebookresearch/fairseq2
FAIR Sequence Modeling Toolkit 2
krahets/hello-algo
"Hello Algo" (《Hello 算法》): a data structures and algorithms tutorial with animated illustrations and one-click runnable code. Supports Python, Java, C++, C, C#, JS, Go, Swift, Rust, Ruby, Kotlin, TS, and Dart. Simplified and Traditional Chinese editions are updated in sync; an English version is in progress.
huawei-noah/Efficient-Computing
Efficient computing methods developed by Huawei Noah's Ark Lab
Mellanox/nccl-rdma-sharp-plugins
RDMA and SHARP plugins for nccl library
mlc-ai/mlc-ai.github.io
Azure/msccl-executor-nccl
minitorch/quizzes
Class quizzes for minitorch and an auto-grader.
irfanICMLL/structure_knowledge_distillation
The official code for the paper "Structured Knowledge Distillation for Semantic Segmentation" (CVPR 2019 oral), with extensions to other tasks.
leptonai/leptonai
A Pythonic framework to simplify AI service building
punica-ai/punica
Serving multiple LoRA-finetuned LLMs as one
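The multi-LoRA serving idea can be sketched in a few lines: every request in a batch shares the base weight, while each row selects its own low-rank adapter. A NumPy sketch under assumed shapes (names are illustrative, not punica's API):

```python
import numpy as np

def batched_lora_forward(x, W, loras, ids):
    """Toy multi-LoRA batched forward: the base GEMM x @ W is shared
    across the batch, and each request j adds its own low-rank update
    x[j] @ A_i @ B_i, where adapter i is chosen per request via `ids`.
    Illustrative sketch only; punica fuses this into custom kernels."""
    base = x @ W                                     # shared base GEMM
    delta = np.stack([x[j] @ loras[i][0] @ loras[i][1]
                      for j, i in enumerate(ids)])   # per-request LoRA
    return base + delta
```

The point of the design is that the expensive base GEMM is computed once for the whole batch; only the cheap rank-r updates differ per request.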
ziplab/efficient-stable-diffusion
TIGER-AI-Lab/MAmmoTH
Code and data for "MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning" (ICLR 2024)
THUDM/MathGLM
Official PyTorch implementation of MathGLM
FasterDecoding/Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
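The verification step behind multi-head drafting can be sketched as: accept the longest prefix of the drafted tokens that matches what the base model would have produced greedily. A toy sketch (the callable stands in for a real model; not Medusa's actual interface):

```python
def verify_draft(draft, base_next_token, prefix):
    """Toy speculative-decoding verification: walk the drafted tokens
    and accept the longest prefix agreeing with the base model's greedy
    choice at each position. `base_next_token` maps a token sequence to
    its next token (a stand-in for a real LM). Illustrative only."""
    accepted = []
    ctx = list(prefix)
    for tok in draft:
        if base_next_token(ctx) != tok:
            break
        accepted.append(tok)
        ctx.append(tok)
    return accepted
```

Accepted tokens cost one base-model pass in total rather than one pass each, which is where the speedup comes from.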
getgridea/gridea
✍️ A static blog writing client
wjakob/nanobind
nanobind: tiny and efficient C++/Python bindings
baichuan-inc/Baichuan2
A series of large language models developed by Baichuan Intelligent Technology
bojone/bytepiece
A purer tokenizer with a higher compression rate
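Byte-level tokenization of the kind bytepiece targets can be illustrated with a greedy longest-match encoder over raw bytes; single bytes always encode, so no input ever fails. A toy sketch (bytepiece itself trains a unigram model rather than matching greedily):

```python
def greedy_byte_tokenize(data, vocab):
    """Toy byte-level tokenizer: greedily match the longest vocabulary
    piece at each position; fall back to the single byte when nothing
    matches, so any byte string can be encoded. Illustrative only --
    bytepiece uses a trained unigram model, not greedy matching."""
    tokens, i = [], 0
    max_len = max(len(v) for v in vocab)
    while i < len(data):
        for l in range(min(max_len, len(data) - i), 0, -1):
            piece = data[i:i + l]
            if piece in vocab or l == 1:
                tokens.append(piece)
                i += l
                break
    return tokens
```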
openppl-public/ppl.llm.kernel.cuda
softmax1/Flash-Attention-Softmax-N
CUDA and Triton implementations of Flash Attention with SoftmaxN.
yester31/Cutlass_EX
A study of NVIDIA CUTLASS