pineleen's Stars
yunjey/pytorch-tutorial
PyTorch Tutorial for Deep Learning Researchers
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
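FlashAttention produces the same output as standard scaled dot-product attention, just faster and with less memory. As a point of reference, a minimal pure-Python sketch of the exact attention it computes (the naive formulation, not the tiled FlashAttention algorithm; toy matrices are made up for illustration):

```python
import math

def softmax(row):
    # numerically stable softmax over one row of scores
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    # exact scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in Q:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        weights = softmax(scores)
        # each output row is a weight-averaged combination of the value rows
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# hypothetical 2-token, 2-dim example
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))
```

FlashAttention's contribution is computing this same result in tiles that fit in GPU SRAM, avoiding materializing the full attention matrix.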
sustcsonglin/flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
xdit-project/xDiT
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters
apple/ml-stable-diffusion
Stable Diffusion with Core ML on Apple Silicon
bergkamp/video-comparison-player
🎦 Video comparison player for Mac and Windows, built using Electron
NVIDIA/nccl-tests
NCCL Tests
tinygrad/open-gpu-kernel-modules
NVIDIA Linux open GPU kernel modules with P2P support
kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
dottxt-ai/outlines
Structured Text Generation
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
xorbitsai/inference
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference lets you run inference with any open-source language, speech-recognition, or multimodal model, whether in the cloud, on-premises, or on your laptop.
bytedance/ByteMLPerf
AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.
mlcommons/inference
Reference implementations of MLPerf™ inference benchmarks
Lyken17/pytorch-OpCounter
Count the MACs / FLOPs of your PyTorch model.
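pytorch-OpCounter derives these counts automatically from a model graph; as a sanity check, the MAC count of a standard 2D convolution or linear layer can also be computed by hand (layer sizes below are a hypothetical example, not from the tool):

```python
def conv2d_macs(c_in, c_out, k_h, k_w, h_out, w_out):
    # each output element requires c_in * k_h * k_w multiply-accumulates
    return c_out * h_out * w_out * c_in * k_h * k_w

def linear_macs(in_features, out_features):
    # a fully connected layer performs one MAC per weight
    return in_features * out_features

# e.g. a ResNet-style stem conv: 3 -> 64 channels, 7x7 kernel, 112x112 output
macs = conv2d_macs(3, 64, 7, 7, 112, 112)
print(macs)      # MAC count
print(2 * macs)  # FLOPs, if multiply and add are counted separately
```

Note that "FLOPs" conventions differ: some tools report MACs, others report 2x MACs, so compare numbers from different counters carefully.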
flexflow/FlexFlow
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
Liu-xiandong/How_to_optimize_in_GPU
A series of GPU optimization topics explaining in detail how to optimize CUDA kernels, covering several basic kernels including elementwise, reduce, SGEMV, and SGEMM. The performance of these kernels is at or near the theoretical limit.
langgenius/dify
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
NVIDIA/cudnn-frontend
cudnn_frontend provides a C++ wrapper for the cuDNN backend API, along with samples showing how to use it
ollama/ollama
Get up and running with Llama 3.1, Mistral, Gemma 2, and other large language models.
openai/simple-evals
0voice/learning_mind_map
2021 mind-map collection covering C/C++, Golang, Linux, cloud native, databases, DPDK, audio/video development, TCP/IP, data structures, computer architecture, and more
Mozilla-Ocho/llamafile
Distribute and run LLMs with a single file.
karpathy/llm.c
LLM training in simple, raw C/CUDA
ztxz16/fastllm
A pure C++ cross-platform LLM acceleration library with Python bindings; ChatGLM-6B-class models reach 10000+ tokens/s on a single GPU. Supports GLM, LLaMA, and MOSS base models and runs smoothly on mobile devices.
mlc-ai/mlc-llm
Universal LLM Deployment Engine with ML Compilation
megvii-research/Sparsebit
A model compression and acceleration toolbox based on PyTorch.
DefTruth/Awesome-LLM-Inference
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.