soryxie's Stars
opencv/opencv
Open Source Computer Vision Library
pyecharts/pyecharts
🎨 Python Echarts Plotting Library
continue-revolution/sd-webui-segment-anything
Segment Anything for Stable Diffusion WebUI
desireevl/awesome-quantum-computing
A curated list of awesome quantum computing learning and developing resources.
nkaz001/hftbacktest
A high-frequency trading and market-making backtesting and trading bot in Python and Rust, which accounts for limit orders, queue positions, and latencies, utilizing full tick data for trades and order books, with real-world crypto market-making examples for Binance Futures
pytorch/ao
PyTorch native quantization and sparsity for training and inference
sustcsonglin/flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
ray-project/kuberay
A toolkit to run Ray applications on Kubernetes
AIoT-MLSys-Lab/Efficient-LLMs-Survey
[TMLR 2024] Efficient Large Language Models: A Survey
NousResearch/DisTrO
Distributed Training Over-The-Internet
kvcache-ai/ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
efeslab/Nanoflow
A throughput-oriented high-performance serving framework for LLMs
FloridSleeves/LLMDebugger
LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step
antgroup/glake
GLake: optimizing GPU memory management and IO transmission.
microsoft/sarathi-serve
A low-latency & high-throughput serving engine for LLMs
microsoft/vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
AlibabaPAI/llumnix
Efficient and easy multi-instance LLM serving
Hobr/transition-ticket
Transition Ticket
kiri-art/docker-diffusers-api
Diffusers / Stable Diffusion in docker with a REST API, supporting various models, pipelines & schedulers.
Just-Prog/Bilibili_show_ticket_auto_order
gpu-mode/triton-index
Cataloging released Triton kernels.
microsoft/ParrotServe
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
thu-nics/DiTFastAttn
AlibabaPAI/FLASHNN
fanlai0990/CS598
Systems for GenAI
cchan/tccl
Extensible collectives library in Triton
Gumpest/SparseVLMs
Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference" proposed by Peking University and UC Berkeley.
siyan-zhao/prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models"
prathebaselva/FORA
FORA introduces a simple yet effective caching mechanism in the Diffusion Transformer architecture for faster inference sampling.
mlsys-io/kv.run
A model serving framework for various research and production scenarios. Seamlessly built upon the PyTorch and HuggingFace ecosystem.