Maximilianxu
I am interested in performance optimization for deep learning inference and training, code generation techniques, and compiler optimizations.
NJU, Nanjing
Maximilianxu's Stars
IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
ColfaxResearch/cutlass-kernels
AnthonyCalandra/modern-cpp-features
A cheatsheet of modern C++ language and library features.
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
yalue/cuda_scheduling_examiner_mirror
A tool for examining GPU scheduling behavior.
alibaba/Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large-scale training, developed by Alibaba Cloud.
langgenius/dify
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications
evanmiller/LLM-Reading-List
LLM papers I'm reading, mostly on inference and model compression
ztxz16/fastllm
A pure C++ cross-platform LLM acceleration library with Python bindings. ChatGLM-6B-class models can reach 10,000+ tokens/s on a single GPU. Supports GLM, LLaMA, and MOSS base models, and runs smoothly on mobile devices.
lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
DiningFactory/panda-vpn-pro
🚁🚀 PandaVPN Pro has been confirmed to have shut down. Recommendations for low-cost, stable, high-speed paid proxy ("airport") services for circumventing censorship; not permanently free VPNs or free services. For accessing Google, YouTube, etc. Compatible with Clash, V2Ray, Shadowrocket, and other proxy clients. 🚀🚁
bytedance/ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
Hannibal046/Awesome-LLM
Awesome-LLM: a curated list of Large Language Model resources
bigscience-workshop/petals
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
openxla/xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
mlc-ai/web-stable-diffusion
Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.
shizhediao/ChatGPTPapers
Must-read papers, related blogs and API tools on the pre-training and tuning methods for ChatGPT.
FMInference/FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
microsoft/onnxruntime
ONNX Runtime: cross-platform, high-performance ML inference and training accelerator
nadia-polikarpova/cse291-program-synthesis
Program Synthesis Course
davidhalter/jedi
Awesome autocompletion, static analysis, and refactoring library for Python
BBuf/how-to-optim-algorithm-in-cuda
How to optimize common algorithms in CUDA.
exaloop/codon
A high-performance, zero-overhead, extensible Python compiler using LLVM
stochasticai/x-stable-diffusion
Real-time inference for Stable Diffusion - 0.88s latency. Covers AITemplate, nvFuser, TensorRT, FlashAttention. Join our Discord community: https://discord.com/invite/TgHXuSJEk6
microsoft/pai
Resource scheduling and cluster management for AI
pentium3/sys_reading
system paper reading notes
Guangxuan-Xiao/torch-int
This repository contains integer operators on GPUs for PyTorch.