Maximilianxu
I am interested in performance optimization for deep learning inference and training, code generation techniques, and compiler optimizations.
NJU, Nanjing
Maximilianxu's Stars
IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
ColfaxResearch/cutlass-kernels
AnthonyCalandra/modern-cpp-features
A cheatsheet of modern C++ language and library features.
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
yalue/cuda_scheduling_examiner_mirror
A tool for examining GPU scheduling behavior.
alibaba/Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large-scale training, developed by Alibaba Cloud.
langgenius/dify
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications
evanmiller/LLM-Reading-List
LLM papers I'm reading, mostly on inference and model compression
ztxz16/fastllm
A pure C++ cross-platform LLM acceleration library with Python bindings. ChatGLM-6B-class models can reach 10,000+ tokens/s on a single GPU. Supports GLM, LLaMA, and MOSS base models, and runs smoothly on mobile devices.
lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
DiningFactory/panda-vpn-pro
🚁🚀 PandaVPN Pro has been confirmed to have shut down. Recommendations for low-cost, stable, high-speed paid proxy ("airport") services for circumventing censorship; not permanently free VPNs or free services. For accessing Google, YouTube, etc. Compatible with Clash, V2Ray, Shadowrocket, and other proxy clients. 🚀🚁
bytedance/ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
Hannibal046/Awesome-LLM
Awesome-LLM: a curated list of Large Language Model resources
bigscience-workshop/petals
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
openxla/xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
mlc-ai/web-stable-diffusion
Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.
shizhediao/ChatGPTPapers
Must-read papers, related blogs and API tools on the pre-training and tuning methods for ChatGPT.
FMInference/FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
microsoft/onnxruntime
ONNX Runtime: cross-platform, high-performance ML inference and training accelerator
nadia-polikarpova/cse291-program-synthesis
Program Synthesis Course
davidhalter/jedi
Awesome autocompletion, static analysis, and refactoring library for Python
BBuf/how-to-optim-algorithm-in-cuda
How to optimize common algorithms in CUDA.
exaloop/codon
A high-performance, zero-overhead, extensible Python compiler using LLVM
stochasticai/x-stable-diffusion
Real-time inference for Stable Diffusion - 0.88s latency. Covers AITemplate, nvFuser, TensorRT, FlashAttention. Join our Discord community: https://discord.com/invite/TgHXuSJEk6
microsoft/pai
Resource scheduling and cluster management for AI
pentium3/sys_reading
system paper reading notes
Guangxuan-Xiao/torch-int
This repository contains integer operators on GPUs for PyTorch.