wm901115nwpu's Stars
google/styleguide
Style guides for Google-originated open-source projects
lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
streamlit/streamlit
Streamlit — A faster way to build and share data apps.
run-llama/llama_index
LlamaIndex is a data framework for your LLM applications
FMInference/FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
rocky/python-uncompyle6
A cross-version Python bytecode decompiler
facebook/buck2
Build system, successor to Buck
zrax/pycdc
A Python bytecode disassembler and decompiler written in C++
sgl-project/sglang
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
huggingface/safetensors
Simple, safe way to store and distribute tensors
google/maxtext
A simple, performant and scalable Jax LLM!
godweiyang/NN-CUDA-Example
Several simple examples for popular neural network toolkits calling custom CUDA operators.
pytorch/torchtitan
A native PyTorch Library for large model training
databricks/megablocks
rocky/python-decompile3
Python decompiler for 3.7-3.8. Stripped down from uncompyle6 so we can refactor and start to fix up some long-standing problems
bat67/pytorch-tutorials-examples-and-books
PyTorch tutorials, examples, and some books I found; a curated collection of up-to-date PyTorch tutorials, examples, and books (updated irregularly)
NVIDIA/cccl
CUDA C++ Core Libraries
mustvlad/ChatGPT-System-Prompts
This repository contains a collection of the best system prompts for ChatGPT, a conversational AI model developed by OpenAI. Star this repository to help us reach 5,000 stars!
codecaution/Awesome-Mixture-of-Experts-Papers
A curated reading list of research on Mixture-of-Experts (MoE).
IST-DASLab/marlin
An FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups at batch sizes of up to 16-32 tokens.
BobaZooba/xllm
🦖 X—LLM: Cutting Edge & Easy LLM Finetuning
sail-sg/zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
microsoft/BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
microsoft/triton-shared
Shared Middle-Layer for Triton Compilation
hpcaitech/TensorNVMe
A Python library that transfers PyTorch tensors between CPU and NVMe
ROCm/triton
Development repository for the Triton language and compiler
Jokeren/GPA
GPU Performance Advisor
GVProf/GVProf
GVProf: A Value Profiler for GPU-based Clusters
ModelTC/TFMQ-DM
[CVPR 2024 Highlight] TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models
Hzfengsy/asplos-tvm