yiliu30's Stars
f/awesome-chatgpt-prompts
A curated collection of ChatGPT prompts to help you use ChatGPT more effectively.
benfred/py-spy
Sampling profiler for Python programs
Nuitka/Nuitka
Nuitka is a Python compiler written in Python. It's fully compatible with Python 2.6, 2.7, 3.4-3.13. You feed it your Python app, it does a lot of clever things, and spits out an executable or extension module.
stas00/ml-engineering
Machine Learning Engineering Open Book
numba/numba
NumPy aware dynamic Python compiler using LLVM
joerick/pyinstrument
🚴 Call stack profiler for Python. Shows you why your code is slow!
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
jrfonseca/gprof2dot
Converts profiling output to a dot graph.
flame/blis
BLAS-like Library Instantiation Software Framework
djhworld/simple-computer
The Scott CPU from "But How Do It Know?" by J. Clark Scott
ucb-bar/gemmini
Berkeley's Spatial Array Generator
EleutherAI/cookbook
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
vllm-project/llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
mirage-project/mirage
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
stdrc/modern-cmake-by-example
IPADS lab newcomer training, lecture 2: CMake (2021-11-03)
intel/intel-graphics-compiler
microsoft/T-MAC
Low-bit LLM inference on CPU with lookup table
NVIDIA/TensorRT-Model-Optimizer
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
codeplaysoftware/syclacademy
SYCL Academy, a set of learning materials for SYCL heterogeneous programming
microsoft/BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
bytedance/byteir
A model compilation solution for various hardware
Kobzol/hardware-effects-gpu
Demonstration of various hardware effects on CUDA GPUs.
spcl/QuaRot
Code for the NeurIPS 2024 paper QuaRot: end-to-end 4-bit inference for large language models.
efeslab/Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
bytedance/flux
A fast communication-overlapping library for tensor parallelism on GPUs.
mobiusml/gemlite
Simple and fast low-bit matmul kernels in CUDA / Triton
FasterDecoding/TEAL
HandH1998/QQQ
QQQ is a hardware-optimized W4A8 quantization solution for LLMs.
HabanaAI/Gaudi-tutorials
Tutorials for running models on first-gen Gaudi and Gaudi2 for training and inference. Source files for the tutorials at https://developer.habana.ai/
neuralmagic/compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk