yiliu30's Stars
f/awesome-chatgpt-prompts
A curated collection of ChatGPT prompts to help you use ChatGPT more effectively.
benfred/py-spy
Sampling profiler for Python programs
Nuitka/Nuitka
Nuitka is a Python compiler written in Python. It's fully compatible with Python 2.6, 2.7, 3.4-3.13. You feed it your Python app, it does a lot of clever things, and spits out an executable or extension module.
stas00/ml-engineering
Machine Learning Engineering Open Book
numba/numba
NumPy aware dynamic Python compiler using LLVM
joerick/pyinstrument
🚴 Call stack profiler for Python. Shows you why your code is slow!
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
jrfonseca/gprof2dot
Converts profiling output to a dot graph.
flame/blis
BLAS-like Library Instantiation Software Framework
djhworld/simple-computer
The Scott CPU from "But How Do It Know?" by J. Clark Scott
ucb-bar/gemmini
Berkeley's Spatial Array Generator
EleutherAI/cookbook
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
vllm-project/llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
mirage-project/mirage
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
stdrc/modern-cmake-by-example
IPADS lab newcomer training, lecture 2: CMake (2021-11-03)
intel/intel-graphics-compiler
microsoft/T-MAC
Low-bit LLM inference on CPU with lookup table
NVIDIA/TensorRT-Model-Optimizer
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
codeplaysoftware/syclacademy
SYCL Academy, a set of learning materials for SYCL heterogeneous programming
microsoft/BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
bytedance/byteir
A model compilation solution for various hardware
Kobzol/hardware-effects-gpu
Demonstration of various hardware effects on CUDA GPUs.
spcl/QuaRot
Code for the NeurIPS 2024 paper QuaRot: end-to-end 4-bit inference for large language models.
efeslab/Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
bytedance/flux
A fast communication-overlapping library for tensor parallelism on GPUs.
mobiusml/gemlite
Simple and fast low-bit matmul kernels in CUDA / Triton
FasterDecoding/TEAL
HandH1998/QQQ
QQQ is a hardware-optimized W4A8 quantization solution for LLMs.
HabanaAI/Gaudi-tutorials
Tutorials for running models on first-gen Gaudi and Gaudi2 for training and inference. Source files for the tutorials at https://developer.habana.ai/
neuralmagic/compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk