yiliu30's Stars
chatanywhere/GPT_API_free
Free ChatGPT API keys: a free ChatGPT API with GPT-4 support, plus a free forwarding API for ChatGPT that works inside China over a direct connection, no proxy required. Can be paired with software/plugins such as ChatBox to greatly reduce API costs; enables unrestricted chat from within China.
HigherOrderCO/Bend
A massively parallel, high-level programming language
naklecha/llama3-from-scratch
llama3 implementation one matrix multiplication at a time
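The repo's framing, every step as an explicit matrix multiplication, is easiest to see in scaled dot-product attention. A minimal NumPy sketch of that view (shapes and names are illustrative, not taken from the repo):

```python
import numpy as np

def attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention as plain matmuls.
    x: (seq, d_model); wq/wk/wv: (d_model, d_head)."""
    q, k, v = x @ wq, x @ wk, x @ wv          # three projection matmuls
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (seq, seq) similarity matmul
    scores = np.exp(scores - scores.max(-1, keepdims=True))
    weights = scores / scores.sum(-1, keepdims=True)  # row-wise softmax
    return weights @ v                        # weighted sum, one final matmul

x = np.random.randn(4, 8)
w = [np.random.randn(8, 8) for _ in range(3)]
print(attention(x, *w).shape)  # (4, 8)
```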
adityatelange/hugo-PaperMod
A fast, clean, responsive Hugo theme.
srush/GPU-Puzzles
Solve puzzles. Learn CUDA.
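The puzzles are posed as small numba-CUDA kernels. For flavor, a minimal kernel in the style the early puzzles ask for (my own example, not one of the puzzles; requires a CUDA-capable GPU):

```python
import numpy as np
from numba import cuda

@cuda.jit
def add_kernel(x, y, out):
    i = cuda.grid(1)          # global thread index
    if i < out.size:          # guard threads past the end of the array
        out[i] = x[i] + y[i]

x = np.arange(32, dtype=np.float32)
y = np.ones(32, dtype=np.float32)
out = np.zeros_like(x)
add_kernel[1, 64](x, y, out)  # launch 1 block of 64 threads
print(out[:4])                # [1. 2. 3. 4.]
```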
dottxt-ai/outlines
Structured Text Generation
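Structured generation of the kind outlines implements comes down to masking the logits at each decoding step so that only tokens permitted by the target structure can be sampled. A toy sketch of that mechanism (generic illustration, not the outlines API):

```python
import numpy as np

vocab = ["yes", "no", "maybe", "<eos>"]

def constrained_sample(logits, allowed):
    """Sample a token, giving zero probability to disallowed tokens."""
    mask = np.isin(np.arange(len(logits)), allowed)
    masked = np.where(mask, logits, -np.inf)
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    return np.random.choice(len(logits), p=probs)

logits = np.random.randn(len(vocab))
tok = constrained_sample(logits, allowed=[0, 1])  # grammar permits only yes/no here
print(vocab[tok])
```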
jzhang38/TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
NVlabs/tiny-cuda-nn
Lightning fast C++/CUDA neural network framework
turboderp/exllamav2
A fast inference library for running LLMs locally on modern consumer-class GPUs
iree-org/iree
A retargetable MLIR-based machine learning compiler and runtime toolkit.
turboderp/exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
flame/blis
BLAS-like Library Instantiation Software Framework
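BLIS instantiates a full BLAS from a small set of micro-kernels that operate on packed blocks of the operands. A NumPy sketch of the blocking idea only (the real framework packs panels and dispatches hand-tuned, architecture-specific micro-kernels):

```python
import numpy as np

def blocked_matmul(a, b, bm=4, bn=4, bk=4):
    """C = A @ B computed tile by tile, the way a BLIS-style
    macro-kernel loops over packed blocks."""
    m, k = a.shape
    _, n = b.shape
    c = np.zeros((m, n))
    for i in range(0, m, bm):
        for j in range(0, n, bn):
            for p in range(0, k, bk):
                # in BLIS this inner update is the hand-tuned micro-kernel
                c[i:i+bm, j:j+bn] += a[i:i+bm, p:p+bk] @ b[p:p+bk, j:j+bn]
    return c

a, b = np.random.randn(8, 8), np.random.randn(8, 8)
assert np.allclose(blocked_matmul(a, b), a @ b)
```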
neuralmagic/sparseml
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
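The recipes are built from primitives such as magnitude pruning: zero the smallest-magnitude fraction of each weight tensor. A generic sketch of that primitive (not the sparseml API):

```python
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Return w with its smallest-|w| entries zeroed to reach `sparsity`."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

w = np.random.randn(16, 16)
pruned = magnitude_prune(w, sparsity=0.9)
print((pruned == 0).mean())  # ~0.9
```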
numba/llvmlite
A lightweight LLVM python binding for writing JIT compilers
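llvmlite's IR builder plus MCJIT is enough for a complete, tiny JIT. A minimal end-to-end example following the pattern from llvmlite's documentation:

```python
import ctypes
from llvmlite import ir
import llvmlite.binding as llvm

llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()

# Build IR for: int add(int a, int b) { return a + b; }
i32 = ir.IntType(32)
module = ir.Module(name="jit_demo")
fn = ir.Function(module, ir.FunctionType(i32, (i32, i32)), name="add")
builder = ir.IRBuilder(fn.append_basic_block("entry"))
builder.ret(builder.add(fn.args[0], fn.args[1]))

# Compile with MCJIT and call the result through ctypes.
target_machine = llvm.Target.from_default_triple().create_target_machine()
engine = llvm.create_mcjit_compiler(llvm.parse_assembly(str(module)), target_machine)
engine.finalize_object()
add = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)(
    engine.get_function_address("add"))
print(add(2, 3))  # 5
```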
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
srush/Triton-Puzzles
Puzzles for learning Triton
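The puzzles build toward kernels like the canonical Triton vector add, worth keeping in mind as the baseline pattern (this is the standard tutorial kernel, not one of the puzzles):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n                       # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

n = 1 << 12
x = torch.randn(n, device="cuda")
y = torch.randn(n, device="cuda")
out = torch.empty_like(x)
add_kernel[(triton.cdiv(n, 1024),)](x, y, out, n, BLOCK=1024)
assert torch.allclose(out, x + y)
```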
pytorch/kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
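Kineto is the backend of torch.profiler, so the usual way to exercise it is through that frontend. A short usage sketch:

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(256, 256)
x = torch.randn(32, 256)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities) as prof:
    model(x)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
prof.export_chrome_trace("trace.json")  # timeline viewable in chrome://tracing / Perfetto
```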
microsoft/BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
FMInference/H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
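The paper's observation is that a small set of "heavy hitter" tokens receives most of the attention mass, so the KV cache can be shrunk by keeping only those plus a recent window. A NumPy sketch of that scoring-and-eviction policy (my paraphrase of the idea, not the paper's code):

```python
import numpy as np

def h2o_keep_indices(attn_weights, budget, recent=4):
    """attn_weights: (num_queries, seq_len) attention rows observed so far.
    Keep the `recent` newest tokens plus the heaviest hitters up to `budget`."""
    seq_len = attn_weights.shape[1]
    scores = attn_weights.sum(axis=0)             # accumulated attention per token
    keep = set(range(seq_len - recent, seq_len))  # always keep the local window
    for idx in np.argsort(-scores):               # then the heaviest hitters
        if len(keep) >= budget:
            break
        keep.add(int(idx))
    return sorted(keep)

attn = np.random.rand(8, 16)
print(h2o_keep_indices(attn, budget=8))  # indices of KV entries to retain
```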
KEKE046/mlir-tutorial
Hands-On Practical MLIR Tutorial
fpgaminer/GPTQ-triton
GPTQ inference Triton kernel
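GPTQ-style kernels keep weights packed eight 4-bit values per 32-bit word and dequantize on the fly inside the matmul. A NumPy sketch of the unpack-and-dequantize step such a kernel fuses (the layout here is illustrative; real kernels also handle groupwise scales and zero points):

```python
import numpy as np

def unpack_int4(packed):
    """packed: uint32 array, 8 nibbles per word -> values in [0, 15]."""
    shifts = np.arange(8, dtype=np.uint32) * 4
    return (packed[..., None] >> shifts) & 0xF   # (..., 8)

def dequantize(packed, scale, zero):
    q = unpack_int4(packed).reshape(packed.shape[0], -1)
    return (q.astype(np.float32) - zero) * scale  # w = (q - z) * s

packed = np.random.randint(0, 2**32, size=(4, 2), dtype=np.uint32)
w = dequantize(packed, scale=0.05, zero=8)
print(w.shape)  # (4, 16) float weights recovered from 4-bit storage
```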
facebookincubator/dynolog
Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system, such as the Linux kernel, CPUs, disks, Intel PT, and GPUs. Dynolog also integrates with PyTorch and can trigger traces for distributed training applications.
Deep-Learning-Profiling-Tools/triton-viz
intel/intel-xpu-backend-for-triton
OpenAI Triton backend for Intel® GPUs
gpu-mode/triton-index
Cataloging released Triton kernels.
nod-ai/SHARK-ModelDev
Unified compiler/runtime for interfacing with PyTorch Dynamo.
NVIDIA/online-softmax
Benchmark code for the "Online normalizer calculation for softmax" paper
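The paper's trick is to compute the softmax normalizer in a single pass by carrying a running maximum and rescaling the running sum whenever the maximum changes. A direct NumPy transcription of that recurrence:

```python
import numpy as np

def online_softmax(x):
    """Single-pass softmax normalizer (Milakov & Gimelshein, 2018)."""
    m = -np.inf   # running maximum
    d = 0.0       # running sum of exp(x_i - m)
    for v in x:
        m_new = max(m, v)
        d = d * np.exp(m - m_new) + np.exp(v - m_new)  # rescale the old sum
        m = m_new
    return np.exp(x - m) / d

x = np.random.randn(10)
assert np.allclose(online_softmax(x), np.exp(x) / np.exp(x).sum())
```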
iree-org/iree-turbine
IREE's PyTorch Frontend, based on Torch Dynamo.
pytorch-labs/triton-cpu
An experimental CPU backend for Triton (https://github.com/openai/triton)