sunlex0717

GPU Performance Modelling @ Apple

AppleUK

sunlex0717's Stars

Significant-Gravitas/AutoGPT
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
Language: Python 168k 1.6k 2.8k 44.3k
meta-llama/llama
Inference code for Llama models
Language: Python 56.2k 526 9769.6k
isocpp/CppCoreGuidelines
The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
Language: CSS 42.7k 2k 1.2k 5.4k
lllyasviel/ControlNet
Let us control diffusion models!
Language: Python 30.2k 219 5512.7k
tatsu-lab/stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
Language: Python 29.5k 341 2684k
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python 29.2k 242 5k 4.4k
ml-explore/mlx
MLX: An array framework for Apple silicon
Language: C++ 16.9k 146 547977
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
Language: Python 13.9k 115 1.1k 1.3k
triton-lang/triton
Development repository for the Triton language and compiler
Language: C++ 13.2k 193 1.5k 1.6k
FMInference/FlexiGen
Running large language models on a single GPU for throughput-oriented scenarios.
Language: Python 9.2k 112 82548
triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Language: Python 8.3k 143 3.8k 1.5k
electronicarts/EASTL
EASTL stands for Electronic Arts Standard Template Library. It is an extensive and robust implementation that has an emphasis on high performance.
Language: C++ 8.2k 286 273937
tpn/pdfs
Technically-oriented PDF Collection (Papers, Specs, Decks, Manuals, etc)
Language: HTML 7.6k 428 101.5k
apple/corenet
CoreNet: A library for training deep neural networks
Language: Jupyter Notebook 7k 65 21539
KhronosGroup/MoltenVK
MoltenVK is a Vulkan Portability implementation. It layers a subset of the high-performance, industry-standard Vulkan graphics and compute API over Apple's Metal graphics framework, enabling Vulkan applications to run on macOS, iOS and tvOS.
Language: Objective-C++ 4.8k 135 1k 423
coala/coala
coala provides a unified command-line interface for linting and fixing all your code, regardless of the programming languages you use.
Language: Python 3.6k 100 3k 1.3k
neuralmagic/deepsparse
Sparsity-aware deep learning inference runtime for CPUs
Language: Python 3k 56 137175
gpu-mode/lectures
Material for gpu-mode lectures
Language: Jupyter Notebook 2.9k 41 8286
hollance/neural-engine
Everything we actually know about the Apple Neural Engine (ANE)
2.1k 77 1377
ELS-RD/transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
Language: Python 1.7k 27 121150
BBuf/how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
Language: Cuda 1.5k 24 9128
Voine/ChatWaifu_Mobile
Mobile anime-style AI waifu chat app
Language: C++ 1.2k 21 22134
mit-han-lab/smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Language: Python 1.2k 21 87144
zeux/calm
CUDA/Metal accelerated language model inference
Language: C 374 9 014
MomentsInGraphics/vulkan_renderer
A toy renderer written in C using Vulkan to perform real-time ray tracing research.
Language: C 349 8 224
NVIDIA/Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
Language: C++ 263 18 68552
SJTU-ACA-Lab/blue-porcelain
141 1 420
Thinklab-SJTU/awesome-ai4eda
Awesome Artificial Intelligence for Electronic Design Automation Papers.
141 8 013
g-truc/sdk
46 4 011
LouiValley/RayTracing-Tech
This is a paper list about the most important techs and some hard core knowledge about ray tracing.
19 2 01
