Wwiit's Stars
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Stability-AI/generative-models
Generative Models by Stability AI
Mozilla-Ocho/llamafile
Distribute and run LLMs with a single file.
abseil/abseil-cpp
Abseil Common Libraries (C++)
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
naklecha/llama3-from-scratch
llama3 implementation one matrix multiplication at a time
intel-analytics/ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.
google/gemma.cpp
Lightweight, standalone C++ inference engine for Google's Gemma models.
NVIDIA/FasterTransformer
Transformer-related optimization, including BERT, GPT
bbycroft/llm-viz
3D Visualization of a GPT-style LLM
luban-agi/Awesome-AIGC-Tutorials
Curated tutorials and resources for Large Language Models, AI Painting, and more.
eliben/pycparser
Complete C99 parser in pure Python
RainerKuemmerle/g2o
g2o: A General Framework for Graph Optimization
CVCUDA/CV-CUDA
CV-CUDA™ is an open-source, GPU-accelerated library for cloud-scale image processing and computer vision.
megvii-research/NAFNet
The state-of-the-art image restoration model without nonlinear activation functions.
pytorch/FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
VirtualGL/virtualgl
Main VirtualGL repository
PacktPublishing/Hands-On-GPU-Accelerated-Computer-Vision-with-OpenCV-and-CUDA
Hands-On GPU Accelerated Computer Vision with OpenCV and CUDA, published by Packt
KnowingNothing/compiler-and-arch
A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures
openai/openai-gemm
Open single and half precision gemm implementations
jeffhammond/STREAM
STREAM benchmark
mlcommons/algorithmic-efficiency
MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models.
OpenImageDebugger/OpenImageDebugger
An advanced in-memory image visualization plugin for GDB and LLDB on Linux, with experimental support for macOS and Windows. Previously known as gdb-imagewatch.
BBuf/how-to-optimize-gemm
ROCm/hipBLASLt
hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditional BLAS library
Jokeren/GPA
GPU Performance Advisor
guanrenyang/Programming-Massively-Parallel-Processors
Solutions to Programming Massively Parallel Processors
carlushuang/gcnasm
amdgpu example code in hip/asm
berenger-eu/farm-sve
The Farm-SVE package provides a header that implements the ARM C language extensions (ACLE) for the ARM Scalable Vector Extension (SVE) in standard C++.
jaredhoberock/shmalloc
Dynamic __shared__ memory allocation for CUDA