audiovention's Stars
Aider-AI/aider
aider is AI pair programming in your terminal
unslothai/unsloth
Finetune Llama 3.3, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 70% less memory
tensorflow/tfjs
A WebGL accelerated JavaScript library for training and deploying ML models.
taskflow/taskflow
A General-purpose Task-parallel Programming System using Modern C++
tiny-dnn/tiny-dnn
Header-only, dependency-free deep learning framework in C++14
NVlabs/tiny-cuda-nn
Lightning fast C++/CUDA neural network framework
flame/blis
BLAS-like Library Instantiation Software Framework
mil-tokyo/webdnn
The Fastest DNN Running Framework on Web Browser
pikvm/ustreamer
µStreamer - Lightweight and fast MJPEG-HTTP streamer
mackron/dr_libs
Audio decoding libraries for C/C++, each in a single source file.
libxsmm/libxsmm
Library for specialized dense and sparse matrix operations, and deep learning primitives.
romeric/Fastor
A lightweight high performance tensor algebra framework for modern C++
webgpu/webgpufundamentals
andravin/wincnn
Winograd minimal convolution algorithm generator for convolutional neural networks.
siboehm/SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
libocca/occa
Portable and vendor neutral framework for parallel programming on heterogeneous platforms.
G4brym/R2-Explorer
A Google Drive Interface for your Cloudflare R2 Buckets!
yzhaiustc/Optimizing-SGEMM-on-NVIDIA-Turing-GPUs
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
intel/pti-gpu
Profiling Tools Interfaces for GPU (PTI for GPU): a set of getting-started documentation and a tools library for starting performance analysis on Intel(R) Processor Graphics
ROCm/clr
gplhegde/convolution-flavors
Implementation of convolution layer in different flavors
mattdean1/cuda
An implementation of parallel exclusive scan in CUDA
OrangeOwlSolutions/General-CUDA-programming
Expander/polylogarithm
Implementation of polylogarithms in C/C++/Fortran
unevens/avec
A small library for using SIMD instructions on x86 and ARM. It wraps Agner Fog's vectorclass for x86, fills in some of that functionality for ARM, and provides containers for aligned memory with views and interleaving/deinterleaving.
stoneberry-webgpu/stoneberry
core WebGPU shaders
blu/gemm
Musings in GEMM (General Matrix Multiplication)
yui0/ugemm
GEMM
yuzhouhe2000/Dilated-Winograd-Convolution
Parallelized Winograd 2D dilated convolution
gcp/sgemm
A collection of AVX/FMA SGEMM routines for small matrices, plus benchmark