JPGoodale's Stars
microsoft/onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
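The typical way to consume an exported model with ONNX Runtime is its Python `InferenceSession` API; here is a minimal sketch, where "model.onnx" and the input shape are placeholders for whatever model you export.

```python
# Minimal ONNX Runtime inference sketch; the model path and input shape are placeholders.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Query the model's declared input so the feed dict uses the right name.
input_meta = session.get_inputs()[0]
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

outputs = session.run(None, {input_meta.name: x})  # None = return all outputs
print(outputs[0].shape)
```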
ggerganov/ggml
Tensor library for machine learning
FMInference/FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
mistralai/mistral-src
Reference implementation of the Mistral AI 7B v0.1 model.
NVIDIA/TensorRT-LLM
TensorRT-LLM provides an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines containing state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also includes components to create Python and C++ runtimes that execute those engines.
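A hedged sketch of what the Python side can look like, assuming the high-level `LLM`/`SamplingParams` entry points found in newer releases (older releases instead go through per-model builders and `trtllm-build`); the checkpoint name is only an example.

```python
# Assumed high-level TensorRT-LLM API; exact entry points vary by release.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # builds/loads a TensorRT engine
params = SamplingParams(max_tokens=64, temperature=0.8)

for output in llm.generate(["What does TensorRT-LLM do?"], params):
    print(output.outputs[0].text)
```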
cloneofsimo/lora
Using Low-rank adaptation to quickly fine-tune diffusion models.
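Not this repository's code, but a minimal PyTorch sketch of the low-rank adaptation idea it applies to diffusion models: the pretrained weight stays frozen and a trainable update factored as B·A (scaled by alpha/r) is added on top.

```python
# Generic LoRA linear layer sketch (not cloneofsimo/lora's actual API).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                      # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
print(layer(torch.randn(2, 768)).shape)  # torch.Size([2, 768])
```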
rustformers/llm
[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models
Oneflow-Inc/oneflow
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
leejet/stable-diffusion.cpp
Stable Diffusion and Flux in pure C/C++
pytorch/glow
Compiler for Neural Network hardware accelerators
Maks-s/sd-akashic
A compendium of information regarding Stable Diffusion (SD)
ELS-RD/kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
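If memory serves, the "single line" is `optimize_model`; treat the exact import path and requirements (CUDA, fp16 autocast, inference mode) as assumptions from the README rather than a guaranteed API.

```python
# Hedged sketch of Kernl usage; the import path may differ between releases.
import torch
from transformers import AutoModel
from kernl.model_optimization import optimize_model  # assumed entry point

model = AutoModel.from_pretrained("bert-base-uncased").eval().cuda()
optimize_model(model)  # swaps attention/linear ops for fused Triton kernels

inputs = {"input_ids": torch.ones(1, 16, dtype=torch.long).cuda()}
with torch.inference_mode(), torch.cuda.amp.autocast():
    out = model(**inputs)
```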
nod-ai/SHARK
SHARK - High Performance Machine Learning Distribution
chengzeyi/stable-fast
An inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
getkeops/keops
KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
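A minimal pykeops sketch of the symbolic LazyTensor pattern: the N×M Gaussian kernel matrix below is never materialized, only its reduction is computed.

```python
# KeOps LazyTensor sketch: the kernel matrix stays symbolic, so the matvec
# reduction runs without allocating an N x M array.
import torch
from pykeops.torch import LazyTensor

x = torch.randn(10000, 3)   # N source points
y = torch.randn(20000, 3)   # M target points
b = torch.randn(20000, 1)   # signal attached to the y points

x_i = LazyTensor(x[:, None, :])        # (N, 1, 3), symbolic
y_j = LazyTensor(y[None, :, :])        # (1, M, 3), symbolic
D_ij = ((x_i - y_j) ** 2).sum(-1)      # symbolic squared distances
K_ij = (-D_ij).exp()                   # symbolic Gaussian kernel

out = K_ij @ b                         # reduction -> dense (N, 1) result
print(out.shape)
```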
PacktPublishing/Learn-CUDA-Programming
Learn CUDA Programming, published by Packt
HazyResearch/safari
Convolutions for Sequence Modeling
j2kun/mlir-tutorial
MLIR For Beginners tutorial
alibaba/BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
axodox/axodox-machinelearning
This repository contains a pure C++ ONNX implementation of multiple offline AI models, such as StableDiffusion (1.5 and XL), ControlNet, Midas, HED and OpenPose.
buddy-compiler/buddy-mlir
An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).
NVIDIA/NVTX
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.
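The SDK itself is C, but the same annotation concepts are exposed through NVIDIA's `nvtx` Python package; a small sketch (kept in Python to match the other examples) of the decorator, context-manager, and explicit push/pop forms, whose ranges then appear in Nsight Systems timelines.

```python
# NVTX range annotations via NVIDIA's nvtx Python package (binding of the C SDK).
import time
import nvtx

@nvtx.annotate("preprocess", color="blue")      # decorator form
def preprocess():
    time.sleep(0.01)

def step():
    with nvtx.annotate("forward", color="green"):  # context-manager form
        time.sleep(0.02)
    nvtx.push_range("backward")                    # explicit push/pop form
    time.sleep(0.02)
    nvtx.pop_range()

preprocess()
step()
```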
HazyResearch/flash-fft-conv
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
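Not the repository's kernels, but a plain PyTorch reference for the operation it accelerates: an O(L log L) long convolution computed as pointwise multiplication in frequency space.

```python
# Reference FFT long convolution -- the operation FlashFFTConv fuses into
# tensor-core kernels (this is not its API).
import torch

def fft_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Convolve a (batch, L) signal u with a length-L filter k."""
    L = u.shape[-1]
    n = 2 * L                                 # zero-pad to avoid circular wrap-around
    u_f = torch.fft.rfft(u, n=n)
    k_f = torch.fft.rfft(k, n=n)
    y = torch.fft.irfft(u_f * k_f, n=n)
    return y[..., :L]

u = torch.randn(4, 1024)
k = torch.randn(1024)
print(fft_conv(u, k).shape)  # torch.Size([4, 1024])
```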
ToluClassics/candle-tutorial
Tutorial for Porting PyTorch Transformer Models to Candle (Rust)
PABannier/encodec.cpp
Port of Meta's Encodec in C/C++
EricLBuehler/candle-lora
Low-rank adaptation (LoRA) for Candle.
nod-ai/SHARK-Turbine
Unified compiler/runtime for interfacing with PyTorch Dynamo.
simbleau/nvtx
A safe Rust FFI binding for the NVIDIA® Tools Extension SDK (NVTX).
wzh99/relay-mlir
An MLIR-based toy DL compiler for TVM Relay.