JPGoodale's Stars
microsoft/onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
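The typical way to consume an exported model with ONNX Runtime is its Python `InferenceSession` API; here is a minimal sketch, where "model.onnx" and the input shape are placeholders for whatever model you export.

```python
# Minimal ONNX Runtime inference sketch; the model path and input shape are placeholders.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Query the model's declared input so the feed dict uses the right name.
input_meta = session.get_inputs()[0]
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

outputs = session.run(None, {input_meta.name: x})  # None = return all outputs
print(outputs[0].shape)
```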
ggerganov/ggml
Tensor library for machine learning
FMInference/FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
mistralai/mistral-src
Reference implementation of the Mistral AI 7B v0.1 model.
NVIDIA/TensorRT-LLM
TensorRT-LLM provides an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines containing state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also includes components to create Python and C++ runtimes that execute those engines.
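A hedged sketch of what the Python side can look like, assuming the high-level `LLM`/`SamplingParams` entry points found in newer releases (older releases instead go through per-model builders and `trtllm-build`); the checkpoint name is only an example.

```python
# Assumed high-level TensorRT-LLM API; exact entry points vary by release.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # builds/loads a TensorRT engine
params = SamplingParams(max_tokens=64, temperature=0.8)

for output in llm.generate(["What does TensorRT-LLM do?"], params):
    print(output.outputs[0].text)
```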
cloneofsimo/lora
Using Low-rank adaptation to quickly fine-tune diffusion models.
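Not this repository's code, but a minimal PyTorch sketch of the low-rank adaptation idea it applies to diffusion models: the pretrained weight stays frozen and a trainable update factored as B·A (scaled by alpha/r) is added on top.

```python
# Generic LoRA linear layer sketch (not cloneofsimo/lora's actual API).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                      # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
print(layer(torch.randn(2, 768)).shape)  # torch.Size([2, 768])
```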
rustformers/llm
[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models
Oneflow-Inc/oneflow
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
leejet/stable-diffusion.cpp
Stable Diffusion and Flux in pure C/C++
pytorch/glow
Compiler for Neural Network hardware accelerators
Maks-s/sd-akashic
A compendium of information regarding Stable Diffusion (SD)
ELS-RD/kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
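If memory serves, the "single line" is `optimize_model`; treat the exact import path and requirements (CUDA, fp16 autocast, inference mode) as assumptions from the README rather than a guaranteed API.

```python
# Hedged sketch of Kernl usage; the import path may differ between releases.
import torch
from transformers import AutoModel
from kernl.model_optimization import optimize_model  # assumed entry point

model = AutoModel.from_pretrained("bert-base-uncased").eval().cuda()
optimize_model(model)  # swaps attention/linear ops for fused Triton kernels

inputs = {"input_ids": torch.ones(1, 16, dtype=torch.long).cuda()}
with torch.inference_mode(), torch.cuda.amp.autocast():
    out = model(**inputs)
```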
nod-ai/SHARK
SHARK - High Performance Machine Learning Distribution
chengzeyi/stable-fast
An inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
getkeops/keops
KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
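A minimal pykeops sketch of the symbolic LazyTensor pattern: the N×M Gaussian kernel matrix below is never materialized, only its reduction is computed.

```python
# KeOps LazyTensor sketch: the kernel matrix stays symbolic, so the matvec
# reduction runs without allocating an N x M array.
import torch
from pykeops.torch import LazyTensor

x = torch.randn(10000, 3)   # N source points
y = torch.randn(20000, 3)   # M target points
b = torch.randn(20000, 1)   # signal attached to the y points

x_i = LazyTensor(x[:, None, :])        # (N, 1, 3), symbolic
y_j = LazyTensor(y[None, :, :])        # (1, M, 3), symbolic
D_ij = ((x_i - y_j) ** 2).sum(-1)      # symbolic squared distances
K_ij = (-D_ij).exp()                   # symbolic Gaussian kernel

out = K_ij @ b                         # reduction -> dense (N, 1) result
print(out.shape)
```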
PacktPublishing/Learn-CUDA-Programming
Learn CUDA Programming, published by Packt
HazyResearch/safari
Convolutions for Sequence Modeling
j2kun/mlir-tutorial
MLIR For Beginners tutorial
alibaba/BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
axodox/axodox-machinelearning
This repository contains a pure C++ ONNX implementation of multiple offline AI models, such as StableDiffusion (1.5 and XL), ControlNet, Midas, HED and OpenPose.
buddy-compiler/buddy-mlir
An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).
NVIDIA/NVTX
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.
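The SDK itself is C, but the same annotation concepts are exposed through NVIDIA's `nvtx` Python package; a small sketch (kept in Python to match the other examples) of the decorator, context-manager, and explicit push/pop forms, whose ranges then appear in Nsight Systems timelines.

```python
# NVTX range annotations via NVIDIA's nvtx Python package (binding of the C SDK).
import time
import nvtx

@nvtx.annotate("preprocess", color="blue")      # decorator form
def preprocess():
    time.sleep(0.01)

def step():
    with nvtx.annotate("forward", color="green"):  # context-manager form
        time.sleep(0.02)
    nvtx.push_range("backward")                    # explicit push/pop form
    time.sleep(0.02)
    nvtx.pop_range()

preprocess()
step()
```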
HazyResearch/flash-fft-conv
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
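Not the repository's kernels, but a plain PyTorch reference for the operation it accelerates: an O(L log L) long convolution computed as pointwise multiplication in frequency space.

```python
# Reference FFT long convolution -- the operation FlashFFTConv fuses into
# tensor-core kernels (this is not its API).
import torch

def fft_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Convolve a (batch, L) signal u with a length-L filter k."""
    L = u.shape[-1]
    n = 2 * L                                 # zero-pad to avoid circular wrap-around
    u_f = torch.fft.rfft(u, n=n)
    k_f = torch.fft.rfft(k, n=n)
    y = torch.fft.irfft(u_f * k_f, n=n)
    return y[..., :L]

u = torch.randn(4, 1024)
k = torch.randn(1024)
print(fft_conv(u, k).shape)  # torch.Size([4, 1024])
```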
ToluClassics/candle-tutorial
Tutorial for Porting PyTorch Transformer Models to Candle (Rust)
PABannier/encodec.cpp
Port of Meta's Encodec in C/C++
EricLBuehler/candle-lora
Low-rank adaptation (LoRA) for Candle.
nod-ai/SHARK-Turbine
Unified compiler/runtime for interfacing with PyTorch Dynamo.
simbleau/nvtx
A safe Rust FFI binding for the NVIDIA® Tools Extension SDK (NVTX).
wzh99/relay-mlir
An MLIR-based toy DL compiler for TVM Relay.