geogreff's Stars
lllyasviel/ControlNet
Let us control diffusion models!
huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and Flax.
Stability-AI/generative-models
Generative Models by Stability AI
pyecharts/pyecharts
🎨 Python Echarts Plotting Library
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
huggingface/optimum
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
NVIDIA/Stable-Diffusion-WebUI-TensorRT
TensorRT Extension for Stable Diffusion Web UI
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
fengbintu/Neural-Networks-on-Silicon
This is originally a collection of papers on neural network accelerators. Now it's more like my selection of research on deep learning and computer architecture.
microsoft/Olive
Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.
ZhangGe6/onnx-modifier
A tool to modify ONNX models visually, based on Netron and Flask.
NVIDIA/open-gpu-doc
Documentation of NVIDIA chip/hardware interfaces
huggingface/optimum-nvidia
mit-han-lab/tinyengine
[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory
sophgo/tpu-mlir
Machine learning compiler based on MLIR for Sophgo TPU.
mit-han-lab/mcunet
[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning
espnet/espnet_model_zoo
ESPnet Model Zoo
stanford-mast/nn_dataflow
Explore the energy-efficient dataflow scheduling for neural networks.
mit-han-lab/inter-operator-scheduler
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
maggiez0138/Swin-Transformer-TensorRT
This project explores the deployment of Swin Transformer with TensorRT, including FP16 and INT8 test results.
pulp-platform/dory
A tool to deploy Deep Neural Networks on PULP-based SoCs
keras-team/tf-keras
The TensorFlow-specific implementation of the Keras API, which was the default Keras from 2019 to 2023.
Deep-Spark/DeepSparkHub
DeepSparkHub curates hundreds of application algorithms and models across AI and general-purpose computing to support mainstream intelligent-computing scenarios.
SET-Scheduling-Project/SET-ISCA2023
The framework for the paper "Inter-layer Scheduling Space Definition and Exploration for Tiled Accelerators" in ISCA 2023.
KULeuven-MICAS/DeFiNES
A framework for fast exploration of the depth-first scheduling space for DNN accelerators
SheaCai/optimus
This is the implementation of the paper "Optimus: Towards Optimal Layer-Fusion on Deep Learning Processors".
SET-ISCA2023/Tile-Alloc-Algorithm
The optimal tile-allocation algorithm from the SET paper, with a proof of its optimality.
niyazed/H-Mish
Hard Mish: a memory-efficient and faster equivalent of Mish
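As a quick illustration of what this repo provides, here is a minimal pure-Python sketch of the Hard Mish activation, assuming the common formulation f(x) = x/2 · min(2, max(0, x + 2)), which replaces the exp/tanh in Mish (x · tanh(softplus(x))) with cheap piecewise operations; the function name is my own, not taken from the repo.

```python
def hard_mish(x: float) -> float:
    """Hard Mish activation: x/2 * clip(x + 2, 0, 2).

    A piecewise approximation of Mish (x * tanh(softplus(x)))
    that avoids transcendental functions:
      x >= 0      -> x            (identity)
      -2 < x < 0  -> x * (x + 2) / 2  (quadratic dip)
      x <= -2     -> 0            (hard cutoff)
    """
    return 0.5 * x * min(2.0, max(0.0, x + 2.0))
```

For example, `hard_mish(1.0)` returns 1.0, `hard_mish(-1.0)` returns -0.5, and inputs at or below -2 are clamped to 0.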