geogreff's Stars
lllyasviel/ControlNet
Let us control diffusion models!
huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and Flax.
Stability-AI/generative-models
Generative Models by Stability AI
pyecharts/pyecharts
🎨 Python Echarts Plotting Library
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
huggingface/optimum
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
NVIDIA/Stable-Diffusion-WebUI-TensorRT
TensorRT Extension for Stable Diffusion Web UI
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
fengbintu/Neural-Networks-on-Silicon
This is originally a collection of papers on neural network accelerators. Now it's more like my selection of research on deep learning and computer architecture.
microsoft/Olive
Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.
ZhangGe6/onnx-modifier
A tool to modify ONNX models visually, based on Netron and Flask.
NVIDIA/open-gpu-doc
Documentation of NVIDIA chip/hardware interfaces
huggingface/optimum-nvidia
mit-han-lab/tinyengine
[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory
sophgo/tpu-mlir
Machine learning compiler based on MLIR for Sophgo TPU.
mit-han-lab/mcunet
[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning
espnet/espnet_model_zoo
ESPnet Model Zoo
stanford-mast/nn_dataflow
Explore the energy-efficient dataflow scheduling for neural networks.
mit-han-lab/inter-operator-scheduler
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
maggiez0138/Swin-Transformer-TensorRT
This project explores the deployment of Swin Transformer with TensorRT, including FP16 and INT8 test results.
pulp-platform/dory
A tool to deploy Deep Neural Networks on PULP-based SoCs
keras-team/tf-keras
The TensorFlow-specific implementation of the Keras API, which was the default Keras from 2019 to 2023.
Deep-Spark/DeepSparkHub
DeepSparkHub curates hundreds of application algorithms and models across AI and general-purpose computing to support mainstream intelligent-computing scenarios.
SET-Scheduling-Project/SET-ISCA2023
The framework for the paper "Inter-layer Scheduling Space Definition and Exploration for Tiled Accelerators" in ISCA 2023.
KULeuven-MICAS/DeFiNES
A framework for fast exploration of the depth-first scheduling space for DNN accelerators
SheaCai/optimus
This is the implementation of the paper "Optimus: Towards Optimal Layer-Fusion on Deep Learning Processors".
SET-ISCA2023/Tile-Alloc-Algorithm
The optimal tile-allocation algorithm from the SET paper, with a proof of its optimality.
niyazed/H-Mish
Hard Mish: a memory-efficient and faster equivalent of Mish
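As a quick illustration of what this repo provides, here is a minimal pure-Python sketch of the Hard Mish activation, assuming the common formulation f(x) = x/2 · min(2, max(0, x + 2)), which replaces the exp/tanh in Mish (x · tanh(softplus(x))) with cheap piecewise operations; the function name is my own, not taken from the repo.

```python
def hard_mish(x: float) -> float:
    """Hard Mish activation: x/2 * clip(x + 2, 0, 2).

    A piecewise approximation of Mish (x * tanh(softplus(x)))
    that avoids transcendental functions:
      x >= 0      -> x            (identity)
      -2 < x < 0  -> x * (x + 2) / 2  (quadratic dip)
      x <= -2     -> 0            (hard cutoff)
    """
    return 0.5 * x * min(2.0, max(0.0, x + 2.0))
```

For example, `hard_mish(1.0)` returns 1.0, `hard_mish(-1.0)` returns -0.5, and inputs at or below -2 are clamped to 0.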