Darth-Kronos's Stars
lyuwenyu/RT-DETR
[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
microsoft/onnxruntime-genai
Generative AI extensions for onnxruntime
josephmisiti/awesome-machine-learning
A curated list of awesome Machine Learning frameworks, libraries and software.
quic/aimet-model-zoo
A collection of popular neural network models quantized with AIMET, with accuracy benchmarks.
pytorch/TensorRT
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
microsoft/onnxconverter-common
Common utilities for ONNX converters
openvinotoolkit/openvino
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
dmlc/dlpack
Common in-memory tensor structure
roboflow/inference
A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
NX-AI/vision-lstm
xLSTM as Generic Vision Backbone
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
microsoft/Olive
Olive: Simplify ML Model Finetuning, Conversion, Quantization, and Optimization for CPUs, GPUs and NPUs.
gpu-mode/lectures
Material for gpu-mode lectures
quic/aimet
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
quic/ai-hub-models
The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) and ready to deploy on Qualcomm® devices.
merrymercy/awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
NVIDIA/TensorRT-Model-Optimizer
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
openxla/xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
unslothai/unsloth
Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
meta-llama/llama-recipes
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A, and a number of candidate inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Meta Llama for WhatsApp & Messenger.
meta-llama/llama3
The official Meta Llama 3 GitHub site
run-llama/llama_index
LlamaIndex is a data framework for your LLM applications
karpathy/llm.c
LLM training in simple, raw C/CUDA
pytorch/torchdynamo
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
aws-neuron/aws-neuron-sdk
Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
zwang4/awesome-machine-learning-in-compilers
Must read research papers and links to tools and datasets that are related to using machine learning for compilers and systems optimisation
federico-busato/Modern-CPP-Programming
Modern C++ Programming Course (C++03/11/14/17/20/23/26)
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All