Jameskry's Stars
Bruce-Lee-LY/cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
Susan19900316/yolov5_tensorrt_int8
A summary of INT8 quantization methods for YOLOv5 with TensorRT
mytk2012/YOLOV8_INT8_TRT
DerryHub/BEVFormer_tensorrt
BEVFormer inference on TensorRT, including INT8 Quantization and Custom TensorRT Plugins (float/half/half2/int8).
Hongqing-work/cudnn-learning-framework
A tiny learning framework built with cuDNN and cuBLAS.
Linaom1214/TensorRT-For-YOLO-Series
tensorrt for yolo series (YOLOv8, YOLOv7, YOLOv6, YOLOv5), nms plugin support
Darth-Kronos/trt-custom-plugins
TensorRT plugins for custom operators
mlc-ai/mlc-llm
Universal LLM Deployment Engine with ML Compilation
MegEngine/MegCC
MegCC is a deep learning model compiler with an ultra-lightweight runtime that is efficient and easy to port
QianyanTech/NBAssembler
Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs.
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
doorteeth/learn_cuda
DA-southampton/NLP_ability
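A summary of the knowledge that natural language processing (NLP) engineers need to build up, including interview questions, fundamentals, and engineering skills, to improve core competitiveness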
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
HeKun-NVIDIA/CUDA-Programming-Guide-in-Chinese
This is a Chinese translation of the CUDA programming guide
sesmfs/360-Surround-View-CUDA-Project
10000 fps 🚀 for 360 Surround-View CUDA Solution
BBuf/how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
jiangjiawen/mmdeploy0.6_tensorrt8_cpp_windows
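TensorRT plugins extracted from mmdeploy for direct use in a TensorRT programming environment. The main reason is that mmdeploy is so tightly integrated that it left me confused. I mostly use semantic segmentation day to day; instance segmentation is just something I play with.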
cshbli/yolov5_qat_tensorrt
YOLOv5 Quantization Aware Training with TensorRT
maggiez0138/yolov5_quant_sample
This is an 8-bit quantization sample for YOLOv5. PTQ, QAT, and partial quantization have been implemented, and results are presented based on yolov5s.
jahongir7174/YOLOv8-qat
Quantization Aware Training
HeKun-NVIDIA/TensorRT-Developer_Guide_in_Chinese
zhaocc1106/my_trt_plugin
Implementing your own TensorRT operators
HuangCongQing/tensorrt-plugin
Implementing custom TensorRT plugins
NVIDIA/trt-samples-for-hackathon-cn
Simple samples for TensorRT programming
NVIDIA/TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
shouxieai/tensorRT_Pro
C++ library based on tensorrt integration
Oneflow-Inc/oneflow
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.