Jameskry

dtwave technology, Hangzhou

Jameskry's Stars

Bruce-Lee-LY/cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.
Language:Cuda21550
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
Language: C++, 4.7k stars, 819 forks
Susan19900316/yolov5_tensorrt_int8
A summary of INT8 quantization methods for YOLOv5 with TensorRT
Language:Python4711
mytk2012/YOLOV8_INT8_TRT
Language:Python82
DerryHub/BEVFormer_tensorrt
BEVFormer inference on TensorRT, including INT8 Quantization and Custom TensorRT Plugins (float/half/half2/int8).
Language:Python36360
Hongqing-work/cudnn-learning-framework
A tiny learning framework built with cuDNN and cuBLAS.
Language:Cuda213
Linaom1214/TensorRT-For-YOLO-Series
tensorrt for yolo series (YOLOv8, YOLOv7, YOLOv6, YOLOv5), nms plugin support
Language:C++807141
Darth-Kronos/trt-custom-plugins
TensorRT plugins for custom operators
Language:C++2
mlc-ai/mlc-llm
Universal LLM Deployment Engine with ML Compilation
Language: Python, 17.4k stars, 1.4k forks
MegEngine/MegCC
MegCC is a deep learning model compiler with an ultra-lightweight runtime, high efficiency, and easy portability
Language:C++46657
QianyanTech/NBAssembler
Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs.
Language:Python597
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Language: Python, 127k stars, 25.2k forks
doorteeth/learn_cuda
Language:Cuda366
DA-southampton/NLP_ability
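A summary of the knowledge a natural language processing (NLP) engineer needs to build up, including interview questions, fundamentals, and engineering skills, to improve core competitiveness
Language: Python, 6.3k stars, 1.1k forks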
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Language: C++, 7.1k stars, 756 forks
NVIDIA/FasterTransformer
Transformer-related optimization, including BERT, GPT
Language: C++, 5.6k stars, 867 forks
HeKun-NVIDIA/CUDA-Programming-Guide-in-Chinese
This is a Chinese translation of the CUDA programming guide
959153
sesmfs/360-Surround-View-CUDA-Project
10000 fps 🚀 for 360 Surround-View CUDA Solution
Language:Jupyter Notebook9120
BBuf/how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
Language: Cuda, 1.1k stars, 90 forks
jiangjiawen/mmdeploy0.6_tensorrt8_cpp_windows
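TensorRT plugins extracted from mmdeploy so they can be used directly in a TensorRT programming environment. The main reason: mmdeploy is so tightly integrated that it left me confused. I mostly use semantic segmentation day to day; instance segmentation is just something I play with.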
Language:C++2
cshbli/yolov5_qat_tensorrt
YOLOv5 Quantization Aware Training with TensorRT
Language:Python153
maggiez0138/yolov5_quant_sample
This is an 8-bit quantization sample for yolov5. PTQ, QAT, and partial quantization have all been implemented, and results are presented based on yolov5s.
Language:Jupyter Notebook9224
jahongir7174/YOLOv8-qat
Quantization Aware Training
Language:Python445
HeKun-NVIDIA/TensorRT-Developer_Guide_in_Chinese
17545
zhaocc1106/my_trt_plugin
Implementing your own TensorRT operators
Language:C++21
HuangCongQing/tensorrt-plugin
Implementing TensorRT custom plugins
Language:Cuda61
NVIDIA/trt-samples-for-hackathon-cn
Simple samples for TensorRT programming
Language: Python, 1.4k stars, 332 forks
NVIDIA/TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Language: C++, 9.3k stars, 2k forks
shouxieai/tensorRT_Pro
C++ library based on tensorrt integration
Language: C++, 2.5k stars, 530 forks
Oneflow-Inc/oneflow
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
Language: C++, 5.8k stars, 656 forks
