caoxiang520's Stars
angry-crab/cudla_dev
ifromeast/cuda_learning
Learning how CUDA works.
leimao/CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
luliyucoordinate/CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
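Both GEMM repos above measure their optimizations against the same naive baseline: one thread per output element, reading everything from global memory. For orientation, a minimal sketch of that baseline (the code below is illustrative, not taken from either repo):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Naive SGEMM: C = A * B, with A (M x K), B (K x N), C (M x N),
// all row-major. One thread computes one element of C.
__global__ void sgemm_naive(int M, int N, int K,
                            const float *A, const float *B, float *C) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k)
            acc += A[row * K + k] * B[k * N + col];
        C[row * N + col] = acc;
    }
}

int main() {
    const int M = 256, N = 256, K = 256;
    float *A, *B, *C;
    cudaMallocManaged(&A, M * K * sizeof(float));
    cudaMallocManaged(&B, K * N * sizeof(float));
    cudaMallocManaged(&C, M * N * sizeof(float));
    for (int i = 0; i < M * K; ++i) A[i] = 1.0f;
    for (int i = 0; i < K * N; ++i) B[i] = 2.0f;

    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
    sgemm_naive<<<grid, block>>>(M, N, K, A, B, C);
    cudaDeviceSynchronize();

    printf("C[0] = %f (expected %f)\n", C[0], 2.0f * K);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

The typical optimization steps from here are shared-memory tiling and per-thread register blocking to cut global-memory traffic.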
luliyucoordinate/cute-flash-attention
Implements Flash Attention using CuTe.
tmlfrkn/CUDAIntegratedTransformerTool
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
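Much of vLLM's memory efficiency comes from PagedAttention, which stores the KV cache in fixed-size blocks and reaches them through a per-sequence block table rather than one contiguous buffer per request. A toy sketch of that indirection follows; the block size, layout, and all names are illustrative assumptions, not vLLM's actual kernels:

```cuda
// Toy paged-KV lookup (illustrative, not vLLM's real layout).
// The cache is split into physical blocks of BLOCK_SIZE tokens;
// block_table[] maps a sequence's logical block index to a physical block.
#define BLOCK_SIZE 16
#define HEAD_DIM 64

__device__ const float *lookup_key(
        const float *kv_blocks,   // [num_blocks][BLOCK_SIZE][HEAD_DIM]
        const int *block_table,   // logical block index -> physical block id
        int token_pos) {
    int physical_block  = block_table[token_pos / BLOCK_SIZE];
    int offset_in_block = token_pos % BLOCK_SIZE;
    return kv_blocks
         + ((size_t)physical_block * BLOCK_SIZE + offset_in_block) * HEAD_DIM;
}

// Score one query against one cached key via the block table.
__global__ void score_one_token(const float *kv_blocks, const int *block_table,
                                const float *q, int token_pos, float *out) {
    const float *k = lookup_key(kv_blocks, block_table, token_pos);
    float dot = 0.0f;
    for (int d = 0; d < HEAD_DIM; ++d) dot += q[d] * k[d];
    *out = dot;
}
```

Because blocks can live anywhere in GPU memory, sequences of very different lengths can share one pool without fragmentation.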
bytedance/lightseq
LightSeq: A High Performance Library for Sequence Processing and Generation
megvii-research/MOTRv2
[CVPR2023] MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors
megvii-research/MOTR
[ECCV2022] MOTR: End-to-End Multiple-Object Tracking with TRansformer
ZikangZhou/HiVT
[CVPR 2022] HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction
OpenDriveLab/UniAD
[CVPR 2023 Best Paper Award] Planning-oriented Autonomous Driving
zjhellofss/KuiperInfer
A great project for campus recruiting (autumn/spring hiring) and internships! Implements a high-performance deep learning inference library from scratch, step by step, with support for models including Llama 2, UNet, YOLOv5, and ResNet.
zjhellofss/KuiperLLama
A great project for campus recruiting (autumn/spring hiring) and internships: build a large-model inference framework from scratch that supports Llama 2/3 and Qwen 2.5.
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
bytedance/byteir
A model compilation solution for various hardware
InfiniTensor/InfiniTensor
Oneflow-Inc/trt_flash_attention
Oneflow-Inc/flash-attention-v2
Fast and memory-efficient exact attention
NVIDIA/FasterTransformer
Transformer-related optimizations, including BERT and GPT.
linClubs/BEVDet-ROS-TensorRT
BEVDet online real-time inference using CUDA, TensorRT, ROS1 & C++.
maggiez0138/Swin-Transformer-TensorRT
This project aims to explore the deployment of Swin-Transformer based on TensorRT, including the test results of FP16 and INT8.
OpenPPL/ppl.nn
A primitive library for neural networks.
owenliang/pytorch-transformer
A PyTorch re-implementation of the Transformer.
awagner8/Kvcache
laugh12321/TensorRT-YOLO
TensorRT-YOLO: A high-performance, easy-to-use YOLO deployment toolkit for NVIDIA, powered by TensorRT plugins and CUDA Graph, supporting C++ and Python.
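The CUDA Graph this toolkit relies on is a standard CUDA runtime feature: a stream's launch sequence is captured once and then replayed with a single call, amortizing per-kernel CPU launch overhead across a whole pipeline. A minimal capture-and-replay sketch using the stock runtime API (the scale kernel is a stand-in):

```cuda
#include <cuda_runtime.h>

__global__ void scale(float *x, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

int main() {
    const int n = 1 << 20;
    float *x;
    cudaMalloc(&x, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Capture the launch sequence into a graph instead of executing it.
    cudaGraph_t graph;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(x, n, 2.0f);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(x, n, 0.5f);
    cudaStreamEndCapture(stream, &graph);

    // CUDA 12 signature; CUDA 11 uses a five-argument overload instead.
    cudaGraphExec_t exec;
    cudaGraphInstantiate(&exec, graph, 0);

    // Replay the whole recorded sequence with one launch per iteration.
    for (int iter = 0; iter < 100; ++iter)
        cudaGraphLaunch(exec, stream);
    cudaStreamSynchronize(stream);

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(x);
    return 0;
}
```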
ShaYeBuHui01/flash_attention_inference
Benchmarks the performance of the C++ interfaces of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
chainyo/transformers-pipeline-onnx
How to export Hugging Face's 🤗 NLP Transformers models to ONNX and use the exported model with the appropriate Transformers pipeline.
66RING/tiny-flash-attention
A flash attention tutorial written in Python, Triton, CUDA, and CUTLASS.
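The flash attention entries in this list all revolve around the same online-softmax recurrence: scores are consumed as a stream while a running max and running sum keep the normalization numerically stable, so the full attention score matrix is never materialized. A stripped-down sketch of that recurrence, with one thread per query row and none of the tiling or shared-memory staging the real kernels use (all names illustrative):

```cuda
#include <cuda_runtime.h>
#include <math.h>

#define D 64  // head dimension

// Single-head attention, one thread per query row. Each thread streams
// over the keys, maintaining a running max (m) and running sum (l) so
// softmax is computed online without storing the full score row.
__global__ void attention_online_softmax(const float *Q, const float *K,
                                         const float *V, float *O,
                                         int num_q, int num_kv) {
    int q = blockIdx.x * blockDim.x + threadIdx.x;
    if (q >= num_q) return;

    float m = -INFINITY;      // running max of scores seen so far
    float l = 0.0f;           // running sum of exp(score - m)
    float acc[D] = {0.0f};    // unnormalized weighted sum of V rows
    const float scale = rsqrtf((float)D);

    for (int k = 0; k < num_kv; ++k) {
        float s = 0.0f;
        for (int d = 0; d < D; ++d)
            s += Q[q * D + d] * K[k * D + d];
        s *= scale;

        float m_new = fmaxf(m, s);
        float correction = __expf(m - m_new);  // rescale earlier partials
        float p = __expf(s - m_new);
        l = l * correction + p;
        for (int d = 0; d < D; ++d)
            acc[d] = acc[d] * correction + p * V[k * D + d];
        m = m_new;
    }
    for (int d = 0; d < D; ++d)
        O[q * D + d] = acc[d] / l;
}
```

The production kernels add key/value tiling through shared memory and warp-level parallelism on top of this recurrence; the recurrence itself is what makes the single pass possible.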
Oneflow-Inc/flash-attention
Fast and memory-efficient exact attention