Jameskry's Stars
reed-lau/cute-gemm
ifromeast/cuda_learning
learning how CUDA works
njuhope/cuda_sgemm
kebijuelun/Awesome-LLM-Learning
Learning Large Language Models (LLMs)
shouxieai/word_2_vec
word_2_vec
USCT-YQJ/custom_prpool_plugin
QINZHAOYU/CudaSteps
A CUDA learning path based on the book *CUDA Programming: Basics and Practice* by Zheyong Fan.
ekondis/mixbench
A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)
zhangkai0425/SGEMM-HPC
Implementation and optimization of matrix multiplication on a single CPU (HPC-THU-2023-Autumn)
HorizonRDK/hobot_codec
DefTruth/CUDA-Learn-Notes
🎉CUDA notes / hand-written CUDA kernels for LLMs / C++ notes, updated occasionally: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
XiaoSong9905/CUDA-Optimization-Guide
Xiao's CUDA Optimization Guide [Actively Adding New Content]
ApolloAuto/apollo
An open autonomous driving platform
sesmfs/onnx_quant_tool
An ONNX-based quantization tool.
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
qist/tvbox
Configuration files for the FongMi video app and tvbox; if you like them, please fork them for your own use. Read the repository notes carefully before use; using them means you are assumed to have understood them.
nicolaswilde/cuda-tensorcore-hgemm
Liu-xiandong/How_to_optimize_in_GPU
A series of GPU optimization topics that introduces in detail how to optimize CUDA kernels, covering several basic kernel optimizations: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is at or near the theoretical limit.
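For context, the elementwise kernel is the simplest of the kernels listed in that entry. The sketch below is my own minimal example (not code from the repository) of the grid-stride-loop pattern such tutorials typically start from before applying vectorized loads and other optimizations; the kernel and variable names are illustrative only.

```cuda
#include <cuda_runtime.h>

// Minimal elementwise add: each thread processes elements i, i + stride, ...
// so any grid size covers the whole array.
__global__ void elementwise_add(const float* a, const float* b, float* c, int n) {
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        c[i] = a[i] + b[i];
    }
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    // Managed memory keeps the sketch short; real benchmarks usually use
    // explicit cudaMalloc/cudaMemcpy.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    elementwise_add<<<256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```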
462630221/SampleCode
openppl-public/ppl.nn
A primitive library for neural networks
graykode/nlp-tutorial
Natural Language Processing Tutorial for Deep Learning Researchers
hellogcc/100-gdb-tips
A collection of GDB tips; "100" may just mean "many" here.
Bruce-Lee-LY/cuda_hgemv
Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.
foreverrookie/cuda-opt-samples
CUDA optimization samples including sgemm, reduce... To be continued.
jundaf2/CUDA-INT8-GEMM
CUDA 8-bit Tensor Core matrix multiplication based on the m16n16k16 WMMA API
triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
tpoisonooo/how-to-optimize-gemm
row-major matmul optimization
sesmfs/onnx_matcher
A pattern matcher for ONNX models, used to match and replace subgraphs.
cyrusbehr/YOLOv8-TensorRT-CPP
YOLOv8 TensorRT C++ Implementation
Bruce-Lee-LY/cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores, via the WMMA API and MMA PTX instructions.