Pinned Repositories
algorithm-cpp
Algorithm implementations in C++
bmtrain_qlora
C-and-C-plus-plus
CMake and C/C++ learning examples: OpenCV for C++, digital image processing (DIP), deep learning with CUDA acceleration, GPU programming, OpenGL and Qt; common Linux shell commands, Ubuntu and Manjaro.
cmake_examples
Practical, easy-to-copy CMake examples
CMakeTutorial
A hands-on CMake tutorial in Chinese
compress_llama
CPlusPlusThings
All about C++
CPM-Bee-qlora
A 10-billion-parameter bilingual (Chinese-English) foundation language model
learning-cuda-trt
A large collection of examples for learning CUDA and TensorRT
tensorrtx
Implementation of popular deep learning networks with TensorRT network definition API
jinmin527's Repositories
jinmin527/bmtrain_qlora
jinmin527/C-and-C-plus-plus
CMake and C/C++ learning examples: OpenCV for C++, digital image processing (DIP), deep learning with CUDA acceleration, GPU programming, OpenGL and Qt; common Linux shell commands, Ubuntu and Manjaro.
jinmin527/compress_llama
jinmin527/CPlusPlusThings
All about C++
jinmin527/CPM-Bee-qlora
A 10-billion-parameter bilingual (Chinese-English) foundation language model
jinmin527/cuda_back2back_hgemm
Uses tensor cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with the MMA PTX instruction.
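Kernels like these build on the raw `mma.sync` PTX instruction rather than the higher-level WMMA API. Below is a minimal, hypothetical sketch (mine, not the repo's code) of the basic building block: one warp computes a single D(16x8) = A(16x16) x B(16x8) tile in f16 via inline PTX, with fragments laid out per the PTX ISA documentation; the back-to-back chaining of two GEMMs is omitted. Build with `nvcc -arch=sm_80`.

```cuda
#include <cuda_fp16.h>
#include <cstdio>

// One warp computes D(16x8) = A(16x16) * B(16x8), all f16, with a single
// mma.sync.m16n8k16 instruction. A and D are row-major; B is column-major.
__global__ void mma_m16n8k16(const half* A, const half* B, half* D) {
    const int lane  = threadIdx.x & 31;
    const int group = lane >> 2;  // 0..7: row group owned by this thread
    const int tig   = lane & 3;   // 0..3: thread index within the group

    // Each 32-bit register packs two consecutive f16 values.
    const unsigned* A32 = reinterpret_cast<const unsigned*>(A);
    const unsigned* B32 = reinterpret_cast<const unsigned*>(B);
    unsigned*       D32 = reinterpret_cast<unsigned*>(D);

    // Per-thread fragments, following the PTX ISA layout for
    // mma.sync.aligned.m16n8k16.row.col.f16.f16.f16.f16.
    unsigned a[4], b[2], c[2] = {0u, 0u}, d[2];
    a[0] = A32[group * 8 + tig];            // rows 0..7,  k 0..7
    a[1] = A32[(group + 8) * 8 + tig];      // rows 8..15, k 0..7
    a[2] = A32[group * 8 + tig + 4];        // rows 0..7,  k 8..15
    a[3] = A32[(group + 8) * 8 + tig + 4];  // rows 8..15, k 8..15
    b[0] = B32[group * 8 + tig];            // column `group`, k 0..7
    b[1] = B32[group * 8 + tig + 4];        // column `group`, k 8..15

    asm volatile(
        "mma.sync.aligned.m16n8k16.row.col.f16.f16.f16.f16 "
        "{%0,%1}, {%2,%3,%4,%5}, {%6,%7}, {%8,%9};\n"
        : "=r"(d[0]), "=r"(d[1])
        : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
          "r"(b[0]), "r"(b[1]), "r"(c[0]), "r"(c[1]));

    D32[group * 4 + tig]       = d[0];      // rows 0..7
    D32[(group + 8) * 4 + tig] = d[1];      // rows 8..15
}

int main() {
    half *A, *B, *D;
    cudaMallocManaged(&A, 16 * 16 * sizeof(half));
    cudaMallocManaged(&B, 16 * 8  * sizeof(half));
    cudaMallocManaged(&D, 16 * 8  * sizeof(half));
    for (int i = 0; i < 16 * 16; ++i) A[i] = __float2half((i % 4) * 0.25f);
    for (int i = 0; i < 16 * 8;  ++i) B[i] = __float2half(1.0f);
    mma_m16n8k16<<<1, 32>>>(A, B, D);
    cudaDeviceSynchronize();
    printf("D[0][0] = %f (expect 6.0)\n", __half2float(D[0]));
    return 0;
}
```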
jinmin527/CUDA_course
jinmin527/CUDA_Freshman
jinmin527/cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using tensor cores, via the WMMA API and MMA PTX instructions.
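In contrast to raw MMA PTX, the WMMA API hides the fragment layout behind load/store intrinsics. As a frame of reference for what such step-by-step HGEMM optimizations start from, here is a minimal, hypothetical single-warp sketch (not the repo's code) of one 16x16x16 tile; build with `nvcc -arch=sm_70` or newer.

```cuda
#include <mma.h>
#include <cuda_fp16.h>
#include <cstdio>
using namespace nvcuda;

// One warp computes C(16x16) = A(16x16) * B(16x16) with f16 inputs and an
// f32 accumulator, using the WMMA fragment API.
__global__ void wmma_tile(const half* A, const half* B, float* C) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);               // C = 0
    wmma::load_matrix_sync(a_frag, A, 16);           // leading dimension 16
    wmma::load_matrix_sync(b_frag, B, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // C += A * B
    wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);
}

int main() {
    half *A, *B; float *C;
    cudaMallocManaged(&A, 16 * 16 * sizeof(half));
    cudaMallocManaged(&B, 16 * 16 * sizeof(half));
    cudaMallocManaged(&C, 16 * 16 * sizeof(float));
    for (int i = 0; i < 16 * 16; ++i) {
        A[i] = __float2half(1.0f);
        B[i] = __float2half(0.5f);
    }
    wmma_tile<<<1, 32>>>(A, B, C);
    cudaDeviceSynchronize();
    printf("C[0] = %f (expect 8.0)\n", C[0]);  // 16 * 1.0 * 0.5
    return 0;
}
```

Typical optimizations from this baseline are computing several fragments per warp, staging A and B through shared memory, and double-buffering the global-to-shared loads.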
jinmin527/CudaGEMMOptimization
jinmin527/CudaProgramming
jinmin527/cutests
A build tutorial for CUDA
jinmin527/cutlass-cute-sample
jinmin527/cutlass-kernels
jinmin527/Cutlass_EX
A study of CUTLASS
jinmin527/cutlass_flash_atten_fp8
FP8 flash attention implemented on the Ada architecture using the CUTLASS library
jinmin527/CutlassProgramming
jinmin527/Efficient-LLM-Inferencing-on-GPUs
Penn CIS 5650 (GPU Programming and Architecture) Final Project
jinmin527/flash_attention_inference
Performance of the C++ interfaces of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
jinmin527/GEMM_MMA
Optimizing GEMM with tensor cores, step by step
jinmin527/InfLLM
The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"
jinmin527/lectures
Material for cuda-mode lectures
jinmin527/libflash_attn
Standalone Flash Attention v2 kernel without libtorch dependency
jinmin527/LLMsNineStoryDemonTower
[LLMs Nine-Story Demon Tower] Hands-on practice and lessons learned with LLMs across natural language processing (ChatGLM, Chinese-LLaMA-Alpaca, Vicuna, LLaMA, GPT4ALL, etc.), information retrieval (LangChain), speech synthesis, speech recognition, and multimodal domains (Stable Diffusion, MiniGPT-4, VisualGLM-6B, Ziya-Visual, etc.).
jinmin527/MathExperiment
Mathematical Experiments course, Spring 2023
jinmin527/MatmulTutorial
An easy-to-understand TensorOp matmul tutorial
jinmin527/PyTorch-Linear-Operator-CUDA
A simple demo of writing a PyTorch linear operator as a CUDA extension
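The usual shape of such a demo (all names below are illustrative, not taken from the repo): a naive CUDA kernel for y = x·Wᵀ + b plus a pybind11 binding, compiled on the fly with `torch.utils.cpp_extension.load`.

```cuda
#include <torch/extension.h>

// Naive linear-layer forward: y[row][col] = b[col] + sum_k x[row][k] * w[col][k],
// with w stored as (out_features, in_features), matching torch.nn.Linear.
__global__ void linear_forward_kernel(const float* x, const float* w,
                                      const float* b, float* y,
                                      int batch, int in_f, int out_f) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // batch index
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // output feature
    if (row >= batch || col >= out_f) return;
    float acc = b[col];
    for (int k = 0; k < in_f; ++k)
        acc += x[row * in_f + k] * w[col * in_f + k];
    y[row * out_f + col] = acc;
}

torch::Tensor linear_forward(torch::Tensor x, torch::Tensor w, torch::Tensor b) {
    auto y = torch::empty({x.size(0), w.size(0)}, x.options());
    dim3 block(16, 16);
    dim3 grid((w.size(0) + 15) / 16, (x.size(0) + 15) / 16);
    linear_forward_kernel<<<grid, block>>>(
        x.data_ptr<float>(), w.data_ptr<float>(), b.data_ptr<float>(),
        y.data_ptr<float>(), x.size(0), x.size(1), w.size(0));
    return y;
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def("forward", &linear_forward, "naive linear forward (CUDA)");
}
```

From Python, assuming the file is saved as `linear.cu`: `ext = torch.utils.cpp_extension.load(name="linear_cuda", sources=["linear.cu"])`, then `ext.forward(x, w, b)` on contiguous float32 CUDA tensors.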
jinmin527/query_doc_topk
jinmin527/SwiftTransformer
High-performance Transformer implementation in C++.
jinmin527/tiny-flash-attention
A slimmed-down flash-attention implementation using CUTLASS, written for teaching purposes