Pinned Repositories
AISystem
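AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks.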
anomalib
An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
ascend-operator-challenge2
Ascend AI Native Innovation Operator Challenge S2, performance track
awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
bpe-tokenizer
LLM Tokenizer with BPE algorithm
CGraph
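[A commonly used C++ DAG framework] A general-purpose, cross-platform parallel computing framework based on flow graphs, with no third-party dependencies, listed in awesome-cpp. Stars and forks are welcome.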
ChangeFormer
[IGARSS'22]: A Transformer-Based Siamese Network for Change Detection
course
High-Performance Parallel Programming and Optimization: course materials
CPP
Lecture notes, projects and other materials for Course 'CS205 C/C++ Program Design' at Southern University of Science and Technology.
cuda-training-series
Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)
1050705324's Repositories
1050705324/CGraph
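[A commonly used C++ DAG framework] A general-purpose, cross-platform parallel computing framework based on flow graphs, with no third-party dependencies, listed in awesome-cpp. Stars and forks are welcome.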
1050705324/AISystem
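AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks.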
1050705324/anomalib
An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
1050705324/ascend-operator-challenge2
Ascend AI Native Innovation Operator Challenge S2, performance track
1050705324/awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
1050705324/bpe-tokenizer
LLM Tokenizer with BPE algorithm
1050705324/ChangeFormer
[IGARSS'22]: A Transformer-Based Siamese Network for Change Detection
1050705324/course
High-Performance Parallel Programming and Optimization: course materials
1050705324/CPP
Lecture notes, projects and other materials for Course 'CS205 C/C++ Program Design' at Southern University of Science and Technology.
1050705324/cuda-training-series
Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)
1050705324/CUDA_Freshman
1050705324/CUDATutorial
A CUDA tutorial for learning CUDA programming from scratch
1050705324/decoding_attention
Decoding Attention is specially optimized for multi-head attention (MHA) in the decoding stage of LLM inference, using CUDA cores.
1050705324/flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
1050705324/hello-world
firstblood
1050705324/kuiperdatawhale
a log for learning
1050705324/KuiperInfer
Implement a high-performance deep learning inference library from scratch, step by step, with support for inference on models such as the large model llama2, Unet, Yolov5, and Resnet.
1050705324/lectures
Material for cuda-mode lectures
1050705324/libfacedetection
An open source library for face detection in images. The face detection speed can reach 1000FPS.
1050705324/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
1050705324/MLCLIP-AD
1050705324/MVFA-AD
[CVPR2024 Highlight] Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images
1050705324/My-Torch-Extension
A minimalist and extensible PyTorch extension for implementing custom backend operators in PyTorch.
1050705324/parallel-computing-tutorial
1050705324/RepDistiller
[ICLR 2020] Contrastive Representation Distillation (CRD), and benchmark of recent knowledge distillation methods
1050705324/segment-anything
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
1050705324/TensorRT_Tutorial
1050705324/test
test for github learning
1050705324/TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
1050705324/Tutorial
LLM Tutorial