Pinned Repositories
Awesome-GPU
Awesome resources for GPUs
cmake-examples
Useful CMake Examples
cutlass_master
CUDA Templates for Linear Algebra Subroutines
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
LearnDLSysCourse
Learning_CUDA
MadMario-OneFlow
oneflow
OneFlow is a performance-centered and open-source deep learning framework.
paper_reading
Tools
A collection of useful code.
MARD1NO's Repositories
MARD1NO/Tools
A collection of useful code.
MARD1NO/OneshotAllreduceExample
MARD1NO/cutlass_master
CUDA Templates for Linear Algebra Subroutines
MARD1NO/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
MARD1NO/Awesome-LLM-System-Papers
MARD1NO/ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
MARD1NO/cublas-test
MARD1NO/CUDALibrarySamples
CUDA Library Samples
MARD1NO/docs
Documentation for PaddlePaddle
MARD1NO/EdgeGPT
Reverse engineered API of Microsoft's Bing Chat AI
MARD1NO/FlexGen
Running large language models like OPT-175B/GPT-3 on a single GPU. Up to 100x faster than other offloading systems.
MARD1NO/Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
MARD1NO/GPTQ-triton
GPTQ inference Triton kernel
MARD1NO/INT8-Flash-Attention-FMHA-Quantization
MARD1NO/kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
MARD1NO/LLMsPracticalGuide
MARD1NO/LLMSurvey
A collection of papers and resources related to Large Language Models.
MARD1NO/matxscript
The model pre-processing and post-processing framework
MARD1NO/nanoPyC
MARD1NO/nccl-tests
NCCL Tests
MARD1NO/Paddle
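PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)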
MARD1NO/PaddleNLP
👑 Easy-to-use and powerful NLP library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, and 🖼 Diffusion AIGC system, etc.
MARD1NO/ppl.kernel.cuda
MARD1NO/PTX-ISA
CUDA PTX-ISA Document (Chinese translation)
MARD1NO/QuickMathHPP
A single-header math library
MARD1NO/RedPajama-Data
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
MARD1NO/taichi-nerfs
Implementations of NeRF variants based on Taichi + PyTorch
MARD1NO/tiktoken
MARD1NO/triton
Development repository for the Triton language and compiler
MARD1NO/typst
A new markup-based typesetting system that is powerful and easy to learn.