aMarry

Pinned Repositories

2019-Autumn-recruitment-experience
A collection of interview experiences from the 2019 autumn campus recruitment season (class of 2019)
ADNI_nii_tensorflow
Uses the ADNI dataset and its labels to segment the hippocampus with a U-Net architecture, built on the TensorFlow framework through the TensorLayer interface.
Language: Python
AiLearning
AiLearning: Machine Learning (ML), Deep Learning (DL), Natural Language Processing (NLP)
Language: Python
Algorithm_Interview_Notes-Chinese
2018/2019 campus recruitment / spring recruitment / autumn recruitment / algorithms / Machine Learning / Deep Learning / Natural Language Processing (NLP) / C/C++ / Python / interview notes
Language: Python
apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
Language: Python
APPy
APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to OpenMP, and automatically compiles the annotated code to GPU kernels.
Language: Python
architect-awesome
Technology map for backend architects
cutlass
CUDA Templates for Linear Algebra Subroutines
Language: C++
Learn-LLVM-12
An unofficial, non-professional personal translation of "Learn LLVM 12"
Language: TeX
seq2seq_for_char
Language: Python

aMarry's Repositories

aMarry/APPy
APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to OpenMP, and automatically compiles the annotated code to GPU kernels.
Language: Python
aMarry/cutlass
CUDA Templates for Linear Algebra Subroutines
Language: C++
aMarry/bitsandbytes
8-bit CUDA functions for PyTorch
aMarry/BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
aMarry/coc.nvim
Nodejs extension host for vim & neovim, load extensions like VSCode and host language servers.
aMarry/concurrentqueue
A fast multi-producer, multi-consumer lock-free concurrent queue for C++11
aMarry/Cpp-Templates-2ed
C++11/14/17/20 templates and generic programming, the most complex and difficult technical details of C++, indispensable in building infrastructure libraries.
aMarry/CuAssembler
An unofficial CUDA assembler for all generations of SASS, hopefully :)
aMarry/CUDALibrarySamples
CUDA Library Samples
aMarry/cute-gemm
aMarry/Cute-Gemm-Optimization
aMarry/cutlass_fpA_intB_gemm
A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
aMarry/DeepLearningExamples
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
aMarry/DeepLearningSystem
An introduction to the core principles of deep learning systems.
aMarry/DevWeekly
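Published every Friday: a curated selection of high-quality developer content, including open-source projects, tools and resources, and technical articles.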
aMarry/FlashAttention20
Get down and dirty with FlashAttention 2.0 in PyTorch: plug and play, no complex CUDA kernels
aMarry/how-to-optimize-gemm-1
row-major matmul optimization
aMarry/HPC-Learning-Notes
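Study notes on high-performance computing, including notes and code demos for the related topics; still being expanded. If you find it helpful, please give it a Star; it helps the author a lot. Thank you!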
aMarry/image-classification
Udacity-second-project
Language: HTML
aMarry/kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
aMarry/llama.onnx
LLaMA/Alpaca ONNX models, quantization, and test cases
aMarry/miniob
MiniOB is a mini database that helps developers learn how databases work.
aMarry/OneNeuralNetwork
This is a cross-chip platform collection of operators and a unified neural network library.
aMarry/onnx
Open standard for machine learning interoperability
aMarry/onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
aMarry/ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a toolkit of libraries (Ray AIR) for accelerating ML workloads.
aMarry/TensorRT-in-Action
TensorRT-in-Action is a GitHub repository that provides code examples for using TensorRT, each with a corresponding Jupyter Notebook.
aMarry/tiny-flash-attention
FlashAttention tutorial written in Python, Triton, CUDA, and CUTLASS
aMarry/triton
Development repository for the Triton language and compiler
aMarry/tvm_mlir_learn
tvm learn