Pinned Repositories
3D-Machine-Learning
A resource repository for 3D machine learning
cmake-examples
Useful CMake Examples
collaborative-attention
Code for Multi-Head Attention: Collaborate Instead of Concatenate
convGemm
The convGemm library performs the convolution operation using an implicit im2row or im2col over a GEMM operation with matrices in either the NHWC or NCHW format, respectively.
cs344
Introduction to Parallel Programming class code
cuda_sgemm
DimReduce
HPC-Knowledge-Library
Im2win
Represent-ML-algorithm-by-Tensor-Algebra
Machine learning algorithm, Tensor, ITensor
seth-lu's Repositories
seth-lu/Im2win
seth-lu/HPC-Knowledge-Library
seth-lu/Represent-ML-algorithm-by-Tensor-Algebra
Machine learning algorithm, Tensor, ITensor
seth-lu/convGemm
The convGemm library performs the convolution operation using an implicit im2row or im2col over a GEMM operation with matrices in either the NHWC or NCHW format, respectively.
seth-lu/3D-Machine-Learning
A resource repository for 3D machine learning
seth-lu/cmake-examples
Useful CMake Examples
seth-lu/collaborative-attention
Code for Multi-Head Attention: Collaborate Instead of Concatenate
seth-lu/cs344
Introduction to Parallel Programming class code
seth-lu/cuda_sgemm
seth-lu/DimReduce
seth-lu/EfficientConvolution
Implementation of an efficient convolution between 3D tensors and 4D tensors.
seth-lu/Fastor
A lightweight high performance tensor algebra framework for modern C++
seth-lu/how-to-optimize-gemm
seth-lu/How_to_optimize_in_GPU
A series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
seth-lu/implicit_gemm_convolution
seth-lu/ITensor
A C++ library for efficient tensor network calculations
seth-lu/laser
The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for floats and integers
seth-lu/Learn-CUDA-Programming
Learn CUDA Programming, published by Packt
seth-lu/LibtorchTutorials
A code repository for a PyTorch C++ (libtorch) tutorial.
seth-lu/libxsmm
Library for specialized dense and sparse matrix operations, and deep learning primitives.
seth-lu/ls110082
Config files for my GitHub profile.
seth-lu/mtensor
A C++ Cuda Tensor Lazy Computing Library
seth-lu/ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
seth-lu/NN-CUDA-Example
Several simple examples for popular neural network toolkits calling custom CUDA operators.
seth-lu/PyTorch-BayesianCNN
Bayesian Convolutional Neural Network with Variational Inference based on Bayes by Backprop in PyTorch.
seth-lu/pytorch-handbook
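pytorch handbook is an open-source book that aims to help people who want to use PyTorch for deep learning development and research get started quickly. All of the PyTorch tutorials it contains have been tested to ensure they run successfully.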
seth-lu/pytorch-tutorial
PyTorch Tutorial for Deep Learning Researchers
seth-lu/splatt
The Surprisingly ParalleL spArse Tensor Toolkit.
seth-lu/visdom
A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Torch and Numpy.
seth-lu/zh-google-styleguide
Google open-source project style guides (Chinese edition)