Pinned Repositories
accfft
A Massively Parallel FFT Library for CPU/GPU
AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
AutoTest
awesome-machine-learning-cn
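A comprehensive collection of machine learning resources (Chinese edition), covering frameworks, libraries, and software in the machine learning field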
baidu-allreduce
EZLippi.github.io
This is the source code of my personal website; forks are welcome.
fastmoe
A fast MoE impl for PyTorch
how-to-optimize-gemm
taco
The Tensor Algebra Compiler (taco) computes tensor expressions on sparse and dense tensors
limin2021's Repositories
limin2021/how-to-optimize-gemm
limin2021/baidu-allreduce
limin2021/convnet-benchmarks
Easy benchmarking of all publicly accessible implementations of convnets
limin2021/CUDA
GPU-accelerated LIBSVM is a modification of the original LIBSVM that exploits the CUDA framework to significantly reduce processing time while producing identical results. The functionality and interface of LIBSVM remain the same. The modifications were made in the kernel computation, which is now performed on the GPU.
limin2021/eakmeans
Implementation of fast exact k-means algorithms
limin2021/faiss
A library for efficient similarity search and clustering of dense vectors.
limin2021/fastText
Library for fast text representation and classification.
limin2021/gensim
Topic Modelling for Humans
limin2021/gunrock
High-Performance Graph Primitives on GPUs
limin2021/kmcuda
Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA
limin2021/kNN-CUDA
Fast k nearest neighbor search using GPU
limin2021/lectures
Oxford Deep NLP 2017 course
limin2021/libsvm
limin2021/lightgbm-gpu
Development Repository for GPU-accelerated GBDT training
limin2021/mkldnn-perf
Testing the performance of the MKL-DNN
limin2021/MobileNet-Caffe
Caffe Implementation of Google's MobileNets
limin2021/nccl
Optimized primitives for collective multi-GPU communication
limin2021/ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
limin2021/neural_session_relevance_model
Sequence to sequence learning for generative context-aware query suggestion.
limin2021/NRE
Neural Relation Extraction, including CNN, PCNN, CNN+ATT, PCNN+ATT
limin2021/ompi
Open MPI main development repository
limin2021/pWord2Vec
Parallelizing word2vec in shared and distributed memory
limin2021/rnn
General Stride K-Nearest Neighbors
limin2021/sse-popcount
SIMD (SSE) population count --- http://0x80.pl/articles/sse-popcount.html
limin2021/tensorflow-beginner
TensorFlow learning materials following CS20SI
limin2021/thrust
Thrust is a parallel algorithms library which resembles the C++ Standard Template Library (STL).
limin2021/tinyflow
Tutorial code on how to build your own Deep Learning System in 2k Lines
limin2021/tprint
tprint is a printing library specially designed for SW architecture. Currently provides C and Fortran APIs.
limin2021/tvm
End-to-end Tensor IR/DSL stack for deploying deep learning workloads to hardware
limin2021/Wikipedia_Word2vec
Train Word2vec Model based on Wikipedia