limin2021

Majoring in computer science: high-performance computing and parallel computing.

ISCAS, Beijing

Pinned Repositories

accfft
A Massively Parallel FFT Library for CPU/GPU
Language:C++
AITemplate
AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Language:Python
apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
Language:Python
AutoTest
Language:C++
awesome-machine-learning-cn
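A comprehensive Chinese-language collection of machine learning resources, including frameworks, libraries, and software in the machine learning field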
baidu-allreduce
Language:Cuda
EZLippi.github.io
This is the source code of my personal website; forks are welcome.
Language:CSS
fastmoe
A fast MoE impl for PyTorch
Language:Python
how-to-optimize-gemm
Language:C
taco
The Tensor Algebra Compiler (taco) computes tensor expressions on sparse and dense tensors
Language:C++

limin2021's Repositories

limin2021/how-to-optimize-gemm
Language:C
limin2021/baidu-allreduce
Language:Cuda
limin2021/convnet-benchmarks
Easy benchmarking of all publicly accessible implementations of convnets
Language:Python
limin2021/CUDA
GPU-accelerated LIBSVM is a modification of the original LIBSVM that exploits the CUDA framework to significantly reduce processing time while producing identical results. The functionality and interface of LIBSVM remain the same. The modifications were made in the kernel computation, which is now performed on the GPU.
Language:HTML
limin2021/eakmeans
Implementation of fast exact k-means algorithms
Language:C++
limin2021/faiss
A library for efficient similarity search and clustering of dense vectors.
Language:C++
limin2021/fastText
Library for fast text representation and classification.
Language:C++
limin2021/gensim
Topic Modelling for Humans
Language:Python
limin2021/gunrock
High-Performance Graph Primitives on GPUs
Language:Cuda
limin2021/kmcuda
Large scale K-means and K-nn implementation on NVIDIA GPU / CUDA
Language:Jupyter Notebook
limin2021/kNN-CUDA
Fast k nearest neighbor search using GPU
Language:Cuda
limin2021/lectures
Oxford Deep NLP 2017 course
limin2021/libsvm
Language:Java
limin2021/lightgbm-gpu
Development Repository for GPU-accelerated GBDT training
Language:C++
limin2021/mkldnn-perf
Testing the performance of the MKL-DNN
Language:C++
limin2021/MobileNet-Caffe
Caffe Implementation of Google's MobileNets
limin2021/nccl
Optimized primitives for collective multi-GPU communication
Language:Cuda
limin2021/ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Language:C++
limin2021/neural_session_relevance_model
Sequence to sequence learning for generative context-aware query suggestion.
Language:Python
limin2021/NRE
Neural Relation Extraction, including CNN, PCNN, CNN+ATT, PCNN+ATT
Language:C++
limin2021/ompi
Open MPI main development repository
Language:C
limin2021/pWord2Vec
Parallelizing word2vec in shared and distributed memory
Language:C++
limin2021/rnn
General Stride K-Nearest Neighbors
Language:C
limin2021/sse-popcount
SIMD (SSE) population count --- http://0x80.pl/articles/sse-popcount.html
Language:C++
limin2021/tensorflow-beginner
tensorflow learning according to CS20SI
Language:Python
limin2021/thrust
Thrust is a parallel algorithms library which resembles the C++ Standard Template Library (STL).
Language:C++
limin2021/tinyflow
Tutorial code on how to build your own Deep Learning System in 2k Lines
Language:C++
limin2021/tprint
tprint is a printing library specially designed for the SW architecture. It currently provides C and Fortran APIs.
Language:C
limin2021/tvm
End to end Tensor IR/DSL stack for deploying deep learning workloads to hardwares
Language:C++
limin2021/Wikipedia_Word2vec
Train Word2vec Model based on Wikipedia
Language:Python