Pinned Repositories
intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms ⚡
cutlass
CUDA Templates for Linear Algebra Subroutines
FastAPSP
Fast APSP is an algorithm for the All-Pairs Shortest Paths (APSP) problem. It uses a divide-and-conquer strategy: the input graph G is first partitioned into multiple subgraphs with METIS, and the global APSP solution is then assembled from computations on the subgraphs. Fast APSP combines an SSSP algorithm with the Floyd-Warshall algorithm; compared with the Part APSP algorithm, it eliminates the data dependence and communication between subgraphs. The algorithm performs well on graphs with good partitioning properties. On many sparse graphs from the SuiteSparse Matrix Collection and the Network Repository, Fast APSP outperformed other APSP algorithms.
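As a rough illustration of the within-subgraph step, below is a minimal CUDA sketch of the Floyd-Warshall relaxation (a hypothetical baseline, not the repository's code); the METIS partitioning and the SSSP pass that combines subgraph results are omitted.

```cuda
#include <cuda_runtime.h>

// One Floyd-Warshall relaxation step for a fixed pivot k on a dense
// n x n distance matrix (row-major, missing edges pre-set to a large
// value such as 1e30f):
//   dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
__global__ void fw_step(float* dist, int n, int k) {
    int i = blockIdx.y * blockDim.y + threadIdx.y;
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && j < n) {
        float via_k = dist[i * n + k] + dist[k * n + j];
        if (via_k < dist[i * n + j]) dist[i * n + j] = via_k;
    }
}

// Solve APSP on one subgraph's distance matrix already resident on the
// device. Pivots must be processed in order, so k stays a host loop;
// launches on the same stream serialize automatically.
void floyd_warshall(float* d_dist, int n) {
    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
    for (int k = 0; k < n; ++k)
        fw_step<<<grid, block>>>(d_dist, n, k);
    cudaDeviceSynchronize();
}
```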
How_to_optimize_in_GPU
This is a series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm; the performance of these kernels is at or near the theoretical limit.
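As a taste of the topics covered, here is a minimal sketch of a shared-memory sum reduction of the kind the series optimizes (illustrative naming, assuming a power-of-two block size; not the repository's code):

```cuda
#include <cuda_runtime.h>

// Each block reduces blockDim.x elements of `in` to one partial sum in
// `out[blockIdx.x]`; a second pass (or a host loop) combines the partials.
__global__ void reduce_sum(const float* in, float* out, int n) {
    extern __shared__ float sdata[];   // size = blockDim.x floats
    unsigned int tid = threadIdx.x;
    unsigned int i   = blockIdx.x * blockDim.x + tid;
    sdata[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction: the active half of the block adds in the other half.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = sdata[0];
}

// Launch example: 256 threads per block, dynamic shared memory sized to
// match: reduce_sum<<<blocks, 256, 256 * sizeof(float)>>>(d_in, d_out, n);
```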
OpenBLAS
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
Paddle
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the 『飞桨』 PaddlePaddle core framework: high-performance single-machine and distributed training for deep learning and machine learning, with cross-platform deployment)
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
tvm
Open deep learning compiler stack for CPU, GPU, and specialized accelerators
deepxde
A library for scientific machine learning and physics-informed learning
Liu-xiandong's Repositories
Liu-xiandong/How_to_optimize_in_GPU
Liu-xiandong/FastAPSP
Liu-xiandong/cutlass
Liu-xiandong/OpenBLAS
Liu-xiandong/Paddle
Liu-xiandong/pytorch
Liu-xiandong/tvm