Efficient SDDMM Algorithms on GPU, a dynamic approach

Introduction

Sampled Dense-Dense Matrix Multiplication (SDDMM) is a core operation in many machine learning factor analysis algorithms, including Alternating Least Squares (ALS), Latent Dirichlet Allocation (LDA), Sparse Factor Analysis (SFA), and Gamma Poisson. This repository contains our code and the findings from our final report. We focus on GPU-Dynamic, an efficient GPU implementation of the SDDMM kernel. It achieves speedups of up to 100x over the current implementations in Torch and delivers competitive results compared to DGL.
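For readers new to the operation, the following is a minimal CPU reference sketch of SDDMM, not the GPU-Dynamic kernel from this repository. It assumes the sparse sampling matrix S is stored in coordinate (COO) format and that the dense factors A (rows x K) and B (cols x K) are row-major, so each output value is the S entry scaled by the dot product of one row of A with one row of B. The names `CooMatrix` and `sddmm_reference` are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sparse sampling matrix S in coordinate (COO) format: entry k sits at
// (row[k], col[k]) with value val[k]. This layout is an assumption of the sketch.
struct CooMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<std::size_t> row;
    std::vector<std::size_t> col;
    std::vector<float> val;
};

// Reference SDDMM: out[k] = S.val[k] * dot(A[row[k], :], B[col[k], :]).
// A is rows x K and B is cols x K, both dense and row-major, so the dense
// product being sampled is A * B^T. Only the non-zero positions of S are computed.
std::vector<float> sddmm_reference(const CooMatrix& S,
                                   const std::vector<float>& A,
                                   const std::vector<float>& B,
                                   std::size_t K) {
    assert(A.size() == S.rows * K);
    assert(B.size() == S.cols * K);
    assert(S.row.size() == S.val.size() && S.col.size() == S.val.size());

    std::vector<float> out(S.val.size(), 0.0f);
    for (std::size_t k = 0; k < S.val.size(); ++k) {
        const float* a = A.data() + S.row[k] * K;  // row of A for this entry
        const float* b = B.data() + S.col[k] * K;  // row of B for this entry
        float dot = 0.0f;
        for (std::size_t j = 0; j < K; ++j) {
            dot += a[j] * b[j];
        }
        out[k] = S.val[k] * dot;
    }
    return out;
}
```

The output has the same sparsity pattern as S, so the cost scales with the number of non-zeros times K rather than with the full dense product. A sequential loop like this is typically useful as a correctness baseline when checking a GPU kernel on small inputs.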

Dataset

The tables at the bottom of this README list all matrices that we used for evaluation. The matrices span a range of dimensions and densities. All of them originate from the SuiteSparse Matrix Collection and can be downloaded by executing the install_matrices.sh script.
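The density column in the tables at the bottom of this README can be read as the fraction of entries that are non-zero. The small sketch below shows that calculation under this assumption; the function name `density` is hypothetical and is not part of the repository code.

```cpp
#include <cassert>
#include <cstdint>

// Fraction of entries that are non-zero, in [0, 1].
// Multiply by 100 to get a percentage as shown in the tables.
double density(std::uint64_t rows, std::uint64_t cols, std::uint64_t nnz) {
    if (rows == 0 || cols == 0) {
        return 0.0;  // an empty matrix has no entries to count
    }
    assert(nnz <= rows * cols);  // cannot have more non-zeros than entries
    return static_cast<double>(nnz) /
           (static_cast<double>(rows) * static_cast<double>(cols));
}
```

For example, the Fluid matrix (656 x 656 with 18,964 non-zeros) gives a value of roughly 0.044, which matches the 4.4% listed for it.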

How to run the code?

To run the code, install LibTorch for C++, which you can download from the official PyTorch website. We recommend PyTorch >= 2.1.0 and CUDA >= 12.1. You also need gcc >= 10.2.0 and cmake >= 3.21.

Update the run_cmake.sh file so that it contains the path to your LibTorch library. You can then compile and run the code by executing:

./run_cmake.sh
./build/src/dphpc --K 32 --data_folder data/

Small matrices

| Matrix | Rows | Cols | Non-Zero | Density |
|---|---|---|---|---|
| Fluid | 656 | 656 | 18,964 | 4.4% |
| Oil | 66 | 66 | 4,356 | 100% |
| Biochemical | 1,922 | 1,922 | 4,335 | 0.1% |
| Circuit | 1,220 | 1,220 | 5,860 | 0.39% |
| Heat | 1,794 | 1,794 | 7,764 | 0.24% |
| Mass | 420 | 420 | 7,860 | 4.45% |
| Adder | 1,813 | 1,813 | 11,246 | 0.34% |
| Trackball | 25,503 | 25,503 | 15,525 | 0.01% |

Dense matrices

| Matrix | Rows | Cols | Non-Zero | Density |
|---|---|---|---|---|
| Human Gene 2 | 14,340 | 14,340 | 18,068,388 | 8.8% |
| ND12k | 36,000 | 36,000 | 14,220,946 | 1% |
| Mix | 29,957 | 29,957 | 1,990,919 | 0.22% |
| Mechanics | 29,067 | 29,067 | 2,081,063 | 0.24% |
| Power | 8,140 | 8,140 | 2,012,833 | 3.03% |
| Combinatorics | 4,562 | 5,761 | 2,462,970 | 9.37% |
| Stress | 25,710 | 25,710 | 3,749,582 | 0.56% |
| Mouse | 45,101 | 45,101 | 28,967,291 | 1.42% |

Sparse matrices

| Matrix | Rows | Cols | Non-Zero | Density |
|---|---|---|---|---|
| Email enron | 36,692 | 36,692 | 367,662 | 0.027% |
| Boeing | 52,329 | 52,329 | 2,600,295 | 0.09% |
| Boeing Diagonal | 217,918 | 217,918 | 11,524,432 | 0.02% |
| Stiffness | 503,712 | 503,712 | 36,816,170 | 0.014% |
| Semiconductor | 1,090,664 | 1,090,664 | 34,767,207 | 0.0029% |
| VLSI | 1,453,908 | 1,453,908 | 37,475,646 | 0.0017% |
| Stack overflow | 2,601,977 | 2,601,977 | 36,233,450 | 0.00053% |
| Chip | 2,987,012 | 2,987,012 | 26,621,983 | 0.00029% |