Pinned Repositories
BonusProject
code-samples
Source code examples from the Parallel Forall Blog
CPPE-Dataset
Code for our paper CPPE-5 (Medical Personal Protective Equipment), a challenging new object detection dataset
CPU-Free-model
https://dl.acm.org/doi/10.1145/3577193.3593713
cs344
Introduction to Parallel Programming class code
DatabaseProject
datasciencecoursera
datasharing
The Leek group guide to data sharing
gpuocelot
Automatically exported from code.google.com/p/gpuocelot
RT-CUDA-GUI-Development
Recent developments in Graphics Processing Units (GPUs) have opened a new opportunity for harnessing their computing power as a general-purpose computing paradigm through CUDA parallel programming. However, porting applications to CUDA remains a challenge for average programmers. We have developed a restructuring software compiler (RT-CUDA) with the best possible kernel optimizations to bridge the gap between high-level languages and the machine-dependent CUDA environment. RT-CUDA is based upon a set of compiler optimizations. It takes a C-like program and converts it into an optimized CUDA kernel, with user directives in a configuration file guiding the compiler. While invoking external libraries is not possible with the commercial OpenACC compiler, RT-CUDA allows transparent invocation of highly optimized external math libraries such as cuSPARSE and cuBLAS. For this, RT-CUDA uses interfacing APIs, error-handling interpretation, and user-transparent programming, which enables the efficient design of linear algebra solvers (LAS). RT-CUDA has been evaluated on a Tesla K20c GPU with a variety of basic linear algebra operators (M+, MM, MV, VV, etc.) as well as solvers of systems of linear equations such as Jacobi and Conjugate Gradient, obtaining significant speedups over other compilers such as OpenACC and GPGPU compilers. RT-CUDA facilitates the design of efficient parallel software for developing parallel simulators (reservoir simulators, molecular dynamics, etc.), which are critical for the oil and gas industry. We expect RT-CUDA to be useful to many industries dealing with science and engineering simulation on massively parallel computers such as NVIDIA GPUs.
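As a rough illustration of the abstract above: RT-CUDA consumes a C-like program and maps loop nests to CUDA kernels or optimized library calls. The sketch below is a hypothetical example of such an input, a plain SAXPY loop, one of the BLAS-level vector-vector (VV) operations the abstract says can be offloaded to cuBLAS; the function name and signature are illustrative, not taken from the RT-CUDA sources.

```c
#include <stddef.h>

/* Hypothetical C-like input for a restructuring compiler such as
 * RT-CUDA: a plain loop nest the tool would translate into a CUDA
 * kernel or a cuBLAS call (here the equivalent of cublasSaxpy).
 *
 * Computes y = alpha * x + y, element by element. */
void saxpy(size_t n, float alpha, const float *x, float *y) {
    for (size_t i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];
}
```

A tool in this style would typically replace the loop with a grid of GPU threads (one element per thread) or, guided by the configuration-file directives, with a single library call.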
ayazhassan's Repositories
ayazhassan/RT-CUDA-GUI-Development
ayazhassan/BonusProject
ayazhassan/code-samples
Source code examples from the Parallel Forall Blog
ayazhassan/CPPE-Dataset
Code for our paper CPPE-5 (Medical Personal Protective Equipment), a challenging new object detection dataset
ayazhassan/CPU-Free-model
https://dl.acm.org/doi/10.1145/3577193.3593713
ayazhassan/DatabaseProject
ayazhassan/datasciencecoursera
ayazhassan/datasharing
The Leek group guide to data sharing
ayazhassan/human-eval
Code for the paper "Evaluating Large Language Models Trained on Code"
ayazhassan/individual-work
ayazhassan/Integrate_Python_code_with_Simulink
ayazhassan/jetbot
An educational AI robot based on NVIDIA Jetson Nano.
ayazhassan/jetbot_ros
ROS nodes and Gazebo model for NVIDIA JetBot with Jetson Nano
ayazhassan/kdd-2018
ayazhassan/module3
GitHub Campus Advisor Module 3
ayazhassan/morphologica
A library of supporting code for numerical modelling (JSON config, HDF5 data, Modern OpenGL visualization)
ayazhassan/multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
ayazhassan/NeMo
NeMo: a toolkit for conversational AI
ayazhassan/padding_free_matrix_transpose_gpu
Advances in Graphics Processing Unit (GPU) technology and the introduction of the CUDA programming model have facilitated new solutions for sparse and dense linear algebra solvers. Matrix transpose is an important linear algebra procedure with deep impact on various computational science and engineering applications. Several factors hinder the expected performance of large matrix transpose on GPU devices; the degradation involves the memory access pattern, namely coalesced access to global memory and bank conflicts in the shared memory of streaming multiprocessors within the GPU. In this paper, two matrix transpose algorithms are proposed to alleviate these issues by ensuring coalesced access and conflict-free bank access. The proposed algorithms have execution times comparable to the NVIDIA SDK bank-conflict-free matrix transpose implementation. Their main advantage is that they eliminate bank conflicts while allocating shared memory exactly equal to the tile size (T×T) of the problem space, whereas, to the best of our knowledge, published approaches need to allocate an extra space of T×(T+1). We have also applied the proposed transpose algorithm to the recursive Gaussian implementation of the NVIDIA SDK and achieved about a 6% improvement in performance.
ayazhassan/ParallelProgrammingwithOpenMP
Training Material
ayazhassan/ParEval
ayazhassan/programming_examples-helloworld-RPC
ayazhassan/RNA-Prediction-using-Parallel-LR-Parsing-Algorithm
ayazhassan/RStudio
A repository that will be linked with RStudio
ayazhassan/rticonnextdds-connector-py
RTI Connector for Connext DDS is a lightweight technology that enables DDS data to be accessed with Python.
ayazhassan/Strassen-Matrix-Multiplication---Parallel-Implementations
Parallel implementations of Strassen matrix multiplication and its variant, Winograd, using different parallel programming platforms.
ayazhassan/swirl_courses
:mortar_board: A collection of interactive courses for the swirl R package.
ayazhassan/tflearn
Deep learning library featuring a higher-level API for TensorFlow.
ayazhassan/TornadoVM
TornadoVM: A practical and efficient heterogeneous programming framework for managed languages
ayazhassan/WithoutGit