The purpose of this project is to investigate the performance gains from GPU acceleration of Apache Spark. Three applications are evaluated: WordCount, KMeans clustering, and floating-point sorting. In addition, a number of GPU-compatible functions for Spark's Resilient Distributed Datasets (RDDs) have been implemented.
Paper submission details coming soon.