Performance Predictions of Spark Jobs with Machine Learning Tasks Using Various Artificial Intelligence Models

As data continues to grow faster than chips shrink, the need to divide work across multiple CPUs grows with it. And as workloads and users continue to demand responsiveness and low latency, in-memory distributed computing technologies are becoming ever more mainstream. One such technology is Apache Spark, and with its many tunable parameters, configuring it precisely for a given workload requires a specialist. We aim to build an AI tool for those specialists that predicts the CPU and memory requirements of a Spark job from its initial parameters together with a few snapshots of the system taken after execution has begun. In this paper the focus is on machine learning and AI workloads; in effect, we use AI to predict how long an AI will run. Our models reach an accuracy above 95% with a precision of 1.0 second.
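To make the described setup concrete, the following is a minimal sketch of the prediction pipeline the abstract outlines: a regression model trained on a job's initial Spark parameters plus a few early-execution snapshots, predicting total runtime. All feature names, the synthetic data, and the choice of a random-forest regressor are illustrative assumptions, not the paper's actual features, dataset, or model.

```python
# Hypothetical sketch: predict Spark job runtime from initial configuration
# parameters plus snapshots taken shortly after execution begins.
# Features, data, and model choice are illustrative, not from the paper.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_jobs = 500

# Assumed initial Spark parameters: executor count, executor memory (GB),
# and input dataset size (GB).
executors = rng.integers(2, 33, n_jobs)
executor_mem_gb = rng.integers(2, 17, n_jobs)
input_gb = rng.uniform(1.0, 100.0, n_jobs)

# Assumed snapshot metrics sampled after the job has started
# (e.g. CPU utilization and memory in use at t = 30 s).
cpu_util_30s = rng.uniform(0.1, 1.0, n_jobs)
mem_used_30s_gb = rng.uniform(0.5, 16.0, n_jobs)

X = np.column_stack(
    [executors, executor_mem_gb, input_gb, cpu_util_30s, mem_used_30s_gb]
)
# Synthetic target: total runtime in seconds (placeholder relationship
# standing in for real measured job durations).
y = input_gb * 12.0 / executors + mem_used_30s_gb * 2.0 + rng.normal(0, 5, n_jobs)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Evaluate how far the runtime predictions fall from the held-out truth.
mae = np.mean(np.abs(model.predict(X_test) - y_test))
print(f"mean absolute error: {mae:.1f} s")
```

In this sketch the snapshot features play the role the abstract assigns them: they let the model refine its estimate using observed behavior rather than configuration alone.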