The goal is to find out which programming languages and execution engines take the most and the least time to process files.
This 📗 project analyzes 📊 and compares the execution times ⌚ of word count programs run on input text files ranging from extremely small to extremely large, across various programming languages and execution engines. It includes sample word count programs, sample findings, observations and comparisons. We measured the time taken to process each file individually and gathered the results. The findings from the individual analyses were then combined in a Google Colab notebook, where we plotted graphs using Matplotlib and drew conclusions from them.
File Name | Size |
---|---|
apache-hadoop-wiki.txt | 46.5 kB |
big.txt | 6.5 MB |
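
As an illustration of the kind of measurement described here, the following is a minimal Python sketch that counts words in a text file and times the run. It is not one of the project's actual programs: the function names `word_count` and `time_word_count`, the whitespace-based tokenization, and the best-of-N timing strategy are all assumptions made for this example.

```python
import time
from collections import Counter
from pathlib import Path


def word_count(text: str) -> Counter:
    """Count lower-cased, whitespace-separated words (assumed tokenization)."""
    return Counter(text.lower().split())


def time_word_count(path: str, repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs.

    Each run reads the file and counts its words, so file I/O is
    included in the measurement (an assumption of this sketch).
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        text = Path(path).read_text(encoding="utf-8", errors="ignore")
        word_count(text)
        best = min(best, time.perf_counter() - start)
    return best
```

Calling `time_word_count("big.txt")` would return the fastest of five runs for that file, provided it is in the working directory. Taking the minimum over several runs is one common way to reduce noise from other processes; an average would be an equally reasonable choice.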
Word count timings for individual languages. Click the images to go to the respective data analysis results.
Word count timings for individual execution engines. Click the images to go to the respective data analysis results.
The graphs show that Python has the shortest execution time for both small and large files, while Scala has the longest execution time.
Among the execution engines, Spark has the shortest execution time, while Hadoop has the longest.
The Google Colab notebook with the complete analysis and graphs: Notebook