cloud-big data: implementation of PySpark, Hive, and pandas queries for different use cases on a vehicle dataset.
- hive.html : Hive queries were executed in the Google Cloud Platform terminal; this file is the saved terminal page.
- spark_demo.ipynb : queries implemented in PySpark on a cluster of 5 nodes, including the master node.
- Pandas_demo.ipynb : compares the execution time of different queries across Spark, Hive, and pandas.
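
A timing comparison of this kind can be sketched roughly as below. This is a minimal sketch and not the notebook's actual code: `time_query`, `compare`, and `stand_in_query` are hypothetical names, and the stand-in query is a placeholder for a real Spark, Hive, or pandas call.

```python
import time
from statistics import mean


def time_query(run_query, repeats=3):
    """Return the mean wall-clock time (seconds) of calling run_query."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # the callable must fully execute the query
        durations.append(time.perf_counter() - start)
    return mean(durations)


def compare(engines, repeats=3):
    """Time each named callable and return {engine_name: mean_seconds}."""
    return {name: time_query(fn, repeats) for name, fn in engines.items()}


def stand_in_query():
    # Hypothetical placeholder for a real query against one of the engines.
    return sum(v * v for v in range(10_000))


timings = compare({"stand_in": stand_in_query})
for engine, seconds in timings.items():
    print(f"{engine}: {seconds:.6f} s")
```

Spark evaluates transformations lazily, so a Spark callable passed to `time_query` should end in an action such as `.count()` or `.collect()`; otherwise only the query plan construction is timed.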
dataset : We created the vehicle data ourselves, taking into account parameters such as velocity, sensor position, acceleration, timestamps, and variance. The data is mocked so that it reaches a considerably larger size; the final dataset is 24 GB.
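
Mocking data of this shape could look something like the sketch below. The generation script used for this repository is not shown here, so everything in it is an assumption: the column names, value ranges, sensor position labels, and the helper names `mock_rows` and `write_mock_csv` are all hypothetical.

```python
import csv
import io
import random

# Assumed column names, based on the parameters listed above.
FIELDS = ["timestamp", "sensor_position", "velocity", "acceleration", "variance"]


def mock_rows(n_rows, seed=0, start_ts=0.0, step_s=0.1):
    """Yield n_rows mocked sensor readings as dicts (value ranges are made up)."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    velocity = 0.0
    for i in range(n_rows):
        acceleration = rng.uniform(-3.0, 3.0)
        # Integrate acceleration over one time step; velocity never goes negative.
        velocity = max(0.0, velocity + acceleration * step_s)
        yield {
            "timestamp": round(start_ts + i * step_s, 3),
            "sensor_position": rng.choice(["front", "rear", "left", "right"]),
            "velocity": round(velocity, 3),
            "acceleration": round(acceleration, 3),
            "variance": round(rng.uniform(0.0, 1.0), 4),
        }


def write_mock_csv(out, n_rows, seed=0):
    """Write a header plus n_rows mocked readings to the file-like object out."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in mock_rows(n_rows, seed=seed):
        writer.writerow(row)


# Small in-memory demo; a large dataset would be streamed to files instead.
buffer = io.StringIO()
write_mock_csv(buffer, n_rows=5)
lines = buffer.getvalue().splitlines()
print(lines[0])
```

Because `mock_rows` is a generator, rows are produced one at a time, so the same approach can stream an arbitrarily large file without holding it in memory.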