tech_geek

cloud-big data: Implementation of PySpark, Hive, and pandas queries for different use cases on a vehicle dataset.

hive.html : Hive queries were executed in the Google Cloud Platform terminal, and the terminal page was saved.
spark_demo.ipynb : queries implemented in PySpark on a cluster of 5 nodes, including the master node.
Pandas_demo.ipynb : compares the execution time of different queries across Spark, Hive, and pandas.
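
The timing comparison can be sketched with a small engine-agnostic harness. This is a minimal illustration, not the code from the notebooks: `time_query` and `average_velocity` are hypothetical names, and the in-memory list stands in for a real query. To time pandas or PySpark, pass a function that runs the query; for PySpark the function should end in an action such as `.count()` or `.collect()`, since Spark evaluates lazily and otherwise no work is measured.

```python
import time
from statistics import mean


def time_query(run_query, repeats=3):
    """Run a zero-argument callable `repeats` times.

    Returns (best, average) wall-clock duration in seconds.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return min(durations), mean(durations)


# Stand-in data (assumption): a small in-memory table with made-up values.
rows = [{"velocity": float(i % 120), "acceleration": (i % 7) - 3.0}
        for i in range(100_000)]


def average_velocity():
    # Stand-in query: replace the body with a pandas or PySpark call.
    return sum(r["velocity"] for r in rows) / len(rows)


best, avg = time_query(average_velocity)
print(f"best={best:.4f}s avg={avg:.4f}s")
```

Reporting the best of several runs reduces the effect of warm-up costs such as caching, while the average shows typical behaviour.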

dataset : We created the vehicle data with the necessary parameters in mind, such as velocity, sensor position, acceleration, timestamps, and variance. We mocked the data to reach a considerably larger dataset, with a final size of 24 GB.
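
One way such mock data could be generated is sketched below. This is an assumption-laden illustration, not the script used for the 24 GB dataset: the column names, value ranges, sensor labels, and the helpers `mock_rows` and `write_mock_csv` are all hypothetical.

```python
import csv
import io
import random

# Hypothetical column names based on the parameters listed above.
FIELDS = ["timestamp", "sensor_position", "velocity", "acceleration", "variance"]


def mock_rows(n, seed=0, start_ts=0.0, step=0.1):
    """Yield n mock sensor readings; value ranges are made up for illustration."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    velocity = 0.0
    for i in range(n):
        acceleration = rng.uniform(-3.0, 3.0)
        # Integrate acceleration over one time step; clamp velocity at zero.
        velocity = max(0.0, velocity + acceleration * step)
        yield {
            "timestamp": round(start_ts + i * step, 3),
            "sensor_position": rng.choice(["front", "rear", "left", "right"]),
            "velocity": round(velocity, 3),
            "acceleration": round(acceleration, 3),
            "variance": round(rng.uniform(0.0, 1.0), 4),
        }


def write_mock_csv(out, n, seed=0):
    """Write a header plus n mock rows to a text file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(mock_rows(n, seed=seed))


buffer = io.StringIO()
write_mock_csv(buffer, n=5)
print(buffer.getvalue())
```

Because `mock_rows` is a generator, rows are streamed to the writer one at a time, so the same approach can write to a file opened with `open(path, "w", newline="")` for much larger row counts without holding the data in memory.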

pythonmjs/selfdriving_car_pyspark_hive_queries
