/selfdriving_car_pyspark_hive_queries

Cloud big data: implementation of PySpark, Hive, and pandas queries for different use cases on a vehicle dataset.


tech_geek


  1. hive.html : Hive queries were executed in a Google Cloud Platform terminal, and the terminal page was saved as HTML.
  2. spark_demo.ipynb : queries implemented in PySpark on a cluster of 5 nodes, including the master node.
  3. Pandas_demo.ipynb : compares the execution time of different queries across Spark, Hive, and pandas.
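
The execution-time comparison in item 3 could be measured with a small helper along these lines. This is a minimal sketch, not the code from the notebooks: it assumes each engine's query is wrapped in a Python callable, and `time_query`, `mean_velocity`, and the in-memory `rows` are hypothetical stand-ins.

```python
import time


def time_query(run_query, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` runs of run_query()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


# Hypothetical stand-in data: the real queries run against the vehicle dataset.
rows = [{"velocity": v % 40} for v in range(100_000)]


def mean_velocity():
    """Stand-in query: average of the velocity column."""
    return sum(r["velocity"] for r in rows) / len(rows)


# One entry per engine; only a pure-Python stand-in is timed here.
timings = {"pure_python": time_query(mean_velocity)}
print(timings)
```

When timing Spark this way, the callable should end in an action such as `count()` or `collect()`, because Spark transformations are lazy and would otherwise not be executed inside the timed region.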

dataset : We created the vehicle data with the necessary parameters in mind, such as velocity, sensor positions, acceleration, timestamps, and variance. We generated mock data to reach a considerably larger dataset, which ended up at 24 GB.
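
As an illustration of how such mock data might be produced, here is a minimal sketch using only the Python standard library. The column names, value ranges, and the `mock_rows` and `write_csv` helpers are assumptions made for this example; the actual generator and schema used for the 24 GB dataset are not described here.

```python
import csv
import io
import random

# Assumed column names, based on the parameters listed above.
FIELDS = ["timestamp", "sensor_position", "velocity", "acceleration", "variance"]


def mock_rows(n, seed=0, dt=0.1):
    """Yield n mock sensor readings, one every `dt` seconds."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    velocity = 0.0
    for i in range(n):
        acceleration = rng.uniform(-3.0, 3.0)
        # Integrate acceleration into velocity and keep it non-negative.
        velocity = max(0.0, velocity + acceleration * dt)
        yield {
            "timestamp": round(i * dt, 3),
            "sensor_position": rng.choice(["front", "rear", "left", "right"]),
            "velocity": round(velocity, 3),
            "acceleration": round(acceleration, 3),
            "variance": round(rng.uniform(0.0, 1.0), 4),
        }


def write_csv(rows, out):
    """Write the rows to a file-like object as CSV with a header line."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Small in-memory demo; a large dataset would stream to files instead.
buffer = io.StringIO()
write_csv(mock_rows(5), buffer)
print(buffer.getvalue())
```

Because `mock_rows` is a generator, rows are written one at a time, so the same approach can scale to large files without holding the whole dataset in memory.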