
SparkSQL-Using-Pyspark

Spark SQL functions and operations using PySpark and spark-submit

A basic movie CSV dataset of about 5,000 records is read into a Spark SQL DataFrame and manipulated using different DataFrame operations. I have tried to cover the following (a sketch of these operations follows the list):

  1. Adding a computed column to a DataFrame
  2. Grouping operations on a DataFrame
  3. Reading CSV and JSON files into a Spark SQL DataFrame
  4. Using an external package at runtime with spark-submit (see the example at the end)
  5. UDFs in PySpark
  6. Join operations
  7. Other operations such as case-when, unionAll, orderBy, array column manipulation, windowing operations, HiveContext, and so on
  8. Writing a DataFrame to a file
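
A minimal sketch of steps 1-3, plus the array-column manipulation from step 7, assuming the Spark 2.x SparkSession API; the file names (`movies.csv`, `ratings.json`) and column names (`gross`, `budget`, `genres`, `imdb_score`) are hypothetical, and the repository's actual schema may differ:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("MovieAnalysis").getOrCreate()

# Step 3: read a CSV (with header and inferred types) and a JSON file
# into Spark SQL DataFrames. File names here are hypothetical.
movies = spark.read.csv("movies.csv", header=True, inferSchema=True)
ratings = spark.read.json("ratings.json")

# Step 1: add a computed column (column names are assumed)
movies = movies.withColumn("profit", F.col("gross") - F.col("budget"))

# Array column manipulation (step 7): split a pipe-delimited genres
# field into an array column, then explode it to one row per genre
movies = movies.withColumn("genre_list", F.split(F.col("genres"), r"\|"))
by_genre = movies.withColumn("genre", F.explode("genre_list"))

# Step 2: a grouping operation -- average IMDB score per genre
avg_by_genre = by_genre.groupBy("genre").agg(F.avg("imdb_score").alias("avg_score"))
avg_by_genre.show(5)
```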
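A companion sketch for the UDF, join, case-when, windowing, and write steps, continuing from the DataFrames defined above; the `movie_id` join key and `title_year` column are likewise assumptions:

```python
from pyspark.sql import Window
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

# Step 5: a Python UDF that buckets the IMDB score
def rating_band(score):
    return "good" if score is not None and score >= 7.0 else "average"

rating_band_udf = F.udf(rating_band, StringType())
movies = movies.withColumn("band", rating_band_udf("imdb_score"))

# Step 6: a join (the movie_id key is an assumption)
joined = movies.join(ratings, on="movie_id", how="inner")

# Step 7: case-when via when/otherwise, plus a windowing operation
movies = movies.withColumn(
    "era", F.when(F.col("title_year") < 2000, "classic").otherwise("modern")
)
w = Window.partitionBy("genre").orderBy(F.desc("imdb_score"))
top3 = (by_genre.withColumn("rank", F.row_number().over(w))
                .filter(F.col("rank") <= 3))

# Step 8: order the report and write it out as CSV
(avg_by_genre.orderBy(F.desc("avg_score"))
             .write.mode("overwrite")
             .csv("output/avg_by_genre", header=True))
```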

Different types of reports are produced using DataFrame operations, and the results are exported as output files. Sample output is also shown alongside the code, just as it would appear in the Spark shell.
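
For step 4, an external dependency can be pulled in at runtime with spark-submit's `--packages` flag. The exact package used by this repository is an assumption; on Spark 1.x, for example, the spark-csv package was commonly needed to read CSVs, and the script name below is hypothetical:

```
spark-submit --packages com.databricks:spark-csv_2.11:1.5.0 movie_analysis.py
```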