pyspark-dataframes

There are 3 repositories under the pyspark-dataframes topic.

  • sbl-sdsc/df-parallel

    Comparison of Dataframe libraries for parallel processing of large tabular files on CPU and GPU.

    Language: Jupyter Notebook
  • mhaseebtariq/pyspark-helpers

    Useful helper functions for PySpark dataframe operations

    Language: Jupyter Notebook
  • RJBarker/home_sales

    Use PySpark and SparkSQL to run SQL queries against a temporary view created from a DataFrame. Run additional queries on cached and partitioned data to compare runtimes.

    Language: Jupyter Notebook