/spark

Apache Spark Applications

Primary Language: Python

Data Engineering With Apache Spark

This repository contains Python code for data science and analytics at scale. It is a work in progress and will be updated as it develops.

The spark repository currently contains:
-- JDBC connectors for creating DataFrames from databases such as PostgreSQL and MySQL
-- JDBC connectors for creating DataFrames from Amazon Redshift
-- Benchmarking applications for testing ad hoc analysis with TPC-H and TPC-DS data
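As a rough illustration of what a JDBC connector of this kind might look like, here is a minimal sketch in PySpark. It is not the repository's actual code: the helper names (`jdbc_url`, `jdbc_options`, `read_table`), the host, the database, the table, and the credentials are all hypothetical, and it assumes the matching JDBC driver JAR is already on the Spark classpath.

```python
from typing import Dict


def jdbc_url(dialect: str, host: str, port: int, database: str) -> str:
    """Build a JDBC URL, e.g. jdbc:postgresql://localhost:5432/analytics."""
    return f"jdbc:{dialect}://{host}:{port}/{database}"


def jdbc_options(url: str, table: str, user: str, password: str, driver: str) -> Dict[str, str]:
    """Collect the options that Spark's built-in JDBC data source reads."""
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": driver,
    }


def read_table(spark, options: Dict[str, str]):
    """Load one database table as a DataFrame.

    `spark` is an existing pyspark.sql.SparkSession; it is passed in so that
    the helpers above can be used without importing PySpark.
    """
    return spark.read.format("jdbc").options(**options).load()


# Hypothetical connection details, for illustration only.
url = jdbc_url("postgresql", "localhost", 5432, "analytics")
options = jdbc_options(
    url,
    table="public.orders",
    user="spark_user",
    password="change-me",
    driver="org.postgresql.Driver",
)
```

With a running session, `read_table(spark, options)` would return the table as a DataFrame. For MySQL, the same sketch should work with the `mysql` dialect and the `com.mysql.cj.jdbc.Driver` driver class; in practice, credentials would come from configuration or a secrets store rather than being written inline.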