Data Engineering With Apache Spark
This repository contains Python code for data science and analytics at scale. It is a work in progress and will be updated over time.
The Spark repository currently contains:
-- JDBC connectors for creating DataFrames from databases such as PostgreSQL and MySQL
-- JDBC connectors for creating DataFrames from Amazon Redshift
-- Benchmarking applications for testing ad hoc analysis with TPC-H and TPC-DS data
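
As a rough illustration of what a JDBC connector for creating DataFrames can look like, here is a minimal sketch. It is not the code in this repository: the names `JDBC_DRIVERS`, `jdbc_options`, and `read_table` are hypothetical, and the sketch assumes the matching JDBC driver JAR is already on Spark's classpath. Only PostgreSQL and MySQL are covered here.

```python
from typing import Dict

# Driver class names for the two databases in this sketch.
# Assumption: the matching driver JARs are available to Spark.
JDBC_DRIVERS = {
    "postgresql": "org.postgresql.Driver",
    "mysql": "com.mysql.cj.jdbc.Driver",
}


def jdbc_options(db_type: str, host: str, port: int, database: str,
                 table: str, user: str, password: str) -> Dict[str, str]:
    """Build the option dict passed to Spark's JDBC data source (hypothetical helper)."""
    if db_type not in JDBC_DRIVERS:
        raise ValueError(f"unsupported database type: {db_type}")
    return {
        "url": f"jdbc:{db_type}://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": JDBC_DRIVERS[db_type],
    }


def read_table(spark, **connection):
    """Load one table as a DataFrame using an existing SparkSession (hypothetical helper)."""
    return spark.read.format("jdbc").options(**jdbc_options(**connection)).load()
```

With a running `SparkSession` named `spark`, a call might look like `read_table(spark, db_type="postgresql", host="localhost", port=5432, database="sales", table="public.orders", user="analyst", password="...")`, where the host, database, table, and credentials are placeholders. Keeping the option building separate from the read makes the connection settings easy to check without a database.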