/Big-Data-with-Pyspark-in-Python

Spark and Python for Big Data with PySpark. Distributed Machine Learning

Primary Language: Jupyter Notebook

Spark and Python for Big Data with PySpark

PySpark was set up for this course using any one of the following methods:

  1. Ubuntu + Spark + Python on Virtual Box
  2. Amazon EC2 with Python and Spark
  3. Databricks Notebook System
  4. AWS EMR Notebook (Not Free)

Machine learning techniques implemented using PySpark:

  1. Linear Regression
  2. Logistic Regression
  3. Tree Methods
     i. Decision Trees
     ii. Random Forests
     iii. Gradient Boosted Trees
  4. K-means Clustering
  5. Recommender Systems
  6. Natural Language Processing
  7. Spark Streaming via Twitter