/python_spark_bigd_ml

Spark examples, personal projects with Python, Spark Streaming, Machine Learning, Spark DataFrames

Primary Language: Jupyter Notebook

This repo is based on examples and projects worked through in the Udemy course "Spark and Python for Big Data with PySpark".

Objectives:

  • Use Python and Spark together to analyze Big Data
  • Learn to set up Spark locally (Linux), on Amazon Web Services EC2, and on Databricks
  • Use Spark Streaming to analyze tweets in real time
  • Learn to apply Linear Regression, Logistic Regression, Decision Trees, Random Forests, K-Means Clustering, Collaborative Filtering, NLP
  • Work on Consulting Projects that mimic real world situations, such as:
    • Classify Customer Churn with Logistic Regression
    • Use Spark with Random Forests for Classification
    • Learn how to use Spark's Gradient Boosted Trees
    • Use Spark's MLlib to create Powerful Machine Learning Models
    • Create a Spam Filter using Spark and Natural Language Processing