/BigData

Primary language: Jupyter Notebook

Learn PySpark and ML [Big Data and Data Analysis]

Distributed computing and Machine Learning

| Folder | Description |
| --- | --- |
| spark_understanding | Spark API usage for distributed computing [Incremental Contribution] |
| spark_project_regression | Regression using MLlib of PySpark [Incremental Addition of ML] |

Follow this page for upcoming tutorials on data analysis. The next tutorials planned are:

  1. Decision Tree Regressor
  2. Classification https://spark.apache.org/docs/latest/ml-classification-regression.html#classification
  3. Clustering [K-means] https://spark.apache.org/docs/latest/ml-clustering.html
  4. Random Forest https://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-classifier
  5. Principal component analysis (PCA)
  6. Singular value decomposition (SVD)
  7. Frequent Pattern Mining