anilknayak/BigData

Learn PySpark and ML [Big Data and Data Analysis]

Distributed computing and Machine Learning

Folder	Description
spark_understanding	Spark API usage for distributed computing [Incremental Contribution]
spark_project_regression	Regression using PySpark MLlib [Incremental Addition of ML]

Follow this page for upcoming tutorials on data analysis. The next tutorials will cover:

Decision Tree Regressor
Classification https://spark.apache.org/docs/latest/ml-classification-regression.html#classification
Clustering [K-means] https://spark.apache.org/docs/latest/ml-clustering.html
Random Forest https://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-classifier
Principal component analysis (PCA)
Singular value decomposition (SVD)
Frequent Pattern Mining