/spark-mllib-medium

This repo shows how to review and derive information from datasets using Python. First, get an overview of data science and how open source tools like Python can be used for your data analysis needs. Then, discover how to set up labs and data interpreters. Next, learn how you can use pandas, NumPy, and SciPy for numerical processing, scientific programming, and extensive data exploration. With these options at your disposal, you'll be ready for the code that follows, which focuses on making predictions using machine learning tools, data classifiers, and clusters. The repo concludes with a look at big data and how PySpark can be used for computing.

Primary Language: Jupyter Notebook
