/Big-Data-with-PySpark

Advance your data skills by mastering Apache Spark. Using the Spark Python API, PySpark, you will leverage parallel computation with large datasets, and get ready for high-performance machine learning. From cleaning data to creating features and implementing machine learning models, you'll execute end-to-end workflows with Spark. The track ends with building a recommendation engine using the popular MovieLens dataset and the Million Songs dataset.

Primary language: Jupyter Notebook

Big Data with PySpark

This repository starts a series based on the DataCamp 'Big Data with PySpark' track. I will reproduce the course content in Python and draw conclusions from the data.

Contributions welcome.

Background in: Mathematics, Python, Machine Learning and Applied Math.

Links:

Advance your data skills by mastering Apache Spark. Using PySpark, the Spark API for Python, you will take advantage of parallel computation on large datasets and prepare for high-performance machine learning. From cleaning data to creating features and implementing machine learning models, you will run end-to-end workflows with Spark. The track ends with building a recommendation engine using the popular MovieLens dataset and the Million Songs dataset.

Projects: