/pyspark_bigdata

Getting started with PySpark for Big Data analysis

Primary language: Jupyter Notebook. License: Apache License 2.0 (Apache-2.0).

Big Data Analytics with PySpark

Spark is a “lightning-fast cluster computing” framework for Big Data. It provides a general-purpose data processing engine and lets you run programs up to 100x faster in memory, or 10x faster on disk, than Hadoop. This course is for data science enthusiasts who will use PySpark, the Python API for Spark programming, and its powerful higher-level libraries such as Spark SQL and MLlib (for machine learning). By the end of this course, you will have gained an in-depth understanding of PySpark and its application to general Big Data analysis.

What you will learn

This course covers the following topics:

  • PySpark Installation
  • Introduction to Big Data analysis with Spark
  • Programming with PySpark RDDs
  • PySpark SQL & DataFrames
  • Machine Learning with PySpark MLlib

Omdena Course Link: https://omdena.com/course/big-data-analytics-with-pyspark/

Follow aiwithqasim and star ⭐️ the repository to get notebook updates. Thank you 😊