PySpark Installation and Integration with Python in Jupyter Notebook

Prerequisites

Java

  • Download Java 8 (JDK) from this link: https://www.oracle.com/java/technologies/javase/javase8u211-later-archive-downloads.html

  • To check the Java version, run this command in cmd or Anaconda Prompt: java -version
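
The same check can be run from a Jupyter Notebook cell. The snippet below is a minimal sketch that uses only the Python standard library; the helper name java_version_output is hypothetical and not part of any library.

```python
import shutil
import subprocess


def java_version_output():
    """Return the text printed by `java -version`, or None if java is not on PATH."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its report to stderr, so capture both streams.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return (result.stderr or result.stdout).strip()


print(java_version_output() or "java was not found on PATH")
```

If the message says that java was not found, the Java bin folder is probably missing from PATH, or Java is not installed yet.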

Spark Download

  • Download a prebuilt Spark package from https://spark.apache.org/downloads.html (older releases are available through the archive linked on that page) and extract it, for example to C:\spark\spark-2.2.1-bin-hadoop2.7, the folder used for SPARK_HOME in this guide.

Hadoop winutils

  • Download Hadoop winutils from this link: https://github.com/cdarlint/winutils/tree/master/hadoop-3.2.1/bin

Environment Variables Update

  • Variable name = SPARK_HOME | Variable value = C:\spark\spark-2.2.1-bin-hadoop2.7

  • Variable name = HADOOP_HOME | Variable value = C:\Hadoop

  • Variable name = JAVA_HOME | Variable value = C:\Program Files\Java\jdk1.8.0_311
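
To confirm that the three variables are visible to Python, a quick check along these lines may help. It is a sketch, not an official tool: the function name check_spark_env and the wording of the status messages are made up for illustration.

```python
import os


def check_spark_env(environ=None):
    """Report whether SPARK_HOME, HADOOP_HOME and JAVA_HOME point to existing folders."""
    environ = os.environ if environ is None else environ
    report = {}
    for name in ("SPARK_HOME", "HADOOP_HOME", "JAVA_HOME"):
        value = environ.get(name)
        if not value:
            report[name] = "not set"
        elif not os.path.isdir(value):
            report[name] = "set, but folder not found: " + value
        else:
            report[name] = "ok: " + value
    return report


# Print one status line per variable for the current process.
for name, status in check_spark_env().items():
    print(name, "->", status)
```

If a variable was added after Jupyter or Anaconda Prompt was opened, restart it so that the new value is picked up.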

Pyspark Installation in Anaconda

  • pip install pyspark (run this command in Anaconda Prompt; in a Jupyter Notebook cell, prefix it with "!": ! pip install pyspark)

findspark Installation

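  • pip install findspark (run this command in Anaconda Prompt; in a Jupyter Notebook cell, prefix it with "!": ! pip install findspark)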

Test
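
One possible smoke test, assuming pyspark and findspark are installed and SPARK_HOME is set as described in this guide. The function name run_smoke_test and the application name are hypothetical; findspark.init() and SparkSession.builder are the real entry points of those libraries.

```python
def run_smoke_test(n=5):
    """Start a local Spark session, count n rows, and stop the session."""
    # Imports live inside the function so that defining it does not
    # require Spark; they run only when the test is actually called.
    import findspark

    findspark.init()  # with no argument, findspark looks up SPARK_HOME

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")               # run Spark locally on all cores
        .appName("pyspark-install-test")  # hypothetical application name
        .getOrCreate()
    )
    try:
        print("Spark version:", spark.version)
        return spark.range(n).count()  # number of rows in the range 0..n-1
    finally:
        spark.stop()  # always release the local Spark session
```

Calling run_smoke_test() in a notebook cell should print the Spark version and return 5 if the installation works. An error at this step usually points back to Java or to one of the environment variables.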