PySpark Learning Journey
Online Tutorials
1. https://spark.apache.org/docs/latest/rdd-programming-guide.html
2. http://www.sparktutorials.net/Getting+Started+with+Apache+Spark+RDDs
3. https://www.codementor.io/jadianes/spark-python-rdd-basics-du107x2ra
4. http://files.cnblogs.com/files/sirkevin/Spark_for_Python_Developers.pdf
5. https://www.tutorialspoint.com/pyspark
6. https://www.dezyre.com/apache-spark-tutorial/pyspark-tutorial
7. http://www.kirupagaran.com/images/free_downloads/Apache_Spark_Programming_Cheat_Sheet.pdf
8. https://medium.com/makemytrip-engineering
Cloudera not opening, solved by enabling virtualization:
https://amiduos.com/support/knowledge-base/article/enabling-virtualization-in-lenovo-systems
PySpark Configuration
- Open PyCharm
- File
- Settings
- Project
- Project Structure
- Add Content Root
- Add the Python libraries py*.zip and pyspark.zip (from the python/lib directory of the Spark installation)
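The same libraries can also be put on the import path from a script instead of through the PyCharm dialogs. The following is a minimal sketch, not the way PyCharm itself does it: the function names spark_library_zips and add_spark_to_sys_path are made up for illustration, and it assumes the archives live under python/lib inside the Spark directory.

```python
import glob
import os
import sys


def spark_library_zips(spark_home):
    """List the py*.zip archives (this pattern also matches pyspark.zip).

    Assumption: the archives sit in <spark_home>/python/lib.
    """
    pattern = os.path.join(spark_home, "python", "lib", "py*.zip")
    return sorted(glob.glob(pattern))


def add_spark_to_sys_path(spark_home):
    """Put the archives on sys.path, roughly what adding them as content roots achieves.

    Returns the paths that were newly added.
    """
    added = []
    for path in spark_library_zips(spark_home):
        if path not in sys.path:
            sys.path.append(path)
            added.append(path)
    return added


# Hypothetical usage with the Spark directory from these notes:
# add_spark_to_sys_path("/sparkscala/spark-2.4.0-bin-hadoop2.6")
# import pyspark
```

Calling add_spark_to_sys_path a second time returns an empty list, since the paths are already present.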
csk@csk-ai-revolution:/sparkscala/spark-2.4.0-bin-hadoop2.6/bin$ export PYSPARK_PYTHON=/home/csk/anaconda/envs/face/bin/python
csk@csk-ai-revolution:/sparkscala/spark-2.4.0-bin-hadoop2.6/bin$ ./pyspark
This opens the PySpark shell in the terminal.
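The two commands above can also be expressed from Python, which may be handy when the launch has to be scripted. This is only a sketch under assumptions: pyspark_launch_spec and launch_pyspark are hypothetical helper names, and the paths in the usage comment are the ones from the prompt above.

```python
import os
import subprocess


def pyspark_launch_spec(spark_home, python_exe, base_env=None):
    """Build the command and environment for starting the pyspark shell.

    Mirrors `export PYSPARK_PYTHON=...` followed by `./pyspark` in bin/.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_PYTHON"] = python_exe  # interpreter PySpark should use
    cmd = [os.path.join(spark_home, "bin", "pyspark")]
    return cmd, env


def launch_pyspark(spark_home, python_exe):
    """Start the shell and block until it exits; returns its exit code."""
    cmd, env = pyspark_launch_spec(spark_home, python_exe)
    return subprocess.call(cmd, env=env)


# Hypothetical usage with the paths from these notes:
# launch_pyspark("/sparkscala/spark-2.4.0-bin-hadoop2.6",
#                "/home/csk/anaconda/envs/face/bin/python")
```

Keeping the command and environment in a separate function makes it possible to inspect them without actually starting Spark.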