Problems Met in AWS
How to use Spark on EMR
After starting an EMR cluster and SSHing into it, I could not use Spark with Python.
https://www.udemy.com/spark-and-python-for-big-data-with-pyspark/learn/v4/t/lecture/6804314?start=0
The lecture above is useful for anyone new to Spark on AWS.
Method:
- After SSHing into the cluster, run "sudo pip install xxxx" to install each module you need.
- Open "aws-xx-" (the SSH URL) + ":8890" in a browser to reach Zeppelin on the cluster.
- Run your code in Zeppelin, using Spark from Python or Scala, to debug it.
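
The steps above can be sketched as two small Python helpers. This is only an illustration: the function names, the "http://" scheme, and the example hostname are assumptions, not part of EMR or Zeppelin. Only the port 8890 comes from the notes above. It also assumes the port is reachable from your machine.

```python
import importlib.util

# Port taken from the notes above; everything else here is illustrative.
ZEPPELIN_PORT = 8890


def zeppelin_url(master_dns, port=ZEPPELIN_PORT):
    """Build the Zeppelin address from the host name used for SSH.

    The "http://" scheme is an assumption; adjust it if your cluster differs.
    """
    return f"http://{master_dns}:{port}"


def missing_modules(names):
    """Return the top-level module names that cannot be imported here.

    Run this on the cluster to see what still needs "sudo pip install".
    Only top-level names (no dots) are supported in this sketch.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical host name; replace it with your cluster's SSH URL.
    print(zeppelin_url("aws-xx-.example.com"))
    # "json" ships with Python; add the modules your job needs.
    print(missing_modules(["json"]))
```

Running missing_modules on the cluster before opening Zeppelin shows which packages the first step still has to install.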