[Reminder] 🔔
- Help the creator's channel reach 20k subscribers so he can keep uploading quality content for you: Amin M. Boulouma Channel
- This tutorial is best followed along with the video playlist: Data Engineering with Python
- Hosted by Amin M. Boulouma; contact and questions: amine.boulouma.com
- Spark with Python tutorial made simple: https://youtu.be/vQqisFjAnsE
- Source code with documentation: https://amboulouma.com/spark-workshop
- Github: https://github.com/amboulouma/spark-workshop
pip install pyspark
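As a quick, optional sanity check (assuming the pip install above succeeded and a compatible Java runtime is available, since Spark runs on the JVM), the installed version can be printed:

import pyspark
print(pyspark.__version__)  # confirms the package is importable and shows the installed version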
import random
from pyspark import SparkContext

# Entry point for RDD operations.
sc = SparkContext()

def inside(p):
    # Draw a random point in the unit square; True if it lies inside the quarter circle.
    x, y = random.random(), random.random()
    return x * x + y * y < 1

num_samples = 1000000000  # 10^9 samples; reduce this for a quicker local run

# Distribute the samples and count the points that fall inside the quarter circle.
count = sc.parallelize(range(0, num_samples)).filter(inside).count()

# The fraction of points inside approximates pi/4, so scale by 4.
pi = 4 * count / num_samples
print(pi)

sc.stop()
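On recent Spark releases the usual entry point is a SparkSession rather than a bare SparkContext. A minimal sketch of the same estimate through that API; the appName "estimate-pi" and the reduced sample count are arbitrary choices here to keep a local run short:

import random
from pyspark.sql import SparkSession

def inside(_):
    # Same test as above: random point in the unit square, inside the quarter circle?
    x, y = random.random(), random.random()
    return x * x + y * y < 1

# Build (or reuse) a local SparkSession; the SparkContext is available on it.
spark = SparkSession.builder.master("local[*]").appName("estimate-pi").getOrCreate()
num_samples = 1000000  # smaller sample count for a quick local run
count = spark.sparkContext.parallelize(range(num_samples)).filter(inside).count()
print(4 * count / num_samples)
spark.stop()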
Ref: https://www.sicara.ai/blog/2017-05-02-get-started-pyspark-jupyter-notebook-3-minutes