- Run the Pig Latin version in local mode:
pig -x local pagerank.pig
- Run the embedded Jython version in local mode:
pig -x local -embedded jython pagerank.py
- Run the Scala version by starting the Spark shell and loading the script at its prompt:
spark-shell
:load pagerank.scala
- Install the HTML parser Beautiful Soup locally by running the following inside the beautifulSoup folder:
python3 install.py --local
- Generate etl-file.txt by running:
python3 commoncrawl-to-pigformat.py
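
For orientation, here is a minimal plain-Python sketch of the kind of iterative PageRank computation that scripts named pagerank typically perform. It is not the code in pagerank.pig, pagerank.py, or pagerank.scala. The function name, the damping factor of 0.85, the iteration count, and the dict-of-lists graph layout are all assumptions made for illustration, and the sample graph is hypothetical rather than taken from etl-file.txt.

```python
# Illustrative sketch only: not the code in pagerank.pig, pagerank.py,
# or pagerank.scala. The damping factor, iteration count, and graph
# layout below are assumptions.

def pagerank(links, damping=0.85, iterations=20):
    """Iterative PageRank over a dict mapping each page to its outgoing links."""
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the teleportation share.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                share = damping * ranks[page] / n
                for other in pages:
                    new_ranks[other] += share
        ranks = new_ranks
    return ranks


if __name__ == "__main__":
    # Hypothetical three-page graph, not taken from etl-file.txt.
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, rank in sorted(pagerank(graph).items()):
        print(page, round(rank, 4))
```

Because dangling pages redistribute their rank to every page, the ranks keep summing to 1 after each iteration, which makes a quick sanity check on any output.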