commoncrawl/cc-mrjob
Demonstration of using Python to process the Common Crawl dataset with the mrjob framework
Python · MIT
Issues
- 2
AWS EMR issues
#30 opened by DallanQ - 0
Can not run examples locally
#29 opened by brand17 - 1
subprocess failed with code 143
#27 opened by CryptoKR - 4
Error Launching job : Output directory s3://mapreducecommoncrawl/output1 already exists. Streaming Command Failed! Command exiting with ret '5'
#28 opened by PhuongDelrosario - 1
- 3
Cannot run mrjob on EMR
#17 opened by rubenmarias - 3
bootstrapping issues
#25 opened by andresriancho - 1
Python 3 compatibility
#11 opened by sebastian-nagel - 3
Unable to run examples on aws emr cluster
#23 opened by bruceadowns - 17
- 0
Update EMR conf
#10 opened by sebastian-nagel - 1
Upgrade to use boto3
#18 opened by sebastian-nagel - 0
request-canceled-and-instance-running
#14 opened by 2803media - 4
Can't fetch history log; missing job ID
#13 opened by 2803media - 0
ImportError: No module named mrcc
#12 opened by 2803media - 2
Job fails when running local job
#9 opened by mitcheccles - 3