Issues
Drop support for Python2
#7 opened by sebastian-nagel - 0
Drop support for Python 2.7
#40 opened by sebastian-nagel - 0
Use simdjson to read WAT payloads
#41 opened by sebastian-nagel - 5
boto3 credentials error when running CCSparkJob with ~100 S3 warc paths as input, but works with <10 S3 warc paths as input
#32 opened by praveenr019 - 1
Looks like ccspark tried to access everything from local file. What's wrong with the settings?
#39 opened by GenuineReader - 1
Incompatible Architecture
#34 opened by swetepete - 0
Bad Substitution
#33 opened by swetepete - 5
Use SparkSession instead of SQLContext
#24 opened by sebastian-nagel - 2
Broken links in README
#23 opened by gamtiq - 2
Test and update examples to work with ARC files of the 2008 - 2012 crawls
#20 opened by sebastian-nagel - 2
Processing English only archives
#17 opened by jaehunro - 0
Commands to execute python files?
#12 opened by calee88 - 0