Issues
Malformed HTTP headers lead to "ValueError: need more than 1 value to unpack" crash
#19 opened by JustAnotherArchivist - 0
http.client.BadStatusLine: http/1.1 200 OK
#24 opened by chris-aeviator - 1
pass on warc.gz error
#20 opened by marked - 2
Add easy way to iterate over warc records
#14 opened by sirex - 0
wpull WARCs cause "Content block length changed from X to Y" warnings on warcinfo record
#18 opened by JustAnotherArchivist - 1
Handling for "files" that are purely in memory?
#16 opened by spott - 0
Support payload digest of revisit records
#15 opened by Arkiver2 - 0
URL agnostic deduplication of WARC
#13 opened by Arkiver2 - 3
Reading in an in-memory gzip.GzipFile object breaks warcat.model.binary.BinaryFileRef objects
#10 opened by d-m - 0
A name to a file object is not handled correctly
#11 opened by chfoo - 1
Feature: extract only files matching a regexp
#8 opened by gwern - 2
Support older Python 2.7
#2 opened by chfoo - 1
Handle long filenames
#5 opened by chfoo - 1
http.client.IncompleteRead crash during extract
#6 opened by chfoo - 1