Issues
WAT: Duplicated payload metadata values for "Actual-Content-Length" and "Trailing-Slop-Length"
#43 opened by sebastian-nagel - 0
Clear up all dependabot alerts
#39 opened by wumpus - 1
WAT extractor: Document title bug
#36 opened by robertwaksmunski - 6
WAT extractor: Overlong truncated HTTP request header line throws exception and loss of request record
#32 opened by sebastian-nagel - 0
Non-ASCII/UTF-8 characters lost in WARC-Target-URI during WAT/WET extraction
#27 opened by sebastian-nagel - 1
WET files may include binary content if HTTP Content-Type header erroneously indicates HTML
#26 opened by sebastian-nagel - 4
Failed tests: testInterruptibility (org.archive.util.InterruptibleCharSequenceTest): exception not throw
#25 opened by cronopioelectronico - 0
WAT: unescape XML/HTML character entities
#14 opened by sebastian-nagel - 6
[WAT] Add rel attribute to A@/href links
#10 opened by sebastian-nagel - 1
[WET] Missing spaces in parsed content
#13 opened by pipldev - 1
[WAT extraction] Empty HTTP header fields are filled with value from preceding field
#11 opened by sebastian-nagel - 4
Complete HTML link extraction to cover all element attributes of type URI
#9 opened by sebastian-nagel - 1