Leverage gzip extra field "sl" to skip over compressed WARC records
sebastian-nagel opened this issue · 0 comments
sebastian-nagel commented
WARC writers may provide a gzip extra field "sl" (recommended by WARC 0.9 but dropped in newer versions) that encodes the length of the compressed WARC record. A reader can use it to skip quickly over the current record in tasks (e.g. CDX indexing) that do not need to read the payload. See also #14/#15.
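A rough sketch of how a reader might pick up the field, in Python for illustration only. The header walk follows the gzip format (RFC 1952: FEXTRA flag, XLEN, then subfields of two ID bytes, a two-byte little-endian length, and data). The function names are hypothetical, and the layout of the "sl" data is an assumption: the sketch treats the first four bytes as a little-endian unsigned integer holding the compressed length, which should be checked against the WARC 0.9 text before relying on it.

```python
import struct
from typing import Dict, Optional

FEXTRA = 0x04  # gzip FLG bit signalling an extra field (RFC 1952)


def read_gzip_extra_subfields(header: bytes) -> Dict[bytes, bytes]:
    """Return the extra subfields of a gzip member header as {id: data}."""
    if len(header) < 10 or header[0:2] != b"\x1f\x8b":
        raise ValueError("not a gzip member header")
    if not header[3] & FEXTRA:
        return {}  # writer did not emit any extra field
    if len(header) < 12:
        raise ValueError("truncated gzip header")
    (xlen,) = struct.unpack_from("<H", header, 10)
    extra = header[12:12 + xlen]
    if len(extra) < xlen:
        raise ValueError("truncated gzip extra field")
    subfields = {}
    pos = 0
    while pos + 4 <= len(extra):
        subfield_id = extra[pos:pos + 2]
        (sub_len,) = struct.unpack_from("<H", extra, pos + 2)
        subfields[subfield_id] = extra[pos + 4:pos + 4 + sub_len]
        pos += 4 + sub_len
    return subfields


def compressed_record_length(header: bytes) -> Optional[int]:
    """Return the length stored in "sl", or None if the field is absent.

    Assumption: the first 4 bytes of the "sl" data are a little-endian
    unsigned 32-bit compressed length.
    """
    data = read_gzip_extra_subfields(header).get(b"sl")
    if data is None or len(data) < 4:
        return None
    return struct.unpack_from("<I", data, 0)[0]
```

If the field is present, a reader could seek forward by the returned length instead of inflating the record; whether that length is counted from the start of the gzip member or from the end of its header is another detail to confirm. When the function returns None the reader would fall back to decompressing the record as usual.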