crissyfield/repo-lookout

GitHub Archive dump updates error out with JSON parsing error

Closed this issue · 1 comments

tja commented

The latest GitHub Archive dump updates fail with the following error:

invalid character '\x00' looking for beginning of value

Investigation shows that the dump contains a block of null bytes (0x00). One theory is that the dump was "cleaned up" in a post-processing step that "wiped out" an existing record by overwriting it with null bytes.

tja commented

This has been fixed by introducing a wrapping io.Reader that converts all null bytes (0x00) to line feeds (0x0a). The JSON parser treats line feeds between values as insignificant whitespace, so the "wiped out" record is skipped.
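
For readers hitting the same error, here is a minimal sketch of what such a wrapping reader could look like. It is not the code from the actual fix: the names nullToNewlineReader, decodeAll, and decodeString are hypothetical, and the sketch assumes the dump is a stream of newline-separated JSON values read with encoding/json's Decoder.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// nullToNewlineReader (hypothetical name) wraps another io.Reader and
// rewrites every null byte (0x00) to a line feed (0x0a) as data passes through.
type nullToNewlineReader struct {
	r io.Reader
}

// Read fills p from the wrapped reader, then patches null bytes in place.
// Bytes are patched even when an error is returned alongside them.
func (n nullToNewlineReader) Read(p []byte) (int, error) {
	count, err := n.r.Read(p)
	for i := 0; i < count; i++ {
		if p[i] == 0x00 {
			p[i] = '\n'
		}
	}
	return count, err
}

// decodeAll (hypothetical helper) decodes consecutive JSON values from r.
// Line feeds between values are skipped by the decoder as whitespace.
func decodeAll(r io.Reader) ([]json.RawMessage, error) {
	dec := json.NewDecoder(nullToNewlineReader{r: r})
	var records []json.RawMessage
	for {
		// Declared inside the loop so each record gets its own buffer.
		var rec json.RawMessage
		err := dec.Decode(&rec)
		if err == io.EOF {
			return records, nil
		}
		if err != nil {
			return records, err
		}
		records = append(records, rec)
	}
}

// decodeString (hypothetical helper) is a convenience wrapper for in-memory input.
func decodeString(s string) ([]json.RawMessage, error) {
	return decodeAll(strings.NewReader(s))
}

func main() {
	// Made-up input: two records with a block of null bytes between them.
	input := "{\"id\":1}\n\x00\x00\x00\x00\n{\"id\":2}\n"
	records, err := decodeString(input)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("decoded", len(records), "records")
}
```

One caveat with this approach: it rewrites every null byte in the stream, which is safe only as long as valid JSON never contains a raw 0x00. That holds for well-formed JSON, where a null character inside a string must be escaped as \u0000.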