101arrowz/fflate

The encoded data was not valid for encoding utf-8

denniske opened this issue · 1 comment

How to reproduce

https://aoe4world.com/dumps
File: Games - RM 1v1 - Season 3 - 44 MB

Copy the URL of that file and assign it to the url variable.

import { decompressSync, strFromU8 } from 'fflate';

// url is the download URL of the dump file listed above
const dump = await (await fetch(url)).arrayBuffer();
const compressed = new Uint8Array(dump);
const decompressed = decompressSync(compressed);
const origText = strFromU8(decompressed);

The problem

The following error is thrown by the last line, strFromU8(decompressed):

TypeError: The encoded data was not valid for encoding utf-8
    at TextDecoder.decode (node:internal/encoding:448:14)
    at strFromU8 (/Users/dennis/Projects/poc_collector/node_modules/fflate/lib/node.cjs:1780:19)
    at HistoricalTask.<anonymous> (/Users/dennis/Projects/poc_collector/dist/collector/webpack:/src/task/historical.task.ts:79:39)
    at Generator.next (<anonymous>)
    at fulfilled (/Users/dennis/Projects/poc_collector/node_modules/tslib/tslib.js:166:62)
    at processTicksAndRejections (node:internal/process/task_queues:95:5) {
  code: 'ERR_ENCODING_INVALID_ENCODED_DATA'
}

I can download the file on macOS, extract it by double-clicking it in Finder, and open the result in Visual Studio Code without problems. Visual Studio Code shows UTF-8 in the status bar.

That file is 533MB decompressed; strings in JavaScript can be at most 512MB. You can try to solve this by converting to strings with streams and using a streaming JSON parser; let me know if you want more info on how to do that.
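
As a rough illustration of the "convert to strings with streams" idea, here is a minimal sketch that decodes a large UTF-8 buffer in pieces using the standard TextDecoder in streaming mode, so no single string has to hold the whole file. The names decodeInChunks and onText are made up for this example and are not part of fflate; the sketch also assumes the decompressed bytes are already available, whereas a streaming decompressor could feed its output chunks to the same decoder.

```typescript
// Decode a large UTF-8 byte buffer piece by piece so that no single
// JavaScript string has to hold the whole document.
// decodeInChunks and onText are illustrative names, not fflate APIs.
function decodeInChunks(
  bytes: Uint8Array,
  chunkSize: number,
  onText: (text: string) => void,
): void {
  // fatal: true makes the decoder throw on genuinely invalid UTF-8.
  const decoder = new TextDecoder("utf-8", { fatal: true });
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    const chunk = bytes.subarray(offset, offset + chunkSize);
    // stream: true holds back an incomplete multi-byte sequence at the
    // end of a chunk and completes it with the next chunk.
    const text = decoder.decode(chunk, { stream: true });
    if (text.length > 0) onText(text);
  }
  // A final call without stream: true flushes the decoder; it throws
  // if the input ended in the middle of a character.
  const tail = decoder.decode();
  if (tail.length > 0) onText(tail);
}

// Usage: a small sample with multi-byte characters and a tiny chunk size,
// so that some characters are split across chunk boundaries.
const sampleText = '{"name":"Jeanne d’Arc","civ":"français"}';
const sample = new TextEncoder().encode(sampleText);
const pieces: string[] = [];
decodeInChunks(sample, 5, (text) => pieces.push(text));
```

In a real run, onText would hand each piece to a streaming JSON parser instead of collecting the pieces in an array, since joining them again would hit the same string length limit.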