Incorrect decoding when parsing an archive with a file with Cyrillic characters

Question

grdvsng opened this issue 2 years ago · 2 comments

I'm using an archive that contains a single file with a content length of about 5000K.

main test:

import { createReadStream } from 'fs';
import * as path from 'path';
import * as unzipper from 'unzipper';

import { checkInvalideByte } from '../string-helper';

jest.setTimeout(1000 * 30);

const TEST_ARCHIVE_PATH = path.join(
  __dirname,
  './mock_data/archive-with-russian-chars.zip',
);

describe('archive-with-cyrillic-chars', () => {
  describe('unzipper lib test', () => {
    it('should decode incorrectly when using a readable stream', async () => {
      const stream = createReadStream(TEST_ARCHIVE_PATH);
      const ended = new Promise((resolve, reject) => {
        stream.on('end', resolve);
        stream.on('error', reject);

        const unzipStream = unzipper.ParseOne();

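        unzipStream.on('data', (chunk) => {
          // Each chunk is decoded as UTF-8 on its own, so a multi-byte
          // character split across two chunks decodes to U+FFFD.
          const line = chunk.toString();

          checkInvalideByte(line).catch(reject);
        });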

        stream.pipe(unzipStream);
      });

      await expect(ended).rejects.toThrow();
    });
  });
});

string-helper.ts

export function checkInvalideByte(content: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const splited = content.split('\n');

    for (let i = 0; i < splited.length; i += 1) {
      const line = splited[i];
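      // U+FFFD is the replacement character emitted for undecodable bytes.
      const column = line.indexOf('\uFFFD');

      if (column >= 0) {
        // Clamp the start so a negative index does not slice from the end.
        const text = `Invalid byte decoded at line: ${i + 1} column: ${
          column + 1
        } ('${line.slice(Math.max(0, column - 10), column + 1)}')`;

        reject(new Error(text));
        return;
      }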
    }

    resolve();
  });
}

Answer 1 · 2022-11-08T03:11:44.000Z

Hey, @grdvsng did you solve it? I'm having the same problem.

Answer 2 · 2022-11-09T06:36:04.000Z

Hello! I wrote a wrapper class around the library and attached my own decoder to the stream. On each data event I check the trailing bytes of the chunk, and if they do not form a complete character I push them to a temporary store and try to decode them again together with the bytes from the next data event.
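
The wrapper itself was not posted, so the following is only a minimal sketch of the same idea, assuming the file content is UTF-8. It uses Node's built-in `StringDecoder`, which holds back an incomplete trailing byte sequence until the next chunk arrives. The helper name `decodeChunks` and the sample string are hypothetical and not part of unzipper.

```typescript
import { StringDecoder } from 'string_decoder';

// Hypothetical helper (not part of unzipper): decodes a sequence of byte
// chunks as UTF-8 without corrupting characters split across chunk boundaries.
function decodeChunks(chunks: Buffer[]): string {
  const decoder = new StringDecoder('utf8');
  let text = '';

  for (const chunk of chunks) {
    // write() returns only complete characters and keeps any trailing
    // partial byte sequence buffered for the next call.
    text += decoder.write(chunk);
  }

  // end() flushes whatever is still buffered.
  return text + decoder.end();
}

// Each Cyrillic letter below takes two bytes in UTF-8, so cutting the
// buffer at byte 3 splits the second letter across two chunks.
const bytes = Buffer.from('Привет', 'utf8');
const chunks = [bytes.subarray(0, 3), bytes.subarray(3)];

// Decoding each chunk on its own, as chunk.toString() does.
const naive = chunks.map((c) => c.toString('utf8')).join('');

// Decoding through one shared StringDecoder.
const safe = decodeChunks(chunks);

console.log(naive.includes('\uFFFD'), safe === 'Привет');
```

In the test from the question, the equivalent change would be to create one `StringDecoder` before attaching the `data` handler and call `decoder.write(chunk)` in place of `chunk.toString()`.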