Chinese encoding error - 中文编码错误
xmexg opened this issue · 9 comments
Cannot use -c to find files when there are Chinese characters in the zip entry names,
for example: GBK in the .zip, UTF-8 on the computer (Linux and Windows).
How to reproduce:
mkdir 这是中文路径
cp one_file 这是中文路径
zip -r archive.zip 这是中文路径 -Ppassword
Ideally, switch to an operating system that uses a different encoding at this point.
bkcrack -L archive.zip
let's try -c 这是中文路径/one_file
Hi, could you provide a minimal dummy zip file that reproduces this issue?
Are you sure your console encoding is set to UTF-8?
Anyway, you can work around the problem by using the --cipher-index option instead of the -c option, to pass a numeric index instead of an entry name. The index is the number in the first column of the bkcrack -L output.
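If the names in the listing are too garbled to tell which index is the right one, the position of an entry can also be looked up outside bkcrack. The following is a minimal Python sketch, not part of bkcrack. It assumes that the index is the zero-based position of the entry in the archive's central directory (this should be checked against the real bkcrack -L output) and that the names were stored as GBK. The helpers list_entries and build_demo_archive are hypothetical; the demo archive is built in memory by patching a placeholder name, only to keep the example self-contained.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: the name is stored as UTF-8


def list_entries(source, legacy_encoding="gbk"):
    """Return (position, name) pairs in central directory order.

    zipfile decodes names without the UTF-8 flag as cp437, so those are
    turned back into raw bytes and decoded as legacy_encoding instead.
    """
    entries = []
    with zipfile.ZipFile(source) as archive:
        for position, info in enumerate(archive.infolist()):
            name = info.filename
            if not info.flag_bits & UTF8_FLAG:
                raw = name.encode("cp437")
                name = raw.decode(legacy_encoding, errors="replace")
            entries.append((position, name))
    return entries


def build_demo_archive(legacy_name, encoding="gbk"):
    """Build an in-memory zip with one entry name stored in `encoding`.

    Hypothetical demo helper: an ASCII placeholder of the same byte
    length is written first, then swapped for the raw legacy bytes in
    both the local header and the central directory.
    """
    raw = legacy_name.encode(encoding)
    placeholder = b"#" * len(raw)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        archive.writestr("readme.txt", b"plain ASCII entry")
        archive.writestr(placeholder.decode("ascii"), b"demo")
    return io.BytesIO(buffer.getvalue().replace(placeholder, raw))


if __name__ == "__main__":
    demo = build_demo_archive("这是中文路径/one_file")
    for position, name in list_entries(demo):
        print(position, name)
```

A position found this way could then be passed to --cipher-index, once it has been confirmed to match the first column of the listing.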
This is a ZIP issue: the filename gets encoded with the system locale encoding instead of UTF-8.
There are two solutions: allowing users to manually specify the encoding (unzip-iconv) or automatically guessing the encoding (unarchiver).
But I'm not sure if this is necessary for a zip cracking tool.
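To illustrate the two approaches, here is a rough Python sketch working on the raw name bytes. It is not how unzip-iconv or unarchiver are implemented; decode_with_encoding and guess_decode are hypothetical helpers, and the candidate list is an assumption. Real guessers use statistics, whereas this one simply takes the first encoding that decodes without error, so it can guess wrong.

```python
def decode_with_encoding(raw_name: bytes, encoding: str) -> str:
    """User-specified approach: decode with the encoding the user names."""
    return raw_name.decode(encoding)


def guess_decode(raw_name: bytes, candidates=("utf-8", "gbk", "cp437")):
    """Guessing approach: return (name, encoding) for the first candidate
    that decodes the bytes without error."""
    for encoding in candidates:
        try:
            return raw_name.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding fits these bytes")


# Bytes as a GBK system would store the entry name from this issue.
raw = "这是中文路径/one_file".encode("gbk")

# What a tool sees if it assumes cp437 (the ZIP default) instead.
garbled = raw.decode("cp437")

if __name__ == "__main__":
    print(garbled)
    print(decode_with_encoding(raw, "gbk"))
    print(guess_decode(raw))
```

Trying UTF-8 first matters: GBK bytes are rarely valid UTF-8, while many byte sequences decode "successfully" as cp437 and produce garbage.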
ebsite-update-2.29-simple.zip
password is MirlKoi
--cipher-index is useful.
I deleted some files from the zip, but now the -L output is garbled on my Windows 10 machine.
How amazing!
After I deleted some files using Bandizip, the Chinese characters are displayed normally on Linux.
Thank you @xmexg for providing the file. I can confirm the ZIP file contains names in GBK encoding or similar. Unfortunately, there is no additional metadata in this ZIP archive that could help decode the names correctly in an automatic way.
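The comment does not say which metadata was looked for. As an assumption, two places where a ZIP entry can declare a Unicode name are the UTF-8 flag (general purpose bit 11) and the Info-ZIP Unicode Path extra field (header ID 0x7075). The Python sketch below checks for both; name_encoding_hints and extra_field_ids are hypothetical helpers, not bkcrack code.

```python
import struct
import zipfile

UTF8_FLAG = 0x800         # general purpose bit 11 (language encoding flag)
UNICODE_PATH_ID = 0x7075  # Info-ZIP Unicode Path extra field


def extra_field_ids(extra: bytes):
    """Return the header IDs found in a raw extra-field blob."""
    ids, offset = [], 0
    while offset + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, offset)
        ids.append(header_id)
        offset += 4 + size  # skip the 4-byte header and the payload
    return ids


def name_encoding_hints(info: zipfile.ZipInfo):
    """Report whether an entry carries either Unicode name hint."""
    return {
        "utf8_flag": bool(info.flag_bits & UTF8_FLAG),
        "unicode_path": UNICODE_PATH_ID in extra_field_ids(info.extra),
    }


def report(path):
    """Print the hints for every entry of the archive at `path`."""
    with zipfile.ZipFile(path) as archive:
        for position, info in enumerate(archive.infolist()):
            print(position, name_encoding_hints(info))
```

If both hints are absent for an entry with non-ASCII bytes in its name, the encoding has to come from the user or from a guess.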
Thank you @Aloxaf for your enlightening explanation. You are right, handling encoding correctly in this case would require user input or guessing.
Adding a solution to bkcrack to deal with such a case would be nice, but as there is a workaround with --cipher-index, I don't think I'll attempt to implement it any time soon. This probably deserves more documentation, though.
Thank you for reporting the issue and for the feedback.
Do you have more comments about this issue? I will close it otherwise.
thank you
I suspect the line endings do not match.
The desktop.ini file contains Windows-style CR+LF line endings (0d 0a in hexadecimal).
$ xxd desktop.ini
00000000: 5b4c 6f63 616c 697a 6564 4669 6c65 4e61 [LocalizedFileNa
00000010: 6d65 735d 0d0a 3131 3534 3330 3039 385f mes]..115430098_
00000020: 7030 2e6a 7067 3d40 3131 3534 3330 3039 p0.jpg=@11543009
00000030: 385f 7030 2e6a 7067 2c30 0d0a 8_p0.jpg,0..
Maybe your known plaintext file try2.txt uses LF line endings.
This would not work:
$ xxd try2.txt
00000000: 5b4c 6f63 616c 697a 6564 4669 6c65 4e61 [LocalizedFileNa
00000010: 6d65 735d 0a mes].
You need this:
$ echo -en "[LocalizedFileNames]\r\n" > try_crlf.txt
$ xxd try_crlf.txt
00000000: 5b4c 6f63 616c 697a 6564 4669 6c65 4e61 [LocalizedFileNa
00000010: 6d65 735d 0d0a mes]..
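The same plaintext can be produced without relying on the shell's echo behaviour. This is a small Python sketch under the assumption that the known plaintext is just the first line of desktop.ini; to_crlf, line_ending_counts and write_plaintext are hypothetical helper names.

```python
from pathlib import Path


def to_crlf(data: bytes) -> bytes:
    """Convert LF line endings to CR+LF, leaving existing CR+LF intact."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def line_ending_counts(data: bytes):
    """Count CR+LF endings and bare LF endings separately."""
    crlf = data.count(b"\r\n")
    return {"crlf": crlf, "lf": data.count(b"\n") - crlf}


def write_plaintext(path, data: bytes):
    """Write bytes as-is; binary mode avoids any newline translation."""
    Path(path).write_bytes(data)


if __name__ == "__main__":
    plaintext = to_crlf(b"[LocalizedFileNames]\n")
    write_plaintext("try_crlf.txt", plaintext)
    print(plaintext.hex(" "), line_ending_counts(plaintext))
```

Writing in binary mode is the important part: text mode may translate line endings depending on the platform, which is exactly the mismatch described above.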
Ok, thank you.