Cannot extract files from a .cab file containing file names encoded in shift_JIS

Question

Opened this issue 7 months ago · 1 comments

If I try to extract files from a .cab file containing file names encoded in shift_JIS, it aborts with the following error:

Extracting cabinet: DENKEN CG集（同人）.cab
  extracting DENKEN?@CG?W?i???l?j/DENKEN?@CG?W.d88
DENKEN?@CG?W?i???l?j/DENKEN?@CG?W.d88: can't create file path

Using -f to try to work around it doesn't help either.

The sample .cab file can be downloaded here.

Plenty of other Japanese sample .cab files can be obtained here.

Answer 1 · 2024-02-18T12:18:54.000Z

Thanks for the source of Japanese cabinet files!

Unfortunately, I can't reproduce your problem.

Can you describe the system you're running cabextract on, and if you know, what type of filesystem you're writing to?

The error message "can't create file path" is caused by cabextract trying to create a directory called (per your output) DENKEN?@CG?W?i???l?j, which your system rejects.

For comparison, it succeeds on Ubuntu / ext4, and Cygwin / NTFS gives exactly the same output:

$ cabextract DENKENБ@CGПWБiУпРlБj.cab 
Extracting cabinet: DENKENБ@CGПWБiУпРlБj.cab
  extracting DENKEN�@CG�W�i���l�j/DENKEN�@CG�W.d88

All done, no errors.
$ find DENKEN* -ls
 35001622      4 drwxrwxr-x   2 kyz      kyz          4096 Feb 18 11:17 DENKEN\201@CG\217W\201i\223\257\220l\201j
 35001623    408 -rw-rw-r--   1 kyz      kyz        415824 Dec 19  1997 DENKEN\201@CG\217W\201i\223\257\220l\201j/DENKEN\201@CG\217W.d88
 34083492    136 -rw-rw-r--   1 kyz      kyz        138062 Feb 18 11:17 DENKEN\320\221@CG\320\237W\320\221i\320\243\320\277\320\240l\320\221j.cab
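
To make the listing above easier to read, here is a minimal Python sketch (not part of cabextract) that takes the octal-escaped directory name from the `find` output and checks two things: that the raw bytes are not valid UTF-8, and that they decode cleanly as Shift_JIS. The helper name `is_valid_utf8` is made up for illustration. Whether invalid UTF-8 is what the reporter's system objects to is not established in this thread; it is only one possibility.

```python
# Directory name bytes exactly as `find` printed them (octal escapes).
raw = b"DENKEN\201@CG\217W\201i\223\257\220l\201j"


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if `data` decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Without -e, these bytes are written to disk untouched.
print(is_valid_utf8(raw))

# Interpreting the same bytes as Shift_JIS recovers the intended name.
decoded = raw.decode("shift_jis")
print(decoded)

# Re-encoding as UTF-8 gives the bytes a UTF-8 filesystem would store.
utf8_name = decoded.encode("utf-8")
print(utf8_name)
```

The UTF-8 bytes produced this way are the ones to expect in a directory listing once the names have been converted.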

You should also consider using the -e encoding option so cabextract translates the filenames to UTF-8. If the filesystem you're using is OK with UTF-8 filenames, you'll get better results.

Again for comparison, here is the output with the -e shift_jis option. Ubuntu uses glibc's iconv() and Cygwin uses libiconv; they give identical output:

$ cabextract -e shift_jis DENKENБ@CGПWБiУпРlБj.cab 
Extracting cabinet: DENKENБ@CGПWБiУпРlБj.cab
  extracting DENKEN CG集（同人）¥DENKEN CG集.d88

All done, no errors.
$ find DENKEN* -ls
 34083492    136 -rw-rw-r--   1 kyz      kyz        138062 Feb 18 11:21 DENKEN\320\221@CG\320\237W\320\221i\320\243\320\277\320\240l\320\221j.cab
 34083493    408 -rw-rw-r--   1 kyz      kyz        415824 Dec 19  1997 DENKEN\343\200\200CG\351\233\206\357\274\210\345\220\214\344\272\272\357\274\211\302\245DENKEN\343\200\200CG\351\233\206.d88

As a side note, this raises a separate concern: the encoding conversion also translated the file separators from \ to ¥, so the result has no directory parts. This is a known issue with the character set. On Japanese computers actually using Shift_JIS, ¥ is a valid separator, because it is just how the font displays character code 0x5C, which is still the file separator character. Here, the conversion translated it to the UTF-8 encoding of U+00A5 ¥ YEN SIGN, so there are no separators. I'll have to think about what to do about this (if anything).
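
One possible way around the separator problem, sketched in Python purely as an illustration (this is not how cabextract works today, and nothing here is a committed fix), is to split the raw path on 0x5C before converting, then convert each component separately. The catch is that 0x5C can also be the second byte of a two-byte Shift_JIS character, so the split has to skip over trail bytes. The function names `is_sjis_lead` and `split_sjis_path` are hypothetical, and the lead-byte ranges assume Shift_JIS with the common CP932 extensions.

```python
from typing import List


def is_sjis_lead(b: int) -> bool:
    """Assumed lead-byte ranges for Shift_JIS (including CP932 extensions)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def split_sjis_path(raw: bytes) -> List[str]:
    """Split on standalone 0x5C bytes, then decode each component."""
    parts = []
    start = 0
    i = 0
    while i < len(raw):
        if is_sjis_lead(raw[i]) and i + 1 < len(raw):
            i += 2  # skip the trail byte, even if it happens to be 0x5C
        elif raw[i] == 0x5C:
            parts.append(raw[start:i])  # a real separator
            i += 1
            start = i
        else:
            i += 1
    parts.append(raw[start:])
    return [p.decode("shift_jis") for p in parts]


# The path stored in the sample cabinet, with 0x5C as the separator.
stored = b"DENKEN\201@CG\217W\201i\223\257\220l\201j\\DENKEN\201@CG\217W.d88"
print(split_sjis_path(stored))

# 0x83 0x5C is a single two-byte character, so only the second 0x5C splits.
print(split_sjis_path(b"\x83\x5c\x5ca.txt"))
```

Because no component contains a standalone 0x5C after splitting, the question of whether the converter maps that byte to a backslash or to U+00A5 never comes up for the separators themselves.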