Docx detected as Zip due to trash files
lucasgadams opened this issue · 1 comment
lucasgadams commented
A few files that are valid docx documents are being detected as zip. I looked into it: the current code checks for a matching MIME type identifier at the beginning of the buffer, which means it only inspects the first entry in the zip archive. However, as recently pointed out in the magic library (here), it is possible and valid to have trash documents/bytes anywhere in the zip archive, including as the first entry. The fix noted in that link is to skip over these trash bytes. Could we get that fix ported to this library?
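To illustrate the idea, here is a rough Python sketch of what "skipping over trash" could look like. It is not the actual fix from the magic library or the file command, and it is not this library's code. The names `sniff_ooxml`, `iter_entry_names` and `make_zip` are made up for the example. It walks the zip local file headers by searching for the `PK\x03\x04` signature, which is a heuristic (the signature could in principle appear inside compressed data), and it assumes the buffer holds enough of the archive to reach the identifying entries.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIG = b"PK\x03\x04"   # zip local file header signature
LOCAL_HEADER_LEN = 30              # fixed-size part of the local header

# Hypothetical mapping from an entry-name prefix to the detected type.
OOXML_PREFIXES = {"word/": "docx", "xl/": "xlsx", "ppt/": "pptx"}


def iter_entry_names(buf):
    """Yield the file name stored in each local file header found in buf."""
    pos = buf.find(LOCAL_HEADER_SIG)
    while pos != -1 and pos + LOCAL_HEADER_LEN <= len(buf):
        # Name length and extra-field length sit at offsets 26 and 28.
        name_len, extra_len = struct.unpack_from("<HH", buf, pos + 26)
        start = pos + LOCAL_HEADER_LEN
        yield buf[start:start + name_len].decode("utf-8", "replace")
        # Heuristic: jump to the next signature instead of trusting the
        # compressed size, which can be zero when a data descriptor is used.
        pos = buf.find(LOCAL_HEADER_SIG, start + name_len + extra_len)


def sniff_ooxml(buf):
    """Return 'docx', 'xlsx', 'pptx', 'zip', or None for non-zip input."""
    if not buf.startswith(LOCAL_HEADER_SIG):
        return None
    names = list(iter_entry_names(buf))
    # Look at every entry, not only the first, so leading trash is ignored.
    if "[Content_Types].xml" in names:
        for name in names:
            for prefix, kind in OOXML_PREFIXES.items():
                if name.startswith(prefix):
                    return kind
    return "zip"


def make_zip(names):
    """Demo helper: build an in-memory zip with the given entry names."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as zf:
        for name in names:
            zf.writestr(name, b"data")
    return out.getvalue()
```

With this sketch, `sniff_ooxml(make_zip(["trash.bin", "[Content_Types].xml", "word/document.xml"]))` gives `"docx"` even though the first entry is unrelated, whereas a check limited to the first entry would fall back to zip.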
lucasgadams commented
This was specifically fixed in the Linux file command last year in this commit.