h2non/filetype.py

xls and xlsx guessed as zip

mjturtora opened this issue · 5 comments

I thought this might be related to the OpenOffice fix that was just committed, so I tried v0.1.3 and saw the same issue. I originally tried v1.0.0.

I got it to work on some image and video file types, but it confuses xls* with zip.

Oddly, it claims the file extension is zip as well. I get:

File extension: zip
File MIME type: application/zip

I haven't dug into this codebase enough to know how it does the detection, but it seems to be reading the magic numbers. There is an old Microsoft support response that may be useful, mainly because it explains how hard this problem is:

Developing a tool to recognise MS Office file types ( .doc, .xls, .mdb, .ppt )

Got the same problem here.

It seems the document types are not yet in the PyPI distribution.
The current version is 1.1.0, uploaded on 12 Jul - 47a0e25

The PR adding document types, #133, was merged on 5 Aug.

So, unless you are building from source, there is no support for .xls* in filetype yet.
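
To see which release you actually have installed, something like the following should work. It is a small sketch using only the standard library; `installed_version` is a hypothetical helper name, and it reports the version recorded in the package metadata, which will not tell you whether a source build includes the merged PR.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed filetype version, or None if filetype is not installed.
print(installed_version("filetype"))
```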

Hi, sorry to mention @h2non @ferstar
Is there an estimate for when the newest version will be released?

Thank you very much! @ferstar

h2non commented

This should be good. Closing.