TIF files return an End-Of-Stream error
sandercoffee opened this issue · 7 comments
Hello, I'm new here and not sure how to make a pull request correctly, so here are some details I found:
1. Some .TIF files return an End-Of-Stream error, which breaks validation. (BUG)
At this step, the idea is to return a specific format based on the TIFF tags, or fall back to the default "image/tiff"; but because the error is thrown and not handled, it breaks the validation.
Lines 1486 to 1490 in feac593
So I wrapped the call in a try/catch, and it worked perfectly:
const tif = {
	ext: 'tif',
	mime: 'image/tiff',
};
try {
	const fileType = await this.readTiffIFD(false);
	return fileType ? fileType : tif;
} catch (_) {
	return tif;
}
Example file that triggers the error: https://drive.google.com/file/d/1UDiCM3jmi0-VJzLD0B7Zn9yoKFL5V3Ur/view?usp=sharing
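The try/catch above treats every failure as "plain TIFF", which can also hide unrelated bugs. A narrower variant is sketched below. It assumes truncation is reported with an `EndOfStreamError`: strtok3, the tokenizer library file-type builds on, exports a class of that name, but a local stand-in is defined here so the sketch runs on its own. `detectTiff` and its `readTiffIfd` callback are hypothetical names. The callback stands in for `this.readTiffIFD(false)`.

```javascript
// Local stand-in for strtok3's EndOfStreamError so this sketch is self-contained.
class EndOfStreamError extends Error {
	constructor() {
		super('End-Of-Stream');
		this.name = 'EndOfStreamError';
	}
}

const tif = {ext: 'tif', mime: 'image/tiff'};

// Hypothetical helper: returns the refined type if the IFD could be parsed,
// plain TIFF if the IFD yielded nothing or the file ends early, and rethrows
// any other error instead of silently swallowing it.
async function detectTiff(readTiffIfd) {
	try {
		const fileType = await readTiffIfd();
		return fileType ?? tif;
	} catch (error) {
		if (error instanceof EndOfStreamError) {
			return tif; // Header matched, but the data ends before the IFD does.
		}
		throw error; // Unexpected failure: surface it.
	}
}

// Usage with a simulated reader that hits the end of the stream:
detectTiff(async () => {
	throw new EndOfStreamError();
}).then(type => console.log(type)); // { ext: 'tif', mime: 'image/tiff' }
```

The design choice is simply to keep the fallback for the truncated-file case reported in this issue while still letting genuine errors propagate.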
2. Add support for .MSI (Microsoft Software Installer) files, which are currently detected as application/x-cfb (enhancement) (help wanted)
Lines 1206 to 1218 in feac593
Change the code to the following:
// Increase sample size from 12 to 256.
await tokenizer.peekBuffer(this.buffer, { length: Math.min(256, tokenizer.fileInfo.size), mayBeLess: true });
if (this.check([0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x3e, 0x00])) {
	// Detected Microsoft Software Installer File.
	return {
		ext: "msi",
		mime: "application/x-msi",
	};
}
if (this.check([0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1])) {
	// Detected Microsoft Compound File Binary File (MS-CFB) Format.
	return {
		ext: "cfb",
		mime: "application/x-cfb",
	};
}
// -- 15-byte signatures --
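For reference, the 26-byte pattern in the MSI check covers three fields of the Compound File Binary header as described in the MS-CFB specification:

- the 8-byte signature;
- the 16-byte header CLSID, which must be all zeros;
- the 2-byte minor version, normally 0x003E.

Ordinary CFB files carry the same values, so that pattern alone may also match .doc, .xls or .ppt files. The document type is recorded elsewhere in the file. The sketch below reads those header fields from a plain `Uint8Array`, plus the sector size and first directory sector that locate the directory. `readCfbHeader` is a hypothetical name.

```javascript
const CFB_SIGNATURE = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1];

// Hypothetical helper: parses fixed CFB header fields (all little-endian).
// Returns undefined if the buffer is too short or lacks the CFB signature.
function readCfbHeader(bytes) {
	if (bytes.length < 52 || !CFB_SIGNATURE.every((b, i) => bytes[i] === b)) {
		return undefined;
	}
	const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
	return {
		headerClsidIsZero: bytes.subarray(8, 24).every(b => b === 0), // Offset 8, 16 bytes
		minorVersion: view.getUint16(24, true), // Offset 24, normally 0x003E
		majorVersion: view.getUint16(26, true), // Offset 26, 3 or 4
		sectorSize: 1 << view.getUint16(30, true), // Offset 30 holds the sector shift
		firstDirectorySector: view.getUint32(48, true), // Offset 48
	};
}

// Usage: a synthetic version-3 header (512-byte sectors, directory in sector 1).
const header = new Uint8Array(512);
header.set(CFB_SIGNATURE, 0);
// Minor version 0x003E, major version 3, byte order 0xFFFE, sector shift 9.
header.set([0x3e, 0x00, 0x03, 0x00, 0xfe, 0xff, 0x09, 0x00], 24);
header.set([0x01, 0x00, 0x00, 0x00], 48);
console.log(readCfbHeader(header));
```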
If you can comment or contribute ideas, I'd appreciate it :D
re: cfb checking - i'm porting this to lua, and i added support for doc/ppt/xls as well:
(the original code is from this SO post)
if check("\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1") then
	local sector_size = bit.lshift(1, get_u16_le(pos + 30))
	local root_dir_index = get_u32_le(pos + 48)
	pos = (root_dir_index + 1) * sector_size + 81
	-- microsoft CLSIDs below
	-- versions:
	-- 5 (95)
	-- 6 (6.0-7.0)
	-- 8 (97-2003)
	-- 12 (2007?)
	-- https://raw.githubusercontent.com/decalage2/oletools/master/oletools/common/clsid.py
	if check("\x9b\x4c\x75\xf4\xf5\x64\x40\x4b\x8a\xf4\x67\x97\x32\xac\x06\x07") then
		-- Word.Document.12: f4754c9b-64f5-4b40-8af4-679732ac0607
		return "doc", "application/msword"
	elseif check("\x06\x09\x02\0\0\0\0\0\xc0\0\0\0\0\0\0\x46") then
		-- Word.Document.8: 00020906-0000-0000-c000-000000000046
		return "doc", "application/msword"
	elseif check("\x00\x09\x02\0\0\0\0\0\xc0\0\0\0\0\0\0\x46") then
		-- Word.Document.6: 00020900-0000-0000-c000-000000000046
		return "doc", "application/msword"
	elseif check("\x30\x08\x02\0\0\0\0\0\xc0\0\0\0\0\0\0\x46") then
		-- Excel.Sheet.12: 00020830-0000-0000-c000-000000000046
		return "xls", "application/vnd.ms-excel"
	elseif check("\x20\x08\x02\0\0\0\0\0\xc0\0\0\0\0\0\0\x46") then
		-- Excel.Sheet.8: 00020820-0000-0000-c000-000000000046
		return "xls", "application/vnd.ms-excel"
	elseif check("\x10\x08\x02\0\0\0\0\0\xc0\0\0\0\0\0\0\x46") then
		-- Excel.Sheet.5: 00020810-0000-0000-c000-000000000046
		return "xls", "application/vnd.ms-excel"
	elseif check("\xf4\x55\x4f\xcf\x87\x8f\x47\x4d\x80\xbb\x58\x08\x16\x4b\xb3\xf8") then
		-- Powerpoint.Show.12: cf4f55f4-8f87-4d47-80bb-5808164bb3f8
		return "ppt", "application/vnd.ms-powerpoint"
	elseif check("\x10\x8d\x81\x64\x9b\x4f\xcf\x11\x86\xea\0\xaa\0\xb9\x29\xe8") then
		-- Powerpoint.Show.8: 64818d10-4f9b-11cf-86ea-00aa00b929e8
		return "ppt", "application/vnd.ms-powerpoint"
	elseif check("\x84\x10\x0c\0\0\0\0\0\xc0\0\0\0\0\0\0\x46") then
		-- msi: 000c1084-0000-0000-c000-000000000046
		return "msi", "application/octet-stream"
	else
		return "cfb", "application/x-cfb"
	end
end
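The Lua snippet identifies the document type from the CLSID of the root directory entry rather than from the header. In 0-based offsets, three facts determine where that CLSID sits:

- Sector N starts at (N + 1) × sector_size, because the header occupies the first sector-sized slot.
- The root entry is the first entry of the first directory sector.
- Its CLSID sits at offset 80 within that entry.

That is where the `+ 81` in the 1-based Lua code comes from. Below is a minimal JavaScript sketch of the same lookup. It assumes the caller has already matched the CFB signature and holds enough of the file in a `Uint8Array`. `rootEntryClsid` and `toHex` are hypothetical names.

```javascript
// Hypothetical helper: returns the 16 raw CLSID bytes of the root directory
// entry, or undefined if the buffer is too short to contain them.
function rootEntryClsid(bytes) {
	if (bytes.length < 52) {
		return undefined;
	}
	const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
	const sectorSize = 1 << view.getUint16(30, true); // Sector shift at offset 30
	const firstDirectorySector = view.getUint32(48, true); // Offset 48
	// Sector N starts at (N + 1) * sectorSize; the CLSID is at offset 80 of the entry.
	const offset = (firstDirectorySector + 1) * sectorSize + 80;
	if (offset + 16 > bytes.length) {
		return undefined;
	}
	return bytes.subarray(offset, offset + 16);
}

const toHex = bytes => Array.from(bytes, b => b.toString(16).padStart(2, '0')).join('');

// Usage: synthetic file with 512-byte sectors and the directory in sector 0,
// whose root entry carries the Word.Document.8 CLSID bytes from the list above.
const file = new Uint8Array(1024);
file.set([0x09, 0x00], 30); // Sector shift 9 -> 512-byte sectors
file.set([0x00, 0x00, 0x00, 0x00], 48); // First directory sector: 0
file.set([0x06, 0x09, 0x02, 0, 0, 0, 0, 0, 0xc0, 0, 0, 0, 0, 0, 0, 0x46], 512 + 80);
console.log(toHex(rootEntryClsid(file))); // 0609020000000000c000000000000046
```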
as for the TIF, oddly enough that image seems to work fine in the port; however, fixture.tif (correctly) errors, as it is severely truncated.
on a related note, it might be a good idea to reconsider what happens when a file almost has the correct structure but is invalid (the invalid png fixture, for example), since it's probably still useful to know that it would be a png if only it weren't invalid.
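Here is one way to act on that suggestion, sketched for PNG only: report the type whenever the signature matches, and add a flag saying whether the structure also checked out. `detectPng` and the `valid` property are hypothetical and not part of file-type's `{ext, mime}` result. The structural check is deliberately minimal: it only requires a leading IHDR chunk whose length field is 13, and it does not verify CRCs.

```javascript
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Hypothetical detector: a signature match alone yields a result, and `valid`
// records whether a complete IHDR chunk follows (8-byte chunk header,
// 13 data bytes, 4-byte CRC, i.e. 33 bytes including the signature).
function detectPng(bytes) {
	if (bytes.length < 8 || !PNG_SIGNATURE.every((b, i) => bytes[i] === b)) {
		return undefined; // Not a PNG at all.
	}
	const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
	const hasIhdr =
		bytes.length >= 8 + 8 + 13 + 4 &&
		view.getUint32(8, false) === 13 && // Chunk length is big-endian
		String.fromCharCode(...bytes.subarray(12, 16)) === 'IHDR';
	return {ext: 'png', mime: 'image/png', valid: hasIhdr};
}

// A bare signature with nothing after it is still reported, flagged as invalid.
console.log(detectPng(new Uint8Array(PNG_SIGNATURE))); // { ext: 'png', mime: 'image/png', valid: false }
```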
Thanks for your detailed feedback @sandercoffee.
Please don't mix issues. It's harder to track the status if we only work on or resolve one of the sections.
You can read GitHub's guidance on how to create a pull request: Creating a pull request
I don't find it super clear, so maybe this short summary helps:
- Fork this repository via the GitHub interface
- Clone the forked repository locally
- Create a new branch locally (on your computer). Try to name the branch so it's clear what the change is about (not critical).
- Commit your changes locally
- Push the branch (it will be pushed to your forked repository).
- Turn the branch into a pull request (PR):
  - by going to this repository, you will probably see your branch on the first page, with the option to turn it into a PR
  - Describe the change
  - If it resolves an issue, use something like "Resolves: #560" in the description
- Your PR will be reviewed, unless you change it to Draft, which indicates you have not finalized it
- Keep adding commits if you want to add changes
We use the following conventions to name the repositories:
- upstream (this repository, the target repository you want to contribute to)
- origin (the fork you created of this repository)
- local (the local clone you have on your workstation)
Image source: Confusing Terms in the Git Terminology
See also: https://levelup.gitconnected.com/how-to-sync-forked-repositories-using-git-or-github-2933e497fa16
Just give it a try, that's how we all started.
re: resolving one of the sections... technically you could make it a checklist, and progress would update correctly... but yeah it's still a lot better to split it into multiple issues, especially if they're not very related...
if anyone's interested, here's an updated list of cfb clsids. (if it's marked as non-standard, that means i'm just guessing what the mimetype would be) (also note that the autodesk mimetype is the one autodesk uses, rather than the one registered with IANA)
if
	check("\x9b\x4c\x75\xf4\xf5\x64\x40\x4b\x8a\xf4\x67\x97\x32\xac\x06\x07") or -- Word.Document.12: clsid f4754c9b-64f5-4b40-8af4-679732ac0607
	check("\x06\x09\x02\0\0\0\0\0\xc0\0\0\0\0\0\0\x46") or -- Word.Document.8: clsid 00020906-0000-0000-c000-000000000046
	check("\x00\x09\x02\0\0\0\0\0\xc0\0\0\0\0\0\0\x46") -- Word.Document.6: clsid 00020900-0000-0000-c000-000000000046
then
	return "doc", "application/msword"
elseif
	check("\x30\x08\x02\0\0\0\0\0\xc0\0\0\0\0\0\0\x46") or -- Excel.Sheet.12: clsid 00020830-0000-0000-c000-000000000046
	check("\x20\x08\x02\0\0\0\0\0\xc0\0\0\0\0\0\0\x46") or -- Excel.Sheet.8: clsid 00020820-0000-0000-c000-000000000046
	check("\x10\x08\x02\0\0\0\0\0\xc0\0\0\0\0\0\0\x46") -- Excel.Sheet.5: clsid 00020810-0000-0000-c000-000000000046
then
	return "xls", "application/vnd.ms-excel"
elseif
	check("\xf4\x55\x4f\xcf\x87\x8f\x47\x4d\x80\xbb\x58\x08\x16\x4b\xb3\xf8") or -- Powerpoint.Show.12: clsid cf4f55f4-8f87-4d47-80bb-5808164bb3f8
	check("\x10\x8d\x81\x64\x9b\x4f\xcf\x11\x86\xea\0\xaa\0\xb9\x29\xe8") -- Powerpoint.Show.8: clsid 64818d10-4f9b-11cf-86ea-00aa00b929e8
then
	return "ppt", "application/vnd.ms-powerpoint"
elseif check("\x46\xf0\x06\0\0\0\0\0\xc0\0\0\0\0\0\0\x46") then
	-- TemplateMessage: clsid 0006f046-0000-0000-c000-000000000046
	return "oft", "application/vnd.ms-outlook"
elseif check("\x0b\x0d\x02\0\0\0\0\0\xc0\0\0\0\0\0\0\x46") then
	-- MailMessage: clsid 00020d0b-0000-0000-c000-000000000046
	return "msg", "application/vnd.ms-outlook"
elseif check("\x84\x10\x0c\0\0\0\0\0\xc0\0\0\0\0\0\0\x46") then
	-- msi: clsid 000c1084-0000-0000-c000-000000000046
	return "msi", "application/octet-stream"
elseif -- autodesk inventor: https://knowledge.autodesk.com/search-result/caas/simplecontent/content/documentsubtype-list-common-name-inventors-name-cslid-inv-pro-2021-dev-tools.html
	-- fixtures: https://knowledge.autodesk.com/support/inventor/troubleshooting/caas/downloads/content/inventor-sample-files.html
	check("\x90\xb4\x29\x4d\xb2\x49\xd0\x11\x93\xc3\x7e\x07\x06\x00\x00\x00") or -- Part: clsid 4d29b490-49b2-11d0-93c3-7e0706000000
	check("\x03\x42\x46\x9c\xae\x9b\xd3\x11\x8b\xad\x00\x60\xb0\xce\x6b\xb4") or -- Sheet Metal Part: clsid 9c464203-9bae-11d3-8bad-0060b0ce6bb4
	check("\x19\x54\x05\x92\xfa\xb3\xd3\x11\xa4\x79\x00\xc0\x4f\x6b\x95\x31") or -- Generic Proxy Part: clsid 92055419-b3fa-11d3-a479-00c04f6b9531
	check("\x04\x42\x46\x9c\xae\x9b\xd3\x11\x8b\xad\x00\x60\xb0\xce\x6b\xb4") or -- Compatibility Proxy Part: clsid 9c464204-9bae-11d3-8bad-0060b0ce6bb4
	check("\xaf\xd3\x88\x9c\xeb\xc3\xd3\x11\xb7\x9e\x00\x60\xb0\xf1\x59\xef") or -- Catalog Proxy Part: clsid 9c88d3af-c3eb-11d3-b79e-0060b0f159ef
	check("\xd4\x80\x8d\x4d\xb0\xf5\x60\x44\x8c\xea\x4c\xd2\x22\x68\x44\x69") -- Molded Part Document: clsid 4d8d80d4-f5b0-4460-8cea-4cd222684469
then
	return "ipt", "application/vnd.autodesk.inventor" -- non-standard
elseif
	check("\xe1\x81\x0f\xe6\xb3\x49\xd0\x11\x93\xc3\x7e\x07\x06\x00\x00\x00") or -- Assembly: clsid e60f81e1-49b3-11d0-93c3-7e0706000000
	check("\x54\x83\xec\x28\x24\x90\x0f\x44\xa8\xa2\x0e\x0e\x55\xd6\x35\xb0") -- Weldment: clsid 28ec8354-9024-440f-a8a2-0e0e55d635b0
then
	return "iam", "application/vnd.autodesk.inventor.assembly"
elseif
	check("\x80\x3a\x28\x76\xdd\x50\xd3\x11\xa7\xe3\x00\xc0\x4f\x79\xd7\xbc") or -- Presentation: clsid 76283a80-50dd-11d3-a7e3-00c04f79d7bc
	check("\x7d\xc1\xb4\xa2\xd2\xf0\x0f\x4c\x97\x99\xdd\x5f\x71\xdf\xb2\x91") -- Composite Presentation: clsid a2b4c17d-f0d2-4c0f-9799-dd5f71dfb291
then
	return "ipn", "application/vnd.autodesk.inventor.presentation" -- non-standard
elseif check("\xf1\xfd\xf9\xbb\xdc\x52\xd0\x11\x8c\x04\x08\x00\x09\x0b\xe8\xec") then
	-- Drawing: clsid bbf9fdf1-52dc-11d0-8c04-0800090be8ec
	return "idw", "application/vnd.autodesk.inventor.drawing" -- non-standard
elseif check("\x5d\x5c\xb9\x81\x31\x8e\x65\x4f\x97\x90\xcc\xf6\xec\xab\xd1\x41") then
	-- Design View: clsid 81b95c5d-8e31-4f65-9790-ccf6ecabd141
	return "idv", "application/vnd.autodesk.inventor.designview" -- non-standard
elseif check("\x30\xb0\xfb\x62\xc7\x24\xd3\x11\xb7\x8d\x00\x60\xb0\xf1\x59\xef") then
	-- iFeature: clsid 62fbb030-24c7-11d3-b78d-0060b0f159ef
	return "ide", "application/vnd.autodesk.inventor.ifeature" -- non-standard
else
	return "cfb", "application/x-cfb"
end
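The comments in the Lua list give each CLSID in its usual string form, while the `check` strings hold the on-disk bytes. The first three GUID fields (4, 2 and 2 bytes) are stored little-endian, and the last eight bytes are stored in order. Below is a small JavaScript sketch of that conversion, plus a lookup table keyed by the string form. `clsidToString`, `CLSID_TYPES` and `typeFromClsid` are hypothetical names. The table carries only a few entries copied from the list above, with the extensions and MIME types proposed there.

```javascript
// Hypothetical helper: formats 16 on-disk CLSID bytes as a GUID string.
// Fields 1-3 are little-endian; the final 8 bytes are kept in order.
function clsidToString(bytes) {
	const hex = (from, to, reverse = false) => {
		const part = Array.from(bytes.subarray(from, to), b => b.toString(16).padStart(2, '0'));
		return (reverse ? part.reverse() : part).join('');
	};
	return [hex(0, 4, true), hex(4, 6, true), hex(6, 8, true), hex(8, 10), hex(10, 16)].join('-');
}

// A few entries from the list above.
const CLSID_TYPES = new Map([
	['00020906-0000-0000-c000-000000000046', {ext: 'doc', mime: 'application/msword'}],
	['00020820-0000-0000-c000-000000000046', {ext: 'xls', mime: 'application/vnd.ms-excel'}],
	['64818d10-4f9b-11cf-86ea-00aa00b929e8', {ext: 'ppt', mime: 'application/vnd.ms-powerpoint'}],
	['000c1084-0000-0000-c000-000000000046', {ext: 'msi', mime: 'application/octet-stream'}],
]);

// Falls back to generic CFB when the root CLSID is not in the table.
function typeFromClsid(bytes) {
	return CLSID_TYPES.get(clsidToString(bytes)) ?? {ext: 'cfb', mime: 'application/x-cfb'};
}

// Usage: the Powerpoint.Show.8 bytes from the Lua list.
const ppt = new Uint8Array([0x10, 0x8d, 0x81, 0x64, 0x9b, 0x4f, 0xcf, 0x11, 0x86, 0xea, 0x00, 0xaa, 0x00, 0xb9, 0x29, 0xe8]);
console.log(clsidToString(ppt)); // 64818d10-4f9b-11cf-86ea-00aa00b929e8
console.log(typeFromClsid(ppt).ext); // ppt
```

Keying the table by the string form keeps it easy to compare against CLSID lists such as the oletools one linked earlier in the thread.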
Please open a different issue for the MSI requirements.
Just had the same problem with TIFF files.
Fixed it by upgrading the library to the newest version.
I just wanted to thank you guys for the great effort 💪 🚀