Incorrect compressed filesize is displayed whenever a file is over 4GB in size.
NekuSoul opened this issue · 14 comments
Description
It seems there's a bug in Windows: whenever you compress a file that's over 4GB in size, the compressed filesize is reported incorrectly. This is probably due to an integer overflow (2^32 bytes = 4GB).
Here are some examples:
(All tests are single video files where the expected compression should be near 0%)
Filesize is 5.4GB. Compressed filesize is incorrectly being reported as 1.4GB. (5.4GB - 4GB = 1.4GB)
Filesize is 3.8GB, so slightly under the limit. Everything is displayed correctly as it should.
Here's an interesting case. The original file size is minimally larger than 4GB*3, but the compressed file is probably just a bit smaller. So we end up with a file that says it's been compressed down to slightly below 4GB.
And another example without any issue, since we stayed below the 4GB limit.
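The pattern in these examples is what you would get if only the low 32 bits of the size survived. A minimal sketch of that arithmetic in portable C++, assuming "GB" here means GiB; the helper name `truncate_to_32_bits` is made up for illustration and is not part of Windows or CompactGUI:

```cpp
#include <cstdint>

constexpr std::uint64_t kMiB = 1ull << 20;
constexpr std::uint64_t kGiB = 1ull << 30;

// Hypothetical helper: the value you see when a 64-bit byte count is
// cut down to its low 32 bits (the suspected overflow).
constexpr std::uint64_t truncate_to_32_bits(std::uint64_t size_bytes) {
    return size_bytes & 0xFFFFFFFFull;
}

// Roughly 5.4 GiB wraps around to roughly 1.4 GiB.
static_assert(truncate_to_32_bits(5 * kGiB + 400 * kMiB) == 1 * kGiB + 400 * kMiB,
              "sizes above 4 GiB lose a multiple of 4 GiB");
// Roughly 3.8 GiB stays as it is.
static_assert(truncate_to_32_bits(3 * kGiB + 800 * kMiB) == 3 * kGiB + 800 * kMiB,
              "sizes below 4 GiB are unaffected");
// Just under 12 GiB shows up as just under 4 GiB.
static_assert(truncate_to_32_bits(12 * kGiB - 1) == 4 * kGiB - 1,
              "3 * 4 GiB is dropped");
```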
Steps to Reproduce
- Get a file that's over 4GB in size, preferably badly compressible, like a video file. It's important that the compressed file should end up being >4GB.
- Compress it using this tool.
Expected behavior: Since this is a bug in Windows, maybe it should display a warning that the shown compressed filesize is inaccurate. Or one could calculate the correct compressed filesize like this:
Filesize rounded down to nearest 4GB + reported compressed file size. This calculation has to be done on each individual file.
This would fix the displayed compressed file size in the majority of issues, except when the file has been made more than 4GB smaller, which is very unlikely to ever occur.
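As a rough sketch of that calculation in portable C++ (assuming "4GB" means 2^32 bytes; `estimate_compressed_size` is a hypothetical name, not an existing API):

```cpp
#include <cstdint>

constexpr std::uint64_t kWrap = 1ull << 32;  // 4 GiB, the suspected wrap point
constexpr std::uint64_t kMiB = 1ull << 20;
constexpr std::uint64_t kGiB = 1ull << 30;

// Hypothetical helper: original size rounded down to a multiple of 4 GiB,
// plus the (possibly wrapped) size that Windows reports.
constexpr std::uint64_t estimate_compressed_size(std::uint64_t original_size,
                                                 std::uint64_t reported_size) {
    return (original_size / kWrap) * kWrap + reported_size;
}

// A 5.5 GiB file reported as 1 GiB + 400 MiB is estimated at 5 GiB + 400 MiB.
static_assert(estimate_compressed_size(5 * kGiB + 512 * kMiB, 1 * kGiB + 400 * kMiB)
                  == 5 * kGiB + 400 * kMiB,
              "typical case");

// Caveat: if the compressed size drops just below a 4 GiB boundary that the
// original sits just above, the estimate lands above the original size.
static_assert(estimate_compressed_size(12 * kGiB + 10 * kMiB, 4 * kGiB - 10 * kMiB)
                  == 16 * kGiB - 10 * kMiB,
              "boundary case overshoots");
```

The second assertion suggests a sanity check would be needed on top: an estimate larger than the original file is a sign that 4 GiB too much was added.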
Actual behavior: It shows that the file has been heavily compressed, even though that's not the case at all.
Version
2.4.1.0
I'm not sure why you say it's a bug in Windows since Windows reports compressed sizes just fine:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa364930(v=vs.85).aspx
DWORD higherBytes = 0;
DWORD lowerBytes = GetCompressedFileSize("your-compressed-file-here", &higherBytes);
DWORD64 realSize = DWORD64(lowerBytes) + (DWORD64(higherBytes) << 32u);
printf("Compressed size: %llu", realSize);
If I had to guess the software is just ignoring the higher-bytes parameter and reporting the low-bytes only.
Or maybe it's OS-specific? I'm on Windows 7 and everything reports correctly.
> Or maybe it's OS-specific? I'm on Windows 7 and everything reports correctly.
Considering this is using a compression method exclusive to Windows 10 and even Windows Explorer reports the wrong filesize I'd assume so.
> Considering this is using a compression method exclusive to Windows 10 and even Windows Explorer reports the wrong filesize I'd assume so.
Windows 7 has the "compact" tool. As far as I know everything from XP and later has it.
> Windows 7 has the "compact" tool.
Correct, but Windows 10 introduced a set of new multi-threaded compression algorithms targeted at files that usually never change. Those new compression methods are called using the /exe parameter, which only exists since Windows 10.
@Rseding91 it's a Windows error since CompactGUI just rips the data straight from compact.exe's output, which matches the Size on Disk in Windows Explorer. I don't do any fancy maths or anything. It's annoying that Windows is inaccurate but I shall try to fix this so it's not misleading ASAP.
Thanks @NekuSoul, I've added information to the main page warning people of the inaccuracy for now. I've just had a dabble to try to fix it and it seems a lot more messy than I thought it would be.
Just to throw a spanner in the works - I can't replicate your issue on my PC.
Maybe there's something else going on here?
I'm running Windows 10 FCU 64 bit on an NTFS partition. What about you?
> I've just had a dabble to try to fix it and it seems a lot more messy than I thought it would be.
Yeah, for fun I've also tried to fix it, and it's really awful to fix completely due to the way compact.exe spits out information.
> Just to throw a spanner in the works - I can't replicate your issue on my PC.
In your output log it says that no file was compressed. Maybe compact.exe doesn't compact certain file types or when it sees that compressing an already compressed file is useless?
Edit: Doesn't seem to be done by file type, mine was also .mkv. And yes, I'm fully updated and using an NTFS partition.
I also reported the origin of this issue to Microsoft through the Win10 Feedback Hub, but since I didn't use an account I have no way to track the status of it.
So let's hope that this will be fixed in a future version of Win10.
I've now made a test version that compresses each file 1 by 1 - it's about an order of magnitude slower since each file needs its own compact.exe process, and multiple concurrent calls only get you so far. I've mitigated this by adding a filter that prevents poorly compressible filetypes from being compressed. I need to get a larger list of such filetypes from somewhere though.
But there is another issue - Some files do correctly compress from e.g. 5GB to 2GB, breaking everything again. Right now what I'm doing is for each file >4GB, it does the following:
1. Check the total free space on the parent drive
2. Compress the file
3. If the file is >4GB, then go to step 4
4. Check the total free space on the parent drive and calculate the difference
5. Check the Size On Disk of the file and compare against the Raw Size
6. Compare Step 4 with Step 3 to check for a major discrepancy
7. If a major discrepancy is found, do a modulo 2^32 calculation on the file size to get the actual compression. Check the new size against step 3 to see if it's within a 10% tolerance (This is to allow for background drive changes)
8. Report the size obtained from step 6
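The idea behind these steps could be sketched roughly as below. This is not CompactGUI's actual code: the names are hypothetical, the sketch is portable C++ with the sizes passed in rather than read from Windows, and it assumes the 10% tolerance is measured against the size implied by the free-space change.

```cpp
#include <cstdint>

constexpr std::uint64_t kWrap = 1ull << 32;  // 2^32 bytes

constexpr std::uint64_t abs_diff(std::uint64_t a, std::uint64_t b) {
    return a > b ? a - b : b - a;
}

// Hypothetical helper: pick the candidate size (reported + k * 2^32) that
// best matches the drop in used space, or fall back to the reported size.
inline std::uint64_t resolve_compressed_size(std::uint64_t original_size,
                                             std::uint64_t reported_size,
                                             std::uint64_t free_before,
                                             std::uint64_t free_after) {
    // Space gained on the drive approximates the bytes saved by compression.
    const std::uint64_t saved =
        free_after > free_before ? free_after - free_before : 0;
    const std::uint64_t expected =
        saved < original_size ? original_size - saved : 0;

    // Try reported, reported + 2^32, reported + 2 * 2^32, ... and never
    // go above the original size.
    std::uint64_t best = reported_size;
    for (std::uint64_t candidate = reported_size; candidate <= original_size;
         candidate += kWrap) {
        if (abs_diff(candidate, expected) < abs_diff(best, expected)) {
            best = candidate;
        }
    }

    // Assumed tolerance: within 10% of the expected size, to allow for
    // unrelated background changes on the drive.
    const bool plausible = abs_diff(best, expected) <= expected / 10;
    return plausible ? best : reported_size;
}
```

With a 5.5 GiB file that barely compresses, a reported size of about 1.4 GiB would be resolved to about 5.4 GiB, while a genuine 5 GiB to 2 GiB compression would be left at 2 GiB because the free space really did grow by 3 GiB.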
The problem is this only helps when you're actually compressing the file for the first time. When you go back later and try to analyse the file again, there's no way of knowing whether a compression on a >4GB file is legitimate or not because you can't compare the drive size changes.
> The problem is this only helps when you're actually compressing the file for the first time. When you go back later and try to analyse the file again, there's no way of knowing whether a compression on a >4GB file is legitimate or not because you can't compare the drive size changes.
How about simply uncompressing the file to a temporary folder and recompressing it there? This way it should be possible to see the difference as if the file was compressed for the first time. Of course, it is time, resource and space consuming, but if there is no other way...
@tomasz1986 The problem with that is this is an issue with files >4GB, so uncompressing and recompressing those will take a ridiculous amount of time, and will thrash the disk with writes.
I've decided it will be much saner to store the compression results in a database and read from that instead, and compare it to the size that Windows says it has. This will also allow expansion in the future for automatic maintenance of folders to keep them compressed after updates.
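A very rough sketch of what such a lookup might look like, with a `std::map` standing in for the database. All names here are hypothetical, and the consistency rule (stored and reported sizes must agree modulo 2^32) is an assumption for illustration, not something described in this thread:

```cpp
#include <cstdint>
#include <map>
#include <string>

constexpr std::uint64_t kWrap = 1ull << 32;  // 2^32 bytes

// Sizes recorded at the time the file was compressed.
struct CompressionRecord {
    std::uint64_t original_size;
    std::uint64_t compressed_size;
};

// In-memory stand-in for the results database, keyed by file path.
using ResultStore = std::map<std::string, CompressionRecord>;

// Hypothetical helper: prefer the stored size when it is consistent with
// what Windows reports, otherwise fall back to the reported size (the file
// was probably changed or recompressed since it was recorded).
inline std::uint64_t lookup_compressed_size(const ResultStore& store,
                                            const std::string& path,
                                            std::uint64_t reported_size) {
    const auto it = store.find(path);
    if (it == store.end()) {
        return reported_size;  // never recorded, nothing better to offer
    }
    const bool consistent =
        it->second.compressed_size % kWrap == reported_size % kWrap;
    return consistent ? it->second.compressed_size : reported_size;
}
```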
@NekuSoul I'm wondering if you can replicate this behaviour on Windows 10 1903? I'm currently unable to do so, which I'm hoping means the Windows reporting bug is fixed but I'm not certain.
@ImminentFate I'm running 1903 and I just did a few tests both through the commandline using 'compact /C /F /S /EXE' and by using v2.4.1 of CompactGui, which was the last version before the bandaid fix was added. Filesizes over 4GB now show up correctly so it looks like Microsoft has indeed fixed the bug.
@NekuSoul excellent, I'm glad to hear it.