After upgrading to v0.28.0, a bunch of my binary files are now being checked
Closed this issue · 3 comments
This appears to be due to 4af9b52, which changed the binary?
heuristic.
Now a lot (most?) of the images in our repository are not being treated as "binary". I hacked the code a bit to print out the percentage unicode its finding (greater than 0.30
means it is treated as binary):
WcaOnRails/public/images/murciaopen2008.jpg
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.158203125
WcaOnRails/public/images/netherlands2007.jpg
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.150146484375
WcaOnRails/public/images/organizations/Cube_FA_RGB_pos.png
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.149658203125
WcaOnRails/public/images/organizations/abcm.png
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.173095703125
WcaOnRails/public/images/organizations/aca.jpg
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.2041015625
WcaOnRails/public/images/organizations/canada_cubing.jpg
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.18212890625
WcaOnRails/public/images/organizations/cubing_china.png
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.137451171875
WcaOnRails/public/images/organizations/cubing_italy.png
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.171142578125
WcaOnRails/public/images/organizations/cubing_usa.svg
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.028564453125
WcaOnRails/public/images/organizations/cubingtr.png
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.186279296875
WcaOnRails/public/images/organizations/estonian_open.png
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.149658203125
WcaOnRails/public/images/organizations/france.svg
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.02587890625
WcaOnRails/public/images/organizations/hong-kong.png
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.144775390625
WcaOnRails/public/images/organizations/jrca.jpg
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.32861328125
WcaOnRails/public/images/organizations/korea.png
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.1474609375
WcaOnRails/public/images/organizations/nsa.jpg
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.183837890625
WcaOnRails/public/images/organizations/philippines.jpg
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.32958984375
As you can see above, most of these images are coming out to below 0.30
, which means they end up getting checked by my precommit.rb
script. If you're curious about the specific images, you can find them here.
For now, I plan to work around this by explicitly stating the extensions of our binary files in our .pre_commit.ignore
file.
Hmm... that is unfortunate. I'm merely trying to get a "best guess" implementation here based on the code in the ptools
repo https://github.com/djberg96/ptools/blob/b654c4b2cf9e30e3f1a5869afe88023513d1b0ab/lib/ptools.rb#L90
If this is problematic, I can probably revert the change that introduced this issue. After all, we're swapping one "guess" for another "guess". ¯\_(ツ)_/¯
You're totally right, it's a guess either way =) I'm definitely okay with us explicitly having to state what kind (extensions) of files in our repo need to be checked.
Released v0.32.0
which contains the original binary file check.