jish/pre-commit

After upgrading to v0.28.0, a bunch of my binary files are now being checked

Closed this issue · 3 comments

jfly commented

This appears to be due to 4af9b52, which changed the binary? heuristic.

Now a lot (most?) of the images in our repository are not being treated as "binary". I hacked the code a bit to print out the percentage unicode its finding (greater than 0.30 means it is treated as binary):

WcaOnRails/public/images/murciaopen2008.jpg
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.158203125
WcaOnRails/public/images/netherlands2007.jpg
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.150146484375
WcaOnRails/public/images/organizations/Cube_FA_RGB_pos.png
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.149658203125
WcaOnRails/public/images/organizations/abcm.png
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.173095703125
WcaOnRails/public/images/organizations/aca.jpg
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.2041015625
WcaOnRails/public/images/organizations/canada_cubing.jpg
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.18212890625
WcaOnRails/public/images/organizations/cubing_china.png
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.137451171875
WcaOnRails/public/images/organizations/cubing_italy.png
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.171142578125
WcaOnRails/public/images/organizations/cubing_usa.svg
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.028564453125
WcaOnRails/public/images/organizations/cubingtr.png
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.186279296875
WcaOnRails/public/images/organizations/estonian_open.png
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.149658203125
WcaOnRails/public/images/organizations/france.svg
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.02587890625
WcaOnRails/public/images/organizations/hong-kong.png
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.144775390625
WcaOnRails/public/images/organizations/jrca.jpg
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.32861328125
WcaOnRails/public/images/organizations/korea.png
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.1474609375
WcaOnRails/public/images/organizations/nsa.jpg
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.183837890625
WcaOnRails/public/images/organizations/philippines.jpg
((s.size - s.grep(" ".."~").size) / s.size.to_f) = 0.32958984375

As you can see above, most of these images are coming out to below 0.30, which means they end up getting checked by my precommit.rb script. If you're curious about the specific images, you can find them here.

For now, I plan to work around this by explicitly stating the extensions of our binary files in our .pre_commit.ignore file.

jish commented

Hmm... that is unfortunate. I'm merely trying to get a "best guess" implementation here based on the code in the ptools repo https://github.com/djberg96/ptools/blob/b654c4b2cf9e30e3f1a5869afe88023513d1b0ab/lib/ptools.rb#L90

If this is problematic, I can probably revert the change that introduced this issue. After all, we're swapping one "guess" for another "guess". ¯\_(ツ)_/¯

jfly commented

You're totally right, it's a guess either way =) I'm definitely okay with us explicitly having to state what kind (extensions) of files in our repo need to be checked.

jish commented

Released v0.32.0 which contains the original binary file check.