idrassi/HashCheck

[Feature Request] Add support for extended file format

Opened this issue · 8 comments

There is no standard hash file format, but there are a few widely adopted de facto standards. Fortunately, the likely candidates are largely cross- and backward-compatible and easy to tell apart. These are the formats I use, would prefer to use, and hope that HashCheck might support.

HashCheck already supports the following:

FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
^ file merely contains a single hash value, no file path. use own filename to identify which file should be verified.

FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF *\filename.doc
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF *x:\filename.doc
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF x:\filename.doc
^ file contains one or more hash values followed by qualified relative or absolute path. verify this/these files.
(don't ask me where the asterisk comes from. wish I knew. but HashCheck seems to support with and without.)

I'd also like it to support:

FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF 1234567890
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF 1234567890 *\filename.doc
^ hash value followed by a numeric file size. if the file size does not match, verification fails right away, before any hashing.

this saves A LOT OF TIME otherwise spent hashing a file when we already know the file size does not match. since a numeric value is never a qualified path, we can always assume this means file size.
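For illustration, here is a minimal Python sketch of how a line in the proposed layout could be written. It is not HashCheck's code: `format_line` and `digest_line_for_file` are hypothetical names, SHA-1 is just an example algorithm, and writing only the bare file name after the asterisk is an assumption made to keep the example short.

```python
import hashlib
import os


def format_line(digest, name, size=None):
    """Build '<hash> [<size>] *<name>'; the size field is omitted when size is None."""
    fields = [digest.lower()]
    if size is not None:
        fields.append(str(size))
    fields.append("*" + name)
    return " ".join(fields)


def digest_line_for_file(path, algorithm="sha1", include_size=True):
    """Hash a file in chunks and return its digest line (hypothetical helper)."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    size = os.path.getsize(path) if include_size else None
    # Assumption: only the bare file name is recorded, not a qualified path.
    return format_line(h.hexdigest(), os.path.basename(path), size)
```

With `include_size=False` the output falls back to the plain `<hash> *<name>` layout that HashCheck already reads.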

Enable the user to choose whether or not they want file sizes to be included in hash files.
When HashCheck encounters any of these formats, it can automatically detect how to handle them from the fields themselves: a purely numeric field after the hash is a file size, and anything else is a path.
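The detection rule could be sketched roughly like this in Python. This is an assumption-laden illustration, not HashCheck's parser: `DigestEntry` and `parse_line` are hypothetical names, and the sketch follows the proposal's rule that an all-digit field right after the hash is always treated as a size.

```python
import re
from typing import NamedTuple, Optional


class DigestEntry(NamedTuple):
    digest: str            # hex digest, lower-cased
    size: Optional[int]    # file size in bytes, if present
    path: Optional[str]    # path, if present (leading '*' removed)


# <hash> [<size>] [[*]<path>]; the size must be a whole all-digit field.
_LINE = re.compile(
    r"^(?P<digest>[0-9A-Fa-f]+)"
    r"(?:[ \t]+(?P<size>\d+)(?=[ \t]|$))?"
    r"(?:[ \t]+\*?(?P<path>.+))?$"
)


def parse_line(line):
    """Return a DigestEntry, or None for blank or unrecognized lines."""
    m = _LINE.match(line.strip())
    if m is None:
        return None
    size = int(m.group("size")) if m.group("size") is not None else None
    return DigestEntry(m.group("digest").lower(), size, m.group("path"))
```

A file name that merely starts with digits, such as `123abc.doc`, is still read as a path because the size field has to be followed by whitespace or the end of the line.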

@idrassi would you be inclined to add file sizes to hash digest files?

Thank you for this proposal. It is an interesting feature and I agree that it can be helpful in some situations.
Can you share other tools that are compatible with this format? This will be helpful in testing.

I will look into how to implement this.
Another question: do you have any suggestion for where to put the option to activate this? I'm thinking of adding a list choice in the "Save As" dialog. This will let the user choose the format to use for each operation, with the standard format being the default (cf. screenshot below).

[Screenshot: "Save As" dialog with the proposed format list]

I cannot say for certain which current tools include file size, except for RHash (a command-line tool), which lets you specify the output format of your choice using %variable tags, as in %{sha-512} %s *%p. I do know that a lot of .sfv files from my USENET days included the file size in this manner as the middle parameter, and some also had the filename on the left and the hash on the right. (There has never been a true standard or industry guidance that I am aware of.)

I like the added droplist. However, could you make the default Hash file format the last-used option, same as is done with Save as type, so that I don't have to select it each and every time?

With respect to the Verify functionality for reading digest files: I assume we agree that a filesize mismatch (if a filesize is present) is considered a failure. Might I suggest, then, that while rewriting the verifier, multiple passes are performed over the list?

What I mean is, perform 2 passes over all items instead of just 1, so that we can enjoy early detection of simple filename and filesize mismatch failures before starting on any hash verification that takes time. Pass 1 verifies the filenames and filesizes; Pass 2 verifies the hash digests of any that didn't fail Pass 1. This will allow us to abort verification if we're unsatisfied with the name/size failures.
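A rough Python sketch of the two-pass idea, purely for illustration. `precheck` and `verify_hashes` are hypothetical names, entries are assumed to be `(digest, size_or_None, relative_path)` tuples, and SHA-1 is only an example algorithm; none of this is HashCheck's actual verifier.

```python
import hashlib
import os


def precheck(entries, base_dir="."):
    """Pass 1: cheap existence and size checks, no hashing."""
    failures, pending = {}, []
    for digest, size, rel in entries:
        full = os.path.join(base_dir, rel)
        if not os.path.isfile(full):
            failures[rel] = "missing"
        elif size is not None and os.path.getsize(full) != size:
            failures[rel] = "size mismatch"
        else:
            pending.append((digest, rel, full))
    return failures, pending


def verify_hashes(pending, algorithm="sha1"):
    """Pass 2: hash only the files that survived pass 1."""
    results = {}
    for digest, rel, full in pending:
        h = hashlib.new(algorithm)
        with open(full, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        results[rel] = "ok" if h.hexdigest() == digest.lower() else "hash mismatch"
    return results
```

Keeping the passes as separate calls leaves room for the caller to show the pass 1 failures and let the user abort before any expensive hashing starts.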

Feature creep: With a filesize, the Verify tool could easily detect filename changes and make repairs for us, if a correctly sized file exists in the anticipated location. But not if there are 2 files of the same exact size.

The program Everything by VoidTools now supports these extended formats in the current Alpha version, for reading and comparing hash digests. Everything will also generate hash values, but does not yet offer verify or save-to-digest-file features. It will read existing digest files and populate column data for column sorting and duplicate detection.

Wed Oct 06, 2021 10:23 pm

Everything 1.5.0.1279a adds support for the
FFFFF 1234567890 *filename
hash file syntax.

re: Thu Sep 30, 2021 5:06 am

Everything will support:
FFFFF 1234567890 *filename
in the next alpha update.
(must be formatted exactly as <hash> <space> <size> <space> <* or space> <filename>)

Everything will only support (filename first):
filename FFFFF
in SFV files.

Everything will support (filename variations):
FFFFF *filename
FFFFF  filename (two spaces)
FFFFF filename (non standard single space)
in .md5 and .sha files.

ZPNRG commented

@a-raccoon and @idrassi, I have literally used HashCheck daily since early 2009. I use it A LOT and create and verify checksums ALL THE TIME. Anywhere from a few files to thousands of files.

I think adding file size to the created digest file is a great idea, so that files with a size mismatch are automatically skipped and marked as a "mismatch" as far as checksum verification is concerned. I am assuming that reading and/or checking file size is drastically faster (trivial, time & resource-wise) than generating or verifying any hash, especially SHA-1, SHA-256, etc.

With regard to the droplist, I agree with @a-raccoon that the "Hash file format" should probably be the last-used option, same as is done with "Save as type", so that I don't have to select it each and every time. That said, maybe the default format could be an option one sets. Granted, that option probably isn't necessary if HashCheck remembers which format was last used and I always want the same (Extended with file size value) format. Personally, I would always use that "Extended" format if the digest size, as well as creation and verification time, increased only marginally, because I know it would benefit me in the long run.

As far as @a-raccoon's suggestion about possibly detecting filename changes and making repairs for us, I would probably vote against that idea. I would want to be very careful about having HashCheck take that type of action. Perhaps that could be an option that is turned OFF by default, but could be enabled.

As a side note, I've used VoidTools' Everything for years, though I usually run only the stable release or a beta. So, compatibility with upcoming Everything features might be nice, though that isn't a priority for me. @a-raccoon, I appreciate you mentioning some of those details in your previous post.

Thanks for the glowing words. :)

... make repairs for us

And for clarity, I don't think I necessarily meant that the Verify tool should automatically rename the files for us, but rather, to compensate by detecting those files that have been renamed and tell us so. Maybe with a "Renamed" warning bubble instead of a "Missing" warning bubble.
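To make the "Renamed" idea concrete, here is a small Python sketch under several assumptions: `find_renamed` is a hypothetical name, only the anticipated directory is searched, files already listed in the digest file are passed in as `known_names` and skipped, and a candidate is reported only when it is the single file of the expected size and its hash also matches. Nothing is renamed; the function only reports.

```python
import hashlib
import os


def find_renamed(directory, expected_size, expected_digest,
                 known_names=(), algorithm="sha1"):
    """Return the name of a likely renamed file, or None if unsure."""
    with os.scandir(directory) as it:
        candidates = [
            e.path for e in it
            if e.is_file()
            and e.name not in known_names
            and e.stat().st_size == expected_size
        ]
    # Two or more files of the same size: ambiguous, so report nothing.
    if len(candidates) != 1:
        return None
    h = hashlib.new(algorithm)
    with open(candidates[0], "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    if h.hexdigest() != expected_digest.lower():
        return None
    return os.path.basename(candidates[0])
```

A verifier could call this for each "Missing" entry that carries a size and show the returned name in a "Renamed" warning instead.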

If you have free time, and really enjoy VoidTools' Everything, I think you're going to love the Alpha beta.

It'd be nice if the hash files output from the CertUtil command (built into Windows) were supported:

Usage:
  CertUtil [Options] -hashfile InFile [HashAlgorithm]
  Generate and display cryptographic hash over a file

Options:
  -Unicode          -- Write redirected output in Unicode
  -gmt              -- Display times as GMT
  -seconds          -- Display times with seconds and milliseconds
  -v                -- Verbose operation
  -privatekey       -- Display password and private key data
  -pin PIN                  -- Smart Card PIN
  -sid WELL_KNOWN_SID_TYPE  -- Numeric SID
            22 -- Local System
            23 -- Local Service
            24 -- Network Service

Hash algorithms: MD2 MD4 MD5 SHA1 SHA256 SHA384 SHA512

CertUtil -?              -- Display a verb list (command list)
CertUtil -hashfile -?    -- Display help text for the "hashfile" verb
CertUtil -v -?           -- Display all help text for all verbs

An example command:
CertUtil -hashfile "inputFileName.mkv" SHA256 > "hashFileName.sha256"

The contents of the output file:

SHA256 hash of inputFileName.mkv:
31cfc14234201b866a0e092aa251ee4788a19ea742c046a27de1ee0eefe29ab9
CertUtil: -hashfile command completed successfully.
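As a sketch of what reading that output could involve, the three-line layout shown above can be picked apart in Python as follows. `parse_certutil` is a hypothetical name, the parser is based only on the sample in this comment, and stripping spaces from the hash line is an assumption in case some CertUtil versions print the bytes space-separated.

```python
import re

# Header line as in the sample above: "<ALG> hash of <file name>:"
_HEADER = re.compile(r"^(?P<alg>[A-Za-z0-9]+) hash of (?P<name>.+):$")


def parse_certutil(text):
    """Return (algorithm, digest, file_name), or None if not recognized."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i, ln in enumerate(lines[:-1]):
        m = _HEADER.match(ln)
        if m is None:
            continue
        # Assumption: tolerate space-separated hex bytes on the hash line.
        digest = lines[i + 1].replace(" ", "").lower()
        if re.fullmatch(r"[0-9a-f]+", digest):
            return m.group("alg").upper(), digest, m.group("name")
    return None
```

Note that this output carries no file size, so only the hash pass would apply to it.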