ColumPaget/Hashrat

The output of hashrat -p64 is not compliant with RFC 4648

jvw1954 opened this issue · 4 comments

According to the manpage, hashrat -64 returns a base64-encoded hash, and hashrat -p64 returns a base64-encoded hash with a-z, A-Z and _-, for best compatibility with 'allowed characters' in websites.

According to RFC 4648, the output of these two commands should be the same, except for the characters that encode the values 62 ('+') and 63 ('/'). In the URL-safe variant these are '-' and '_'. Unfortunately, in practice the two outputs are very different, for example: ywu0vq367WZKEvDcqm6dwB/8wh88muIbHJJdV0uIftg= (-64) and mkiojeruvLO-4j3ReauSk1zwkWwwai8Q799SKoi8UhV (-p64).

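Calculating the same hash with Python 3 gives ywu0vq367WZKEvDcqm6dwB/8wh88muIbHJJdV0uIftg= for standard base64 and ywu0vq367WZKEvDcqm6dwB_8wh88muIbHJJdV0uIftg= for the URL-safe variant. The first value matches the output of hashrat -64; the second does not match the output of hashrat -p64. The Python results appear to be compliant with RFC 4648.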

jaap@laptop:~$ wget https://www.rfc-editor.org/rfc/pdfrfc/rfc4648.txt.pdf
--2020-04-18 19:03:25-- https://www.rfc-editor.org/rfc/pdfrfc/rfc4648.txt.pdf
Resolving www.rfc-editor.org (www.rfc-editor.org)... 4.31.198.49, 2001:1900:3001:11::31
Connecting to www.rfc-editor.org (www.rfc-editor.org)|4.31.198.49|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24612 (24K) [application/pdf]
Saving to: ‘rfc4648.txt.pdf’
rfc4648.txt.pdf 100%[=================================================>] 24,04K 159KB/s in 0,2s
2020-04-18 19:03:26 (159 KB/s) - ‘rfc4648.txt.pdf’ saved [24612/24612]
jaap@laptop:~$ hashrat -sha256 -64 rfc4648.txt.pdf
hash='sha256:ywu0vq367WZKEvDcqm6dwB/8wh88muIbHJJdV0uIftg=' type='file' mode='100644' uid='1000' gid='1000' size='24612' mtime='1161046850' inode='10753551' path='rfc4648.txt.pdf'
jaap@laptop:~$ hashrat -sha256 -p64 rfc4648.txt.pdf
hash='sha256:mkiojeruvLO-4j3ReauSk1zwkWwwai8Q799SKoi8UhV' type='file' mode='100644' uid='1000' gid='1000' size='24612' mtime='1161046850' inode='10753551' path='rfc4648.txt.pdf'
jaap@laptop:~$ python3
Python 3.7.3 (default, Dec 20 2019, 18:57:59)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hashlib
>>> import base64
>>> file = open("/home/jaap/rfc4648.txt.pdf",mode='rb')
>>> data = file.read()
>>> sha256_digest = hashlib.sha256(data).digest()
>>> sha256_b64 = base64.b64encode(sha256_digest)
>>> sha256_b64_urlsafe = base64.urlsafe_b64encode(sha256_digest)
>>> print(sha256_b64)
b'ywu0vq367WZKEvDcqm6dwB/8wh88muIbHJJdV0uIftg='
>>> print(sha256_b64_urlsafe)
b'ywu0vq367WZKEvDcqm6dwB_8wh88muIbHJJdV0uIftg='
>>> quit()
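
The relationship between the two RFC 4648 alphabets can also be checked without the file, starting from the -64 value above. This is only a minimal sketch: to_urlsafe is a hypothetical helper name, and the input string is the one reported in this issue.

```python
import base64

# Standard Base64 value reported by `hashrat -sha256 -64` in this issue.
STANDARD = "ywu0vq367WZKEvDcqm6dwB/8wh88muIbHJJdV0uIftg="


def to_urlsafe(standard_b64: str) -> str:
    """Swap the only two characters that differ between the RFC 4648 alphabets.

    Hypothetical helper for illustration: value 62 '+' becomes '-',
    value 63 '/' becomes '_'. Everything else, padding included, is unchanged.
    """
    return standard_b64.translate(str.maketrans("+/", "-_"))


# Decode back to the raw digest and re-encode with the URL-safe alphabet.
digest = base64.b64decode(STANDARD)
urlsafe = base64.urlsafe_b64encode(digest).decode("ascii")

print(len(digest))                      # a SHA-256 digest is 32 bytes
print(urlsafe)
print(urlsafe == to_urlsafe(STANDARD))  # both routes should agree
```

Both routes give the same string as the urlsafe_b64encode call in the session above, which is what one would expect from an RFC 4648 compliant URL-safe encoder.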

The test hashes in https://github.com/ColumPaget/Hashrat/blob/master/check.sh are also not RFC 4648 compliant:

TestHash 64 "base 64 encoding" aOiOe0ag+9ilTIky0qlxDQ==
...
TestHash p64 "'website compatible' base 64 encoding" "PDXDToPVyxX_I8Zmoe_l3F"

The problem is probably in https://github.com/ColumPaget/libUseful/blob/master/Encodings.h:

#define BASE64_CHARS "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
...
#define PBASE64_CHARS "0123456789-ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"

This last line is not RFC 4648 compliant. To match the URL-safe alphabet it should be:
#define PBASE64_CHARS "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
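
To illustrate why I think the alphabet is the cause: the sample values in this issue are consistent with -p64 using the same 6-bit values as -64, mapped through the reordered PBASE64_CHARS alphabet and without '=' padding. The sketch below is inferred from those sample values only, not from reading the libUseful code; reencode and p64_to_rfc4648_urlsafe are hypothetical helper names.

```python
# Alphabets copied from the Encodings.h lines quoted in this issue.
BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
PBASE64_CHARS = "0123456789-ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"
# URL-safe alphabet from RFC 4648 (the proposed replacement).
RFC4648_URLSAFE_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def reencode(text: str, src: str, dst: str) -> str:
    """Map each character from alphabet src to the same position in dst.

    Hypothetical helper: '=' padding is dropped, because the -p64 samples
    in this issue carry none.
    """
    return text.rstrip("=").translate(str.maketrans(src, dst))


def p64_to_rfc4648_urlsafe(p64: str) -> str:
    """Convert a -p64 style string to padded RFC 4648 URL-safe Base64."""
    body = reencode(p64, PBASE64_CHARS, RFC4648_URLSAFE_CHARS)
    return body + "=" * (-len(body) % 4)  # restore padding to a multiple of 4


# The -64 and -p64 outputs from the session above.
std = "ywu0vq367WZKEvDcqm6dwB/8wh88muIbHJJdV0uIftg="
p64 = "mkiojeruvLO-4j3ReauSk1zwkWwwai8Q799SKoi8UhV"

print(reencode(std, BASE64_CHARS, PBASE64_CHARS) == p64)
print(p64_to_rfc4648_urlsafe(p64))
```

The same mapping also turns the -64 test hash from check.sh (aOiOe0ag+9ilTIky0qlxDQ==) into the -p64 test hash (PDXDToPVyxX_I8Zmoe_l3F). So the encoded bits appear to be identical, and only the character table and the padding differ.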

See also: ColumPaget/libUseful#3

Hi jvw1954. I don't think I originally intended the -p64 option to be RFC 4648 compliant; I don't think I even knew that RFC existed. I just found there were some issues with websites and came up with a scheme to handle that. But there's no reason why I couldn't add another scheme with the alphabet you've given here, and maybe call it '-r64' or '-rfc4648', or both! I'll do it and release a new version shortly.

New version of Hashrat (v1.13) is now out with the -r64 (or also the -rfc4648) option.