multiformats/py-multihash

multihash.is_valid returns false unexpectedly

Closed this issue · 4 comments

  • py-multihash version: latest from pypi (pip install py-multihash), I think: see #4
  • Python version: Python 3.6.7 (default, Oct 22 2018, 11:32:17) [GCC 8.2.0]
  • Operating System: linux

Description

I think mh = b'122041dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8' should be a valid multihash, but multihash.is_valid(mh) returns otherwise.

(am I wrong?)

What I Did

Wrote two tests that fail unexpectedly...

import multihash
import base58

def test_is_valid_multihash():
    # from https://multiformats.io/multihash/#examples
    # e.g. blake 2s, 128 bits: b'b250100a4ec6f1629e49262d7093e2f82a3278'
    # sha2-256, 256 bits
    mh = b'122041dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8'  # noqa
    assert multihash.is_valid(mh)  # surprising!

def test_is_valid_multihash_from_b58():
    mh = b'122041dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8'  # noqa
    b58_enc_mh = base58.b58encode(mh)
    # convert the b58 encoded multihash from bytes into string
    str_b58_enc_mh = "".join(chr(x) for x in b58_enc_mh)

    assert multihash.is_valid(
        multihash.from_b58_string(str_b58_enc_mh))
    # well, that explains why my code isn't working...
# hacking the tests above...
>       raise Exception(multihash.decode(mh))  # DEBUG

tests/domain/test_URI.py:129: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

multihash = b'122041dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8'

    def decode(multihash):
        """
        Decode a hash from the given multihash
    
        :param bytes multihash: multihash
        :return: decoded :py:class:`multihash.Multihash` object
        :rtype: :py:class:`multihash.Multihash`
        :raises TypeError: if `multihash` is not of type `bytes`
        :raises ValueError: if the length of multihash is less than 3 characters
        :raises ValueError: if the code is invalid
        :raises ValueError: if the length is invalid
        :raises ValueError: if the length is not same as the digest
        """
        if not isinstance(multihash, bytes):
            raise TypeError('multihash should be bytes, not {}', type(multihash))

        if len(multihash) < 3:
            raise ValueError('multihash must be greater than 3 bytes.')

        buffer = BytesIO(multihash)
        try:
            code = varint.decode_stream(buffer)
        except TypeError:
            raise ValueError('Invalid varint provided')

        if not is_valid_code(code):
>           raise ValueError('Unsupported hash code {}'.format(code))
E           ValueError: Unsupported hash code 49

But it's not hashcode 49, it's hashcode 18 (0x12 == sha2-256).

Hello @monkeypants,

This hexadecimal string:

122041dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8

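is indeed a valid multihash (hash code 18, sha2-256). The problem is how you're passing it to multihash.decode(): a b'...' literal holds the ASCII characters of the hex string, not the bytes they encode, so the first byte is the character '1' (49), which is the code in your error.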
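
To see where the 49 in the traceback comes from, here's a quick check that only needs the standard library. It's a minimal sketch; the variable names are just for illustration.

```python
# The bytes literal from the issue: it stores the hex *characters* as ASCII.
literal = b'122041dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8'

first_byte = literal[0]  # indexing a bytes object yields an int
print(first_byte, chr(first_byte), len(literal))

# The first byte is the ASCII code of the character '1', not 0x12.
assert first_byte == ord('1') == 49
# 68 ASCII characters, where a sha2-256 multihash is 34 raw bytes.
assert len(literal) == 68
```

So the decoder reads 49 as the hash code and rejects it before it ever looks at the digest.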

What you need to do is convert the hex string into bytes with bytes.fromhex():

bytes.fromhex('122041dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8')
b'\x12 A\xdd{dCT.up\x1a\xa9\x8a\x0c#YQ\xa2\x8a\r\x85\x1b\x11VM \x02*\xb1\x1d%\x89\xa8'

And then of course py-multihash has no issue decoding that:

multihash.decode(bytes.fromhex('122041dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8'))
Multihash(code=18, name='sha2-256', length=32, digest=b'A\xdd{dCT.up\x1a\xa9\x8a\x0c#YQ\xa2\x8a\r\x85\x1b\x11VM \x02*\xb1\x1d%\x89\xa8')
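
As a sanity check on that output, the same fields can be pulled apart by hand with the standard library. This is only a sketch, not how py-multihash does it: it assumes the code and the length each fit in a single byte, which holds for this example because both are below 0x80. In general they are varints.

```python
# Convert the hex string to the raw bytes it encodes.
raw = bytes.fromhex(
    '122041dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8'
)

# Assumption: single-byte code and length (true here, not in general).
code, length, digest = raw[0], raw[1], raw[2:]
print(hex(code), length, len(digest))

assert code == 0x12                 # 18, sha2-256
assert length == 32 == len(digest)  # declared length matches the digest
```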

Take care.

There's actually a function called multihash.from_hex_string() that does exactly the same thing.

thanks!