buidl-bitcoin/buidl-python

RedeemScript parse bug

Opened this issue · 9 comments

From bitcoin core v0.22:

$ bitcoin-cli -testnet decodescript 4752210223136797cb0d7596cb5bd476102fe3aface2a06338e1afabffacf8c3cab4883c210385c865e61e275ba6fda4a3167180fc5a6b607150ff18797ee44737cd0d34507b52ae
{
  "asm": "52210223136797cb0d7596cb5bd476102fe3aface2a06338e1afabffacf8c3cab4883c210385c865e61e275ba6fda4a3167180fc5a6b607150ff18797ee44737cd0d34507b52ae",
  "type": "nonstandard",
  "p2sh": "2NEfNqj5R9sTJpK94woeB8HeDZeFGESC8DS",
  "segwit": {
    "asm": "0 65f44cbbfa85676ec7a8fc31ac52bba97f7143a2176ba8fb807cca57adbec704",
    "hex": "002065f44cbbfa85676ec7a8fc31ac52bba97f7143a2176ba8fb807cca57adbec704",
    "address": "tb1qvh6yewl6s4nka3aglsc6c54m49lhzsazza4637uq0n990td7cuzq3c9ykt",
    "type": "witness_v0_scripthash",
    "p2sh-segwit": "2N8XtpEMnsef6nkt7WMjUkunwLyM3yWiaf2"
  }
}

However, in the test case we parse this hex to get a testnet address of 2MxEZNps15dAnGX5XaVwZWgoDvjvsDE5XSx, which is not found in the bitcoin core output.

@jimmysong any thoughts?

The original example is strange/nonstandard, this one is worse:

$ bitcoin-cli -testnet decodescript 522102be8d4de672b4d6149616962a7d193b702e608a1ed65aaa44f432ff9dd902252f21036ec4565fb304b0fc8e2cdd56e477816ab703e06d52f87279bf0cbdb9fa4941b221038fc14c8dd5a15828bd4fb0e206e443e3ac6e3e3782fbf6f0a1ff969a9ec8f28f53ae
{
  "asm": "2 02be8d4de672b4d6149616962a7d193b702e608a1ed65aaa44f432ff9dd902252f 036ec4565fb304b0fc8e2cdd56e477816ab703e06d52f87279bf0cbdb9fa4941b2 038fc14c8dd5a15828bd4fb0e206e443e3ac6e3e3782fbf6f0a1ff969a9ec8f28f 3 OP_CHECKMULTISIG",
  "type": "multisig",
  "p2sh": "2NAaGqgaZicBUXuA2iZc6ssxLEsS4sZwdwz",
  "segwit": {
    "asm": "0 3ea27f340fddab92862bd711e53ac5f3d616acef5c6a2309fd174555560661b0",
    "hex": "00203ea27f340fddab92862bd711e53ac5f3d616acef5c6a2309fd174555560661b0",
    "address": "tb1q86387dq0mk4e9p3t6ug72wk970tpdt80t34zxz0azaz424sxvxcqjc86xj",
    "type": "witness_v0_scripthash",
    "p2sh-segwit": "2N1M2j1W6cGraBo8cYNhKyKMDmHBEQ6YPV9"
  }
}

However, buidl returns something different:

>>> from buidl import *
>>> redeem_script_hex = "522102be8d4de672b4d6149616962a7d193b702e608a1ed65aaa44f432ff9dd902252f21036ec4565fb304b0fc8e2cdd56e477816ab703e06d52f87279bf0cbdb9fa4941b221038fc14c8dd5a15828bd4fb0e206e443e3ac6e3e3782fbf6f0a1ff969a9ec8f28f53ae"
>>> RedeemScript.parse(BytesIO(bytes.fromhex(redeem_script_hex))).address('testnet')
mismatch between length and consumed bytes 102 vs 82
'2NGAVkcYcayqqhTwhw2mMiBueyuxwXAphg2'

Note that this address DOES NOT match the bitcoin core address:
2NAaGqgaZicBUXuA2iZc6ssxLEsS4sZwdwz

You can also verify this with caravan:

  1. Visit https://howech.github.io/caravan/#/script
  2. Enter 522102be8d4de672b4d6149616962a7d193b702e608a1ed65aaa44f432ff9dd902252f21036ec4565fb304b0fc8e2cdd56e477816ab703e06d52f87279bf0cbdb9fa4941b221038fc14c8dd5a15828bd4fb0e206e443e3ac6e3e3782fbf6f0a1ff969a9ec8f28f53ae
  3. Toggle testnet
  4. See that it produces 2NAaGqgaZicBUXuA2iZc6ssxLEsS4sZwdwz

Screen Shot 2022-01-04 at 5 27 39 PM

I'm not sure I understand what the issue is. Is it the front byte that's the problem? What's the hash preimage that gives the p2sh address that buidl gives vs the one that core uses?

What's the hash preimage that gives the p2sh address that buidl gives vs the one that core uses?

buidl (on current main branch) gives 2NGAVkcYcayqqhTwhw2mMiBueyuxwXAphg2 and can't return p2sh keys:

>>> from buidl import *
>>> redeem_script_hex = "522102be8d4de672b4d6149616962a7d193b702e608a1ed65aaa44f432ff9dd902252f21036ec4565fb304b0fc8e2cdd56e477816ab703e06d52f87279bf0cbdb9fa4941b221038fc14c8dd5a15828bd4fb0e206e443e3ac6e3e3782fbf6f0a1ff969a9ec8f28f53ae"
>>> rs_obj = RedeemScript.parse(BytesIO(bytes.fromhex(redeem_script_hex)))
mismatch between length and consumed bytes 102 vs 82
>>> rs_obj.hash160()
b'\xfbe\xf8\xfc#\xb2K\xe4\xc6\x8f\xa0\x9e\xf1\x06m\xfa\t[5\x98'
>>> encode_base58_checksum(b"\xc4" + rs_obj.hash160())
'2NGAVkcYcayqqhTwhw2mMiBueyuxwXAphg2'
>>> rs_obj.address("testnet")
'2NGAVkcYcayqqhTwhw2mMiBueyuxwXAphg2'
>>> rs_obj.signing_pubkeys()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mflaxman/workspace/buidl-python/buidl/script.py", line 489, in signing_pubkeys
    raise ValueError(f"Not p2sh multisig: {self}")
ValueError: Not p2sh multisig: 02be8d4de672b4d6149616962a7d193b702e608a1ed65aaa44f432ff9dd902252f 036ec4565fb304b0fc8e2cdd56e477816ab703e06d52f87279bf0cbdb9fa4941b2 038fc14c8dd5a15828bd4fb0e2 

bitcoin core returns 2NAaGqgaZicBUXuA2iZc6ssxLEsS4sZwdwz:

$ bitcoin-cli -testnet decodescript 522102be8d4de672b4d6149616962a7d193b702e608a1ed65aaa44f432ff9dd902252f21036ec4565fb304b0fc8e2cdd56e477816ab703e06d52f87279bf0cbdb9fa4941b221038fc14c8dd5a15828bd4fb0e206e443e3ac6e3e3782fbf6f0a1ff969a9ec8f28f53ae
{
  "asm": "2 02be8d4de672b4d6149616962a7d193b702e608a1ed65aaa44f432ff9dd902252f 036ec4565fb304b0fc8e2cdd56e477816ab703e06d52f87279bf0cbdb9fa4941b2 038fc14c8dd5a15828bd4fb0e206e443e3ac6e3e3782fbf6f0a1ff969a9ec8f28f 3 OP_CHECKMULTISIG",
  "type": "multisig",
  "p2sh": "2NAaGqgaZicBUXuA2iZc6ssxLEsS4sZwdwz",
  "segwit": {
    "asm": "0 3ea27f340fddab92862bd711e53ac5f3d616acef5c6a2309fd174555560661b0",
    "hex": "00203ea27f340fddab92862bd711e53ac5f3d616acef5c6a2309fd174555560661b0",
    "address": "tb1q86387dq0mk4e9p3t6ug72wk970tpdt80t34zxz0azaz424sxvxcqjc86xj",
    "type": "witness_v0_scripthash",
    "p2sh-segwit": "2N1M2j1W6cGraBo8cYNhKyKMDmHBEQ6YPV9"
  }
}

Does that make sense? Definitely seems like a buidl issue that this doesn't match.

If I work backwards using the data from bitcoin core, I can create a Redeem Script that is very close to the correct one from bitcoin core, but incorrectly prepends a 69:

>>> from buidl import *
>>> rs_obj = RedeemScript.create_p2sh_multisig(quorum_m=2, pubkey_hexes=["02be8d4de672b4d6149616962a7d193b702e608a1ed65aaa44f432ff9dd902252f", "036ec4565fb304b0fc8e2cdd56e477816ab703e06d52f87279bf0cbdb9fa4941b2", "038fc14c8dd5a15828bd4fb0e206e443e3ac6e3e3782fbf6f0a1ff969a9ec8f28f"])
>>> rs_obj.hash160()
b'\xbe\x14l\r\xe3\x9c\xae\x08\xf4I\xfd}\xe2\xfd\x18\xcb\xdas#\x97'
>>> encode_base58_checksum(b"\xc4" + rs_obj.hash160())
'2NAaGqgaZicBUXuA2iZc6ssxLEsS4sZwdwz'
>>> rs_obj.address("testnet")
'2NAaGqgaZicBUXuA2iZc6ssxLEsS4sZwdwz'
>>> rs_obj.serialize().hex()
'69522102be8d4de672b4d6149616962a7d193b702e608a1ed65aaa44f432ff9dd902252f21036ec4565fb304b0fc8e2cdd56e477816ab703e06d52f87279bf0cbdb9fa4941b221038fc14c8dd5a15828bd4fb0e206e443e3ac6e3e3782fbf6f0a1ff969a9ec8f28f53ae'

If I then take this RedeemScript with two extra chars, I can successfully call parse on it:

>>> RedeemScript.parse(BytesIO(rs_obj.serialize())).address("testnet")
'2NAaGqgaZicBUXuA2iZc6ssxLEsS4sZwdwz'

And just to further confirm, if you parse this incorrect RedeemScript hex that buidl generates in bitcoin core, you get the address 2NCoFJuE5nE7tmRadK6sAdmTFK1ctNTraZB as nonstandard (not 2NAaGqgaZicBUXuA2iZc6ssxLEsS4sZwdwz that buidl generates):

$ bitcoin-cli -testnet decodescript 69522102be8d4de672b4d6149616962a7d193b702e608a1ed65aaa44f432ff9dd902252f21036ec4565fb304b0fc8e2cdd56e477816ab703e06d52f87279bf0cbdb9fa4941b221038fc14c8dd5a15828bd4fb0e206e443e3ac6e3e3782fbf6f0a1ff969a9ec8f28f53ae
{
  "asm": "OP_VERIFY 2 02be8d4de672b4d6149616962a7d193b702e608a1ed65aaa44f432ff9dd902252f 036ec4565fb304b0fc8e2cdd56e477816ab703e06d52f87279bf0cbdb9fa4941b2 038fc14c8dd5a15828bd4fb0e206e443e3ac6e3e3782fbf6f0a1ff969a9ec8f28f 3 OP_CHECKMULTISIG",
  "type": "nonstandard",
  "p2sh": "2NCoFJuE5nE7tmRadK6sAdmTFK1ctNTraZB",
  "segwit": {
    "asm": "0 5e045f933186d9f94ecbe9bfb1b2a21ce6c24258bf1f60780a1a8bf94af361f2",
    "hex": "00205e045f933186d9f94ecbe9bfb1b2a21ce6c24258bf1f60780a1a8bf94af361f2",
    "address": "tb1qtcz9lye3smvljnktaxlmrv4zrnnvysjchu0kq7q2r29ljjhnv8eqljh367",
    "type": "witness_v0_scripthash",
    "p2sh-segwit": "2NFg19i2ZNnpf1fYVPpaVoHGZg6P4rhJbKG"
  }
}

the length in bytes of a standard 2-of-3 multisig p2sh redeemscript is ... a valid opcode heh

len(redeem_script.raw_serialize())
> 105

OP_CODE_NAMES.get(105)
> 'OP_VERIFY'

hex(105)
> '0x69'

afaict serialize() on a RedeemScript should simply be returning self.raw_serialize() -- there is no need to call encode_varstr and prepend the result in this case. That is the source of the issue.

h/t @roshii ^

afaict serialize() on a RedeemScript should simply be returning self.raw_serialize() -- there is no need to call encode_varstr and prepend the result in this case. That is the source of the issue.

TLDR: the confusion comes from the method's name, there is no bug, implementation is correct

It's a bit more tricky than that: for serializing/parsing transaction, script must be prefixed with its length in order to determine what belongs to the script. The name of the method may be confusing as one expect the script and only the script to be serialized when calling named method. Imho and to avoid confusion, the length prefix should be added when constructing a transaction. The serialize method on RedeemScript should indeed behave as its raw_serialize - at least that is what some (many?) of us expect.