Investigate use of NFKD

Question

steveatinfincia opened this issue 8 years ago · 6 comments

BIP-0039 calls for NFKD normalization in two situations:

When generating the wordlists

The standard says this:

The wordlist can contain native characters, but they must be encoded in UTF-8 using Normalization Form Compatibility Decomposition (NFKD).

This should already be taken care of, because the wordlist in bip39-rs comes from the BIP-0039 repo and has already been processed correctly.
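A quick way to double-check that assumption is to confirm that no word changes when NFKD is applied. The following is a minimal sketch in Python (chosen only because its standard library ships Unicode normalization); the function name `non_nfkd_words` and the sample words are hypothetical and are not taken from bip39-rs or the BIP-0039 wordlists.

```python
import unicodedata


def non_nfkd_words(words):
    """Return the words that change under NFKD, i.e. are not already normalized."""
    return [w for w in words if unicodedata.normalize("NFKD", w) != w]


# Hypothetical sample, not real wordlist entries:
# "\u00e9cole" uses the precomposed character U+00E9,
# "e\u0301cole" uses "e" plus the combining acute accent U+0301 (the NFKD form).
sample = ["abandon", "\u00e9cole", "e\u0301cole"]

# Only the precomposed spelling should be reported.
offenders = non_nfkd_words(sample)
print([w.encode("unicode_escape").decode("ascii") for w in offenders])
```

Running a check like this over each bundled wordlist file would confirm whether the lists really are stored in NFKD form.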

When turning a mnemonic phrase into a seed

The standard says this:

To create a binary seed from the mnemonic, we use the PBKDF2 function with a mnemonic sentence (in UTF-8 NFKD) used as the password and the string "mnemonic" + passphrase (again in UTF-8 NFKD) used as the salt. The iteration count is set to 2048 and HMAC-SHA512 is used as the pseudo-random function. The length of the derived key is 512 bits (= 64 bytes).

We currently make no attempt to normalize the mnemonic or the passphrase before deriving the seed, and we should.
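For reference, the derivation quoted above can be sketched in a few lines. This is an illustration in Python using only the standard library, not the bip39-rs implementation; the function name `mnemonic_to_seed` is hypothetical.

```python
import hashlib
import unicodedata


def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """Derive a 64-byte seed as described in the BIP-0039 text quoted above."""
    # Both inputs are normalized to NFKD before being encoded as UTF-8.
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = ("mnemonic" + unicodedata.normalize("NFKD", passphrase)).encode("utf-8")
    # PBKDF2 with HMAC-SHA512, 2048 iterations, 512-bit (64-byte) output.
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)


# The same passphrase typed in precomposed form (U+00E9) and in decomposed
# form ("e" + U+0301) should give the same seed once NFKD is applied.
phrase = "abandon " * 11 + "about"
seed_a = mnemonic_to_seed(phrase, "caf\u00e9")
seed_b = mnemonic_to_seed(phrase, "cafe\u0301")
print(seed_a == seed_b)
```

Without the normalization step, the two passphrases would encode to different UTF-8 bytes and therefore derive different seeds, which is the interoperability problem this issue is about.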

Answer 1 · 2018-12-19T08:13:26.000Z

I believe the unicode-normalization crate provides this as UnicodeNormalization::nfkd.

Answer 2 · 2019-02-12T02:06:02.000Z

I've been working on adding NFKD normalization, and I need reliable test vectors in non-English languages. (I already have a Japanese set.)

Answer 3 · 2019-02-12T06:52:22.000Z

I found some in the NBitcoin project: https://github.com/MetacoSA/NBitcoin/tree/master/NBitcoin.Tests/data

Answer 4 · 2019-02-12T09:00:08.000Z

Nice find @wigy-opensource-developer!

Answer 5 · 2019-02-12T10:11:11.000Z

The tests there were generated with https://github.com/nym-zone/easyseed

Answer 6 · 2019-02-12T15:35:48.000Z

Maybe this could be interesting: non-normalized input for Japanese phrases, meant to test normalization: bip32JP/bip32JP.github.io@360c05a (I do not speak Japanese, so I would need to rely on these to make test vectors myself 😊)
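To illustrate why non-normalized Japanese input makes a useful test, here is a small Python sketch. The string below is a made-up example, not one of the vectors from that commit: it contains a precomposed voiced kana and an ideographic space, both of which NFKD rewrites.

```python
import unicodedata

# Hypothetical input: "が" (U+304C), an ideographic space (U+3000), "が" again.
raw = "\u304c\u3000\u304c"
normalized = unicodedata.normalize("NFKD", raw)

# NFKD splits U+304C into "か" (U+304B) plus the combining voiced sound mark
# (U+3099), and maps the ideographic space to an ordinary space (U+0020).
print(raw.encode("unicode_escape").decode("ascii"))
print(normalized.encode("unicode_escape").decode("ascii"))

# The UTF-8 bytes differ, so skipping normalization changes the PBKDF2 input.
print(raw.encode("utf-8") != normalized.encode("utf-8"))
```

A test vector whose input is deliberately left in the raw form only passes if the implementation normalizes before hashing.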