Japanese test vectors for BIP39

Question

ryanxcharles opened this issue 10 years ago · 22 comments

I am trying to make my BIP39 implementation equivalent to yours, and I'm having problems with the Japanese test vectors. Take the last test vector, for instance.

The mnemonic is this:

うちゅうふそくひしょがちょううけもつめいそうみかんそざいいばるうけとるさんまさこつおうさまぱんつしひょうめしたたはついちぶつうじょうてさぎょうきつねみすえるいりぐちかめれおん

And passphrase is this:

㍍ガバヴァぱばぐゞちぢ十人十色

The test vector says the seed should be this:

346b7321d8c04f6f37b49fdf062a2fddc8e1bf8f1d33171b65074531ec546d1d3469974beccb1a09263440fc92e1042580a557fdce314e27ee4eabb25fa5e5fe

However, I get this value:

3ef879ac8a919ffc89031ba706ec49fd4ced3bf7ef20bea786c3e6705db994c29dbcd73e324d5da910f41d3770ba4c76347dc4d00f26f3a245d971ff0e32828f

I suspected that this had something to do with spaces. I tried replacing the Japanese (ideographic) space with the ASCII space, but that did not fix the problem.

Furthermore, when I try to run your BIP39 implementation from node, I get the same value as my code, which is this:

3ef879ac8a919ffc89031ba706ec49fd4ced3bf7ef20bea786c3e6705db994c29dbcd73e324d5da910f41d3770ba4c76347dc4d00f26f3a245d971ff0e32828f

Any ideas what I might be doing that is systematically wrong? I have no problems with the English test vectors, only the Japanese ones. Thanks!

Answer 1 · 2015-01-07T00:24:19.000Z

G'day Ryan, what language are you coding in?

Answer 2 · 2015-01-07T00:53:25.000Z

Hi, I'm using javascript. My code is here: https://github.com/ryanxcharles/fullnode

I have not yet published a version of my code that fails the Japanese test vectors, but I could put it up if you would like to see it.

Answer 3 · 2015-01-07T00:55:08.000Z

Hmmm so am I to understand that the code you posted passes the JP tests?

Answer 4 · 2015-01-07T00:56:01.000Z

No, the code I posted does not pass the JP tests. It just doesn't run them at all. I haven't put that version on github. However, I will do so, so you can see. Hang on.

Answer 5 · 2015-01-07T00:59:05.000Z

Ah ok, yes please put that up and I shall look for you.

Answer 6 · 2015-01-07T01:07:36.000Z

Here's the mnemonic2seed function, which passes the English test vectors but fails the Japanese test vectors:

https://github.com/ryanxcharles/fullnode/blob/feature/bip39jp/lib/bip39.js#L112

Here's the test code:

https://github.com/ryanxcharles/fullnode/blob/feature/bip39jp/test/bip39.js#L112

Here are the vectors:

https://github.com/ryanxcharles/fullnode/blob/feature/bip39jp/test/vectors/bip39.json#L173

Japanese wordlist:

https://github.com/ryanxcharles/fullnode/blob/feature/bip39jp/lib/bip39.js#L2178

Japanese space:

https://github.com/ryanxcharles/fullnode/blob/feature/bip39jp/lib/bip39.js#L4228

Answer 7 · 2015-01-07T01:21:03.000Z

OK, I just looked at it. The wordlist itself has to be NFKD-normalised UTF-8. This is very important: I noticed that when I copied the wordlist and pasted it into my code, it lost the normalisation. What I had to do was copy from the web browser into Notepad, save the Notepad .txt file with UTF-8 encoding, and then copy from that into my code. If that doesn't work, you will need to force normalisation before processing the mnemonic. Are you normalising the mnemonic before using it, by chance?
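One way to check whether a pasted wordlist kept its normalisation is to compare each entry with its NFKD form. The following is only a minimal sketch: findUnnormalized is a hypothetical helper name, the sample entries are made up for illustration, and it assumes a JavaScript runtime that provides String.prototype.normalize.

```javascript
// Hypothetical helper: returns the wordlist entries that are NOT already in NFKD form.
function findUnnormalized(words) {
  return words.filter(function (word) {
    return word !== word.normalize('NFKD');
  });
}

// Illustrative entries (not taken from the real wordlist):
//   1. 'が' as one precomposed code point (U+304C) followed by 'い'
//   2. the same text already decomposed: 'か' (U+304B) + combining mark (U+3099) + 'い'
//   3. a plain ASCII word, which NFKD leaves untouched
var sample = ['\u304C\u3044', '\u304B\u3099\u3044', 'abandon'];

console.log(findUnnormalized(sample).length); // 1 (only the precomposed entry)
```

An empty result suggests the list is already stored in NFKD form; any returned entries would need to be normalised before their bytes are used.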

Answer 8 · 2015-01-07T04:09:30.000Z

Thanks for your help. Using vim, I copied the strings from file to file using the yank command, which as best I can tell is byte-for-byte accurate, and got the same results.

Is it possible there is a flaw in the test vectors? I used the implementation of bip39 in this library and got the same results. Consider this little node program, executed from the same directory as bip39.js:

var BIP39 = require('./bip39');
var bip39 = new BIP39('jp');
var vectors = require('../test_JP_BIP39');
var mnemonic = vectors[23].mnemonic;
var passphrase = vectors[23].passphrase;
var seed = bip39.mnemonicToSeed(mnemonic, passphrase);
console.log(seed);
console.log(vectors[23].seed);

It should output two equivalent strings. Instead, it outputs the following:

3ef879ac8a919ffc89031ba706ec49fd4ced3bf7ef20bea786c3e6705db994c29dbcd73e324d5da910f41d3770ba4c76347dc4d00f26f3a245d971ff0e32828f
346b7321d8c04f6f37b49fdf062a2fddc8e1bf8f1d33171b65074531ec546d1d3469974beccb1a09263440fc92e1042580a557fdce314e27ee4eabb25fa5e5fe

Two inequivalent strings. Which one is correct? Note that the version in this library gives the same output as my code. Either both implementations are wrong in the same way, or the test vectors are wrong.

Answer 9 · 2015-01-07T04:18:32.000Z

d4d8b3d

do you have this commit?

This commit was necessary to make my test vectors equivalent to python-mnemonic.

Answer 10 · 2015-01-07T04:21:30.000Z

Yes, I have the most recent commit.

Answer 11 · 2015-01-07T04:21:59.000Z

Hi mate, I'm not the author of the test vectors. However, with the exception of the mnemonic, which has been changed from NFKD-normalised form to a format that has ideographic (wide) spaces, I believe the output comes from the reference code.

I was able to make my code (the .NET implementation) work with these test vectors as long as I applied the proper NFKD normalisation before UTF-8 encoding, so I think your issue is either in the normalisation or perhaps the passphrase (be sure to normalise it too!). Your copy may be correct byte for byte, but remember that normalisation produces different bytes, so if your byte-for-byte copy comes from a non-normalised source, you will get non-normalised bytes. I can confirm that the words in the wordlist, when copied from the GitHub website (using IE and just pasted into Visual Studio), do not arrive NFKD-normalised, and I had to do the UTF-8 save that I mentioned to force it!

Also, with your passphrase are you adding the word "mnemonic" in front of it as we are told to do in the spec?
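For reference, the seed derivation described in the BIP39 spec can be sketched as follows: NFKD-normalise both strings, use "mnemonic" plus the passphrase as the salt, and run PBKDF2 with HMAC-SHA512 for 2048 iterations to get 64 bytes. This is a minimal sketch, not the code of either library discussed here; mnemonicToSeedHex is an illustrative name, and it assumes a Node.js version that provides Buffer.from and String.prototype.normalize.

```javascript
var crypto = require('crypto');

// Illustrative name; not the function from either library in this thread.
function mnemonicToSeedHex(mnemonic, passphrase) {
  // Normalise BEFORE turning the strings into bytes.
  var password = Buffer.from(mnemonic.normalize('NFKD'), 'utf8');
  // The salt is the literal word "mnemonic" followed by the normalised passphrase.
  var salt = Buffer.from('mnemonic' + (passphrase || '').normalize('NFKD'), 'utf8');
  // PBKDF2-HMAC-SHA512, 2048 iterations, 64-byte (512-bit) output.
  return crypto.pbkdf2Sync(password, salt, 2048, 64, 'sha512').toString('hex');
}

// Because of the normalisation step, wide and ASCII spaces give the same seed.
var wide = mnemonicToSeedHex('abandon\u3000about', 'x');
var ascii = mnemonicToSeedHex('abandon about', 'x');
console.log(wide === ascii); // true
```

The two-word mnemonic in the usage lines is not a valid BIP39 mnemonic; it only demonstrates that the derivation is insensitive to the kind of space once NFKD is applied.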

Answer 12 · 2015-01-07T04:22:14.000Z

164b5d7

Also check my old (incorrect) json file in this diff.
If yours matches up with my old vectors then the problem is the normalization commit above.

Answer 13 · 2015-01-07T04:28:11.000Z

Also, check the reference implementation. It inserts ideographic spaces now.

When normalized, all ideographic spaces will change to ASCII spaces. If they are not being properly normalized by your code, perhaps look into a normalization library.
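A short illustration of that behaviour, assuming a runtime with String.prototype.normalize. The two words are sample text only. Note that the space is a compatibility mapping, so NFD alone does not change it.

```javascript
// Two sample words joined by U+3000 IDEOGRAPHIC SPACE.
var wide = 'うちゅう\u3000ふそく';

// NFD leaves U+3000 alone; the compatibility form NFKD maps it to U+0020.
var afterNfd = wide.normalize('NFD');
var afterNfkd = wide.normalize('NFKD');

console.log(afterNfd.indexOf('\u3000') !== -1); // true (still a wide space)
console.log(afterNfkd === 'うちゅう ふそく');     // true (now an ASCII space)
```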

Answer 14 · 2015-01-07T04:30:47.000Z

@dabura667 oh good news about the ideographic output in the reference implementation. That is ideal!

Answer 15 · 2015-01-07T04:37:10.000Z

I just looked at your mnemonic to seed function.

Before calling new Buffer on the mnemonic and the passphrase, you must normalize both of them first:

mnemonic = mnemonic.normalize('NFKD')
passphrase = passphrase.normalize('NFKD')

Many Japanese characters become completely different UTF-8 bytes when run through NFKD.
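To make the byte difference concrete, here is a small sketch that prints the UTF-8 bytes of two characters before and after NFKD. It assumes Node.js with Buffer.from and String.prototype.normalize; the hex helper is an illustrative name.

```javascript
// Illustrative helper: UTF-8 bytes of a string as hex.
function hex(str) {
  return Buffer.from(str, 'utf8').toString('hex');
}

// 'が' as a single code point (U+304C) decomposes to 'か' (U+304B) + U+3099.
var ga = '\u304C';
console.log(hex(ga));                   // e3818c
console.log(hex(ga.normalize('NFKD'))); // e3818be38299

// '㍍' (U+334D), the first character of the passphrase above, expands to four katakana.
var metre = '\u334D';
console.log(metre.normalize('NFKD') === 'メートル'); // true
```

So a string that looks identical on screen can hash to a different seed, depending on whether it was normalised before being converted to bytes.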

Answer 16 · 2015-01-07T04:51:31.000Z

Your code gives me the correct result if I add two lines normalizing the mnemonic and the passphrase before line 117.

Answer 17 · 2015-01-07T04:54:24.000Z

Thanks a lot - you guys were correct. I added the "unorm" library and that fixed the problem. Here are the relevant lines of code:

https://github.com/ryanxcharles/fullnode/blob/feature/bip39jp2/lib/bip39.js#L118

I was assuming that I simply needed the correct byte-for-byte Unicode characters, without realizing that NFKD normalization changes many more characters than just the space.

Answer 18 · 2015-01-07T04:59:33.000Z

No worries, these JP test vectors are really good for ensuring you're doing correct normalisation huh!!!

Answer 19 · 2015-01-07T04:59:35.000Z

Good call on unorm.

I forgot about it, but some browsers don't support .normalize(), best to use unorm.

Answer 20 · 2015-01-07T05:01:24.000Z

Closing as this issue is resolved.

Answer 21 · 2015-01-07T05:01:55.000Z

> No worries, these JP test vectors are really good for ensuring you're doing correct normalisation huh!!!

Yes indeed!

> I forgot about it, but some browsers don't support .normalize(), best to use unorm.

Yeah, some browsers don't support it, and node doesn't support it either. So the unorm library is necessary.
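A common pattern for this situation is to prefer the native method and fall back to unorm only when it is missing. The sketch below assumes the unorm package is installed in environments that need the fallback; toNfkd is an illustrative name. Current Node.js releases do ship String.prototype.normalize, so there the fallback branch is never reached.

```javascript
// Illustrative wrapper: native NFKD when available, unorm otherwise.
function toNfkd(str) {
  if (typeof String.prototype.normalize === 'function') {
    return str.normalize('NFKD');
  }
  // Assumption: the unorm package is installed wherever this branch runs.
  return require('unorm').nfkd(str);
}

console.log(toNfkd('\u3000') === ' '); // true
```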

Answer 22 · 2015-01-07T10:45:19.000Z

@dabura667 I have now reverted to using your test vectors in their pure form, with the wide spaces. I have also made my .NET implementation output Japanese mnemonics with the wide spaces, matching what you have done in the reference code, so mine is the same :)