duschang27/crypto-js

Clarify PBKDF2 documentation

Opened this issue · 13 comments

I would appreciate some clarification of the PBKDF2 functionality as stated in 
the docs.

1. What SHA does this implementation use? I assume SHA1, but I know other 
implementations use SHA256 or SHA512. Is there a reason for the specific 
choice, or is there a configuration option that controls which SHA is used?

2. The "keySize" parameter is a bit confusing. If I want a 32-character key 
string out of the function (because I want to use it as the encryption key in 
another library, which requires a key of length 32), I seem to have to pass the 
number 4 as "keySize", but that seems strange to me.

What does that number represent with respect to keySize? I would have expected 
to pass 32 (because "key size" means, to me, length of the returned key string).

But if it instead represents number of bits of key length, then I would have 
expected to need to pass something like 256 (for 8-bit characters) or 512 (for 
16-bit characters). So that doesn't fit.

If it instead represents 4 bytes' worth of key size (in bits), then it makes 
sense that 4*8 = 32, but that's 32 bits of key information, from which I'd 
expect to get 4 or 2 characters out, not 32. What I'm actually getting out is 
32 characters, which is a LOT more than 4 bytes / 32 bits of key information.

----------

Any assistance you can give in clarifying these confusions is greatly 
appreciated, especially if the documentation can be updated accordingly.

Original issue reported on code.google.com by get...@gmail.com on 5 Nov 2013 at 3:52

OK, I think I've answered my own questions after reading through the source 
code for a while. Please correct me if I'm wrong here:

1. SHA1 is the default. You can pass the "hasher" option to pick others, like 
SHA256 or SHA512. For example: `hasher: CryptoJS.algo.SHA256`

2. "keySize" refers to the count of 32-bit words in the key. Me passing `4` 
means then 4*32 = 128-bits of key information.

The reason I was getting a 32-character string out is that the default 
`toString()` encoding is Hex, and the Hex encoding of a 128-bit value is 32 
hexadecimal characters: each hex character represents 4 bits, so 128/4 = 32 
characters.
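
For example, something like this seems to give a SHA256-based, 128-bit key (a 
rough sketch; the passphrase and iteration count are just placeholders):

    var salt = CryptoJS.lib.WordArray.random(16);

    var key = CryptoJS.PBKDF2("my passphrase", salt, {
        keySize: 4,                    // 4 words * 32 bits = 128-bit key
        hasher: CryptoJS.algo.SHA256,  // override the SHA1 default
        iterations: 1000
    });

    console.log(key.toString());       // default Hex: 128/4 = 32 hex characters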

-------------

Am I correct? If so, it would be really nice to see some of this detail made 
explicit in the documentation. It took a few hours of scratching through the 
code to work out what could have been consumed in a few short sentences' worth 
of documentation. :)

Original comment by get...@gmail.com on 5 Nov 2013 at 5:43

Now, my further questions which I haven't yet determined from code spelunking:

1. I generate a random salt like this:

    salt = CryptoJS.lib.WordArray.random(16);

This appears to create 16 bytes (16*8 = 128 bits) of random data, which of 
course can be hex-encoded (the default) to a 32-character string.
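
As a quick sanity check of that assumption (a sketch):

    console.log(salt.sigBytes);   // 16 bytes of significant data
    console.log(salt.toString()); // default Hex: 32 characters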

It appears you can pass the `salt` in either as the WordArray object or as a 
string. If I pass in the Hex-string version of the salt, I get a very 
different result than if I pass in the WordArray object itself. I don't 
understand which string encoding (since clearly not Hex) I could convert the 
salt to so that the outcome of PBKDF2 would be the same as if I had passed in 
the original WordArray object. What am I missing here?

2. Possibly related to #1, I'm experiencing a problem when I try to take the 
WordArray and encode it to a Utf8 string, using the `CryptoJS.enc.Utf8` 
encoder. However, it throws an error every time I try to do so, and it seems 
to come down to the use of `decodeURIComponent(..)`, which throws "Error: 
Malformed UTF-8 data".

This error only happens when I try to format the output of PBKDF2() as Utf8. If 
I encode the generated salt as Utf8, it works fine.
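
Roughly what I'm doing, simplified (passphrase and options are placeholders):

    var hash = CryptoJS.PBKDF2("my passphrase", salt, { keySize: 4 });
    hash.toString(CryptoJS.enc.Utf8); // throws "Malformed UTF-8 data"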

What's going on here?

I need to be able to persist the hash data as a string, in JSON in 
localStorage, so I would prefer to use Utf8. The hash I'm creating is used as 
the key for both an encryption and a decryption task at a later time, and 
those crypto APIs (a JS port of NaCl) need a Utf8 string for the key 
(technically they require a Uint8Array, which they provide a method to get 
from a Utf8 string).

I'd appreciate some guidance on what I'm doing wrong here. And, I'd appreciate 
it if the documentation made such details clearer.

Original comment by get...@gmail.com on 5 Nov 2013 at 6:29

Correction: I am *not* able to run Utf8 on the salt as I incorrectly said 
above. Both the salt and the hash throw the same "Malformed" error.

Original comment by get...@gmail.com on 5 Nov 2013 at 6:35

> If so, it would be really nice to see some of this detail more explicit in 
the documentation.

You're absolutely right. I'll try to find time to get better information up 
there.

> If I pass in the Hex-string version of the salt in, I get a very different 
result out than if I pass in just the WordArray object.

Correct. A WordArray object is binary data, whereas a hex string is an ASCII 
representation of that data. The two *should* produce different hashes.
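
For example, a quick sketch to illustrate (values are arbitrary):

    // Same underlying salt, but as binary vs. as its hex text -- different results.
    var salt = CryptoJS.lib.WordArray.random(16);
    var kBin = CryptoJS.PBKDF2("pass", salt, { keySize: 4 });
    var kHex = CryptoJS.PBKDF2("pass", salt.toString(CryptoJS.enc.Hex), { keySize: 4 });
    console.log(kBin.toString() === kHex.toString()); // false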

> I'm not sure I understand what string encoding (since clearly not Hex) I 
could take the salt to so that the outcome of PBKDF2 would be the same as if I 
had passed in the original WordArray object?

If you pass in a string, then by default, the characters of that string are 
converted to UTF8 bytes. If, on the other hand, you want to pass in binary 
data, then you need to pass it as a WordArray object.
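
In other words, roughly (a sketch):

    // A string salt is converted to bytes with CryptoJS.enc.Utf8.parse internally,
    // so these two calls should produce the same key.
    var k1 = CryptoJS.PBKDF2("pass", "my-salt", { keySize: 4 });
    var k2 = CryptoJS.PBKDF2("pass", CryptoJS.enc.Utf8.parse("my-salt"), { keySize: 4 });
    console.log(k1.toString() === k2.toString()); // true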

> I'm experiencing a problem when I try to take the WordArray and encode it to 
a Utf8 string, using the `CryptoJS.enc.Utf8` encoder. However, it throws an 
error every time I try to do so

If you're trying to treat random bytes as though they're UTF8 bytes, then yes, 
that can and should fail. Not every byte sequence is valid UTF8.

> I need to be able to persist the hash data as a string, in JSON in 
localStorage, so I would prefer to use Utf8.

You definitely don't want to use UTF8 for that. It isn't meant to represent 
binary data. Latin1 is a possibility, because there's a 1-to-1 mapping between 
bytes and characters. Though, it's safer to use an encoding that uses only 
printable characters, such as hex or base64.
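
For example, persisting a derived key with Base64 might look roughly like this 
(a sketch, where `key` is the WordArray returned by PBKDF2 and "derivedKey" is 
just a placeholder storage key):

    var stored = key.toString(CryptoJS.enc.Base64);       // WordArray -> printable string
    localStorage.setItem("derivedKey", stored);

    var restored = CryptoJS.enc.Base64.parse(localStorage.getItem("derivedKey"));
    console.log(restored.toString() === key.toString());  // true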

Original comment by Jeff.Mott.OR on 5 Nov 2013 at 7:51

@Jeff.Mott.OR

Thanks for your clarifications. I was able to work out a number of my issues 
via this thread: https://github.com/tonyg/js-nacl/issues/17

I ended up writing these conversion functions: 
https://gist.github.com/getify/7325764#file-gistfile1-js-L5-L16

They convert a WordArray to a Uint8Array and vice versa, which helps get the 
PBKDF2 hash as the encryption key into the nacl encryption function (which 
expects Uint8Array's).

Then I wrote these: 
https://gist.github.com/getify/7325764#file-gistfile1-js-L18-L32

These turn a `Uint8Array` into a binary-encoded string (aka base-128, I 
think?) and vice versa, which helps store the data as a string.

When I want to go from a WordArray to a binary-encoded string (for 
persistence), I use the two together, like:

    convertUint8ArrayToBinaryString(
       convertWordArrayToUint8Array(salt)
    );

But, TBH, it'd be nice if CryptoJS provided some utility like 
`convertWordArrayToBinaryString(..)`. For better or worse, my lack of 
understanding + the documentation led me to believe that `CryptoJS.enc.Utf8` 
would do that, which clearly it doesn't.

Thanks again!

Original comment by get...@gmail.com on 5 Nov 2013 at 10:39

What you refer to as a binary string is actually binary data represented as 
Latin1 characters, and yes, CryptoJS provides that.

    salt.toString(CryptoJS.enc.Latin1)
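
And the reverse direction, a quick sketch:

    var binStr = salt.toString(CryptoJS.enc.Latin1);   // WordArray -> byte-per-character string
    var back = CryptoJS.enc.Latin1.parse(binStr);      // string -> WordArray
    console.log(back.toString() === salt.toString());  // true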

Original comment by Jeff.Mott.OR on 5 Nov 2013 at 11:01

Also, you need to be careful about using Uint32Array, because its endianness 
can vary from one machine to the next.


    var wa = CryptoJS.lib.WordArray.create([0x31323334, 0x35363738]);

    console.log(wa.toString(CryptoJS.enc.Latin1)); // "12345678" as expected

    console.log(convertUint8ArrayToBinaryString(convertWordArrayToUint8Array(wa)));
    // "43218765" because each 32-bit number is little-endian

Original comment by Jeff.Mott.OR on 5 Nov 2013 at 11:11

@Jeff.Mott.OR --

Ugh. Good point.

So, how would you recommend I go from a WordArray to a Uint8Array without 
mismatching endianness?

Original comment by get...@gmail.com on 5 Nov 2013 at 11:25

You'll either have to rely solely on Uint8Array and process one byte at a time, 
or get fancy with the DataView type 
(http://www.khronos.org/registry/typedarray/specs/latest/#8), which lets you 
specify little- or big-endian in the arguments. Either way, you'll have to use 
a good old-fashioned for-loop; you won't be able to treat it as array-like.
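
For example, the byte-at-a-time approach might look roughly like this (a 
sketch; the helper names are mine, not part of CryptoJS):

    // CryptoJS packs bytes big-endian within each 32-bit word, so the index
    // math below mirrors what the Latin1/Hex encoders do internally.
    function wordArrayToUint8Array(wordArray) {
        var bytes = new Uint8Array(wordArray.sigBytes);
        for (var i = 0; i < wordArray.sigBytes; i++) {
            bytes[i] = (wordArray.words[i >>> 2] >>> (24 - 8 * (i % 4))) & 0xff;
        }
        return bytes;
    }

    function uint8ArrayToWordArray(bytes) {
        var words = [];
        for (var i = 0; i < bytes.length; i++) {
            words[i >>> 2] |= bytes[i] << (24 - 8 * (i % 4));
        }
        return CryptoJS.lib.WordArray.create(words, bytes.length);
    }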

Original comment by Jeff.Mott.OR on 5 Nov 2013 at 11:39

Thank you for the pointers. I will look into that. In my case, my encryptions 
never leave the machine they're created on, but I still want to make sure I 
have a robust and complete solution, so I will have to figure it out.

Original comment by get...@gmail.com on 5 Nov 2013 at 11:42

@Jeff.Mott.OR

Clarification: does WordArray.create(..) (and thus PBKDF2) produce a WordArray 
of big-endian numbers? Your docs don't say anything about the endianness of 
the data produced (or expected, for that matter). Does PBKDF2 also expect the 
WordArray passed as the salt parameter to be BE? Do all the encoders expect 
WordArrays of BE numbers?

Original comment by get...@gmail.com on 6 Nov 2013 at 3:56

Yes. Everything in CryptoJS is big-endian.

Original comment by Jeff.Mott.OR on 6 Nov 2013 at 4:27

@Jeff.Mott.OR

FYI: thanks to @creationix for his help, I now have methods which take the 
big-endian WordArray and faithfully produce a Uint8Array, accounting for 
endianness, and do the reverse to take a Uint8Array and produce a BE WordArray.


https://gist.github.com/getify/7325764#file-gistfile1-js-L5-L40

In your example above, my code now produces "12345678", the same as the Latin1 
encoder does.

---------

If you're open to adding some additional helpers, I would suggest the two 
functions I've now linked to, which can take your BE WordArrays and produce 
Uint8Arrays, and vice versa. Those might make it easier to interoperate this 
lib with other libs that use typed arrays as their exchange format, so that 
others don't go through the pain that I have. :)

Also, some general comments about endianness as it relates to this lib would 
be a nice addition to the docs.

Thanks again for your help and clarifications!

Original comment by get...@gmail.com on 6 Nov 2013 at 5:28