Question about using Strings
jehugaleahsa opened this issue · 4 comments
Hello! This is mostly a question about your library.
I noticed a lot of the methods either accept a string or return a string, such as addSalt(String). I noticed when you convert these back to byte[], you are using UTF-8. When I take an existing salt, that's just a byte[], and try to convert it to a String using new String(bytes, StandardCharsets.UTF_8), I always get an error about it being an invalid salt. These are salts that were generated prior to using your library.
I am curious how UTF-8 works in this situation. Normally, byte values >127 are interpreted as part of a multi-byte sequence, meaning the character is composed of 2 or more bytes. Some sequences of bytes are not valid UTF-8, and some code point ranges are reserved. I am curious how the decoder handles this situation and was hoping you knew offhand. Is it replacing those byte sequences with a default character (like those that appear as a square symbol or question mark diamond), or is it keeping the byte sequences as-is?
Would it be possible to create overloads of addSalt taking a byte[] so I could use this library? In case you're curious, when I persisted my passwords and salts to a database, I used Base64 to encode the byte[] rather than UTF-8.
Hi @jehugaleahsa,
What algorithm are you using? bcrypt, for example, generates the salt by itself, so the salt cannot be specified.
Can you provide an example where you get the error?
Thank you
Sorry. I probably didn't explain that very well. Currently, I am using patrickfav's implementation of bcrypt. To generate the salt, I am using Java's SecureRandom class. It works just fine - I wasn't sure if you were saying your implementation always generates its own salt or if that's the nature of the bcrypt algorithm in general.
I answered my own question about the UTF-8 thing, btw. The unit test below takes one of my salts, encoded as Base64, and converts it back to a byte array. I convert it to a string using UTF-8 and back again, and confirmed the byte arrays do not match. So the Java UTF-8 decoder must replace invalid byte sequences, which means the byte[] doesn't round-trip.
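import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;

@Test
public void testEncoding() {
    byte[] bytes = Base64.getDecoder().decode("CBrGvzDkT1UFDkmPQ94pOQ=="); // raw salt bytes
    String utf8 = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes)).toString(); // lossy decode
    byte[] encoded = utf8.getBytes(StandardCharsets.UTF_8);
    Assertions.assertArrayEquals(bytes, encoded); // <--- this fails
}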
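The same behaviour can be reproduced without a test framework. The following is a minimal sketch, not part of either library: the class and method names are made up for illustration. It compares the lenient decoding done by new String(bytes, UTF_8), which substitutes U+FFFD for malformed input, with a strict CharsetDecoder that reports malformed input instead, and with a Base64 round trip, which is lossless for arbitrary bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper class, for illustration only.
class Utf8RoundTrip {

    // Lenient path: malformed byte sequences are silently replaced with U+FFFD,
    // so re-encoding the string may not give back the original bytes.
    static boolean roundTripsLeniently(byte[] bytes) {
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        return Arrays.equals(bytes, decoded.getBytes(StandardCharsets.UTF_8));
    }

    // Strict path: the decoder throws instead of substituting a replacement character.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Base64 is a binary-to-text encoding, so any byte[] survives the round trip.
    static boolean roundTripsViaBase64(byte[] bytes) {
        String text = Base64.getEncoder().encodeToString(bytes);
        return Arrays.equals(bytes, Base64.getDecoder().decode(text));
    }

    public static void main(String[] args) {
        // The salt from the unit test above.
        byte[] salt = Base64.getDecoder().decode("CBrGvzDkT1UFDkmPQ94pOQ==");
        System.out.println("valid UTF-8:         " + isValidUtf8(salt));
        System.out.println("UTF-8 round trip ok: " + roundTripsLeniently(salt));
        System.out.println("Base64 round trip ok: " + roundTripsViaBase64(salt));
    }
}
```

Random salt bytes are very unlikely to form valid UTF-8, which is why a binary-to-text encoding such as Base64 or hex is the usual way to carry them in a String.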
Does your library avoid generating salts that would run into this issue? I am not going to be able to port to using password4j because I can't use the existing salts in my database.
There are no inconsistencies between the two libraries, as you can see:
String password = "MySuperSecurePassword";
// hash with patrickfav's implementation (cost factor 12)
String hash1 = BCrypt.withDefaults().hashToString(12, password.toCharArray());
// hash with Password4j's implementation
BcryptFunction f = BcryptFunction.getInstance(Bcrypt.A, 12);
String hash2 = Password.hash(password).with(f).getResult();
// verify patrickfav's hash with Password4j
Password.check(password, hash1).with(f); // true
// verify Password4j's hash with patrickfav's implementation
BCrypt.verifyer().verify(password.toCharArray(), hash2).verified; // true
What I find weird is that you stored the salt in a separate column in your database. You must always save the hash in its original form, like $2a$12$dp7gYK/LoX1Cm4jpZXp56Onv7Bnf178GpZQVYKwaS4VZvZ0fSLcPu.
bcrypt (like scrypt and Argon2) integrates the salt inside the cipher text, but bcrypt has stricter rules:
- a 16-byte salt
- salt and cipher text are encoded with a modified version of Base64
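To make the layout concrete, here is a minimal sketch that splits a bcrypt hash string into its parts. It assumes the common 60-character layout: "$", version, "$", two-digit cost, "$", then 22 characters of encoded salt followed by 31 characters of encoded cipher text. The class and field names are hypothetical and do not belong to Password4j or patrickfav's library.

```java
// Hypothetical value class, for illustration only.
class BcryptHashParts {
    final String version;   // e.g. "2a"
    final int cost;         // log2 of the number of rounds, e.g. 12
    final String salt;      // 22 characters, bcrypt's modified Base64
    final String digest;    // 31 characters, bcrypt's modified Base64

    private BcryptHashParts(String version, int cost, String salt, String digest) {
        this.version = version;
        this.cost = cost;
        this.salt = salt;
        this.digest = digest;
    }

    static BcryptHashParts parse(String hash) {
        // The string starts with '$', so the first field is empty.
        String[] fields = hash.split("\\$");
        if (fields.length != 4 || !fields[0].isEmpty() || fields[3].length() != 53) {
            throw new IllegalArgumentException("not a bcrypt hash in the expected layout");
        }
        String saltAndDigest = fields[3];
        return new BcryptHashParts(
                fields[1],
                Integer.parseInt(fields[2]),
                saltAndDigest.substring(0, 22),
                saltAndDigest.substring(22));
    }

    public static void main(String[] args) {
        BcryptHashParts parts =
                parse("$2a$12$dp7gYK/LoX1Cm4jpZXp56Onv7Bnf178GpZQVYKwaS4VZvZ0fSLcPu");
        System.out.println("version = " + parts.version);
        System.out.println("cost    = " + parts.cost);
        System.out.println("salt    = " + parts.salt);
        System.out.println("digest  = " + parts.digest);
    }
}
```

Because the version, cost, and salt all travel inside the hash string, a verifier needs only that string and the candidate password, so a separate salt column adds nothing for bcrypt.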
The last point is probably where you have the issue: you are using the standard Base64 decoder, while it was originally encoded with a modified one. I didn't check how patrickfav's library encodes the salt in BCrypt.Result, but I'm quite sure it's not using the modified Base64.
The only solution I see is that you have to implement the modified version of Base64 (see for example com.password4j.BcryptFunction#decodeBase64(String, int)). Even if Password4j allowed salts in the form of byte[] (which would be a good feature), you would still have to decode your salts from standard Base64 and encode them back with the modified Base64.
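A minimal sketch of that re-encoding step follows. It rests on the assumption that bcrypt's Base64 packs bits the same way as standard Base64 and differs only in its alphabet ("./" first, then A-Z, a-z, 0-9) and in omitting the "=" padding, so a character-by-character translation is enough. The class and method names are hypothetical; this is not Password4j's own decodeBase64.

```java
import java.util.Base64;

// Hypothetical helper class, for illustration only.
class BcryptSaltEncoding {
    private static final String STANDARD_ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    // Assumption: the alphabet used by bcrypt's modified Base64.
    private static final String BCRYPT_ALPHABET =
            "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // Encodes a raw 16-byte salt as the 22-character form found inside a bcrypt hash.
    static String toBcryptBase64(byte[] salt) {
        if (salt.length != 16) {
            throw new IllegalArgumentException("bcrypt expects a 16-byte salt");
        }
        // Standard Base64 without '=' padding gives 22 characters for 16 bytes.
        String standard = Base64.getEncoder().withoutPadding().encodeToString(salt);
        StringBuilder out = new StringBuilder(standard.length());
        for (char c : standard.toCharArray()) {
            // Same 6-bit value, different symbol.
            out.append(BCRYPT_ALPHABET.charAt(STANDARD_ALPHABET.indexOf(c)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A salt stored in the database as standard Base64, as in the issue.
        byte[] salt = Base64.getDecoder().decode("CBrGvzDkT1UFDkmPQ94pOQ==");
        System.out.println(toBcryptBase64(salt));
    }
}
```

Whether a given library accepts a salt supplied this way is a separate question. The sketch only shows that the stored bytes are not lost and can be moved between the two encodings.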
I am going to close this issue. Thanks for the education - it helped. What I was missing was that bcrypt stores the hash internally. The database was designed to handle various algorithms so the hash was basically just duplicated.