technion/libscrypt

Failure on Samsung6 device

demetris-manikas opened this issue · 9 comments

Hi,
in the company I work for (Crypho A.S.) we developed
a scrypt plugin for usage with the apache cordova framework
(https://github.com/Crypho/com.crypho.plugins.scrypt) based on libscrypt.

Everything was smooth until we executed our tests on a Samsung6 mobile.
Libscrypt did not produce the expected results on any of the test cases.
After digging into the code I ended up pining the problem in the salsa20_8 method.
Just adding a printout of the x local variable "solved" the problem. Nasty, I know...
My next move was to declare the variable x as volatile and this also "solved" the issue.

I am not fully convinced that my solution makes sense so I would appreciate any feedback.

Hi,

I'm very interested in getting this solved, but I'm limited in my development experience on the mobile market. The Salsa20/8 method is directly from the reference implementation, so I'm hesitant to modify it without further advice.

Given the "volatile" keyword's impact is to prevent the compiler from attempting to optimise this keyword, I feel it's highly likely you've found a compiler bug. Does the behaviour continue if you compile with optimisations disabled ( -O0) ?

I'll continue to look into this.

Thanks for the prompt reply. It is my feeling too that the problem lies in the compiler. I 'll try compiling with the suggested option. I 'll keep you informed of any results.

So... I have news for you.
I replaced the code in crypto_scrypt-nosse.c with the code from
https://github.com/Tarsnap/scrypt/blob/master/lib/crypto/crypto_scrypt-ref.c
making the minimal changes to pass compilation.
The changes I made are the following:
a) I removed #include "scrypt_platform.h" directive,
b) changed #include "crypto_scrypt.h" to #include "libscrypt.h"
c) changed calls to PBKDF2_SHA256 to libscrypt_PBKDF2_SHA256

and the tests pass on the Samsung6.

There are "minor" differences between the two implementations by I don't have the time nor
the knowledge to point to the ones that do make the difference.

Many thanks! I'll get some diffs together and sort this out.

Hi. Any news about this issue?

I've been up and down the changes you made recurrently, and the difficulty is I'm only seeing another way of representing the same thing.

It's a significant code change to implement here without fully understanding exactly what part of it matters, and without having the ability to debug a Samsung device and find out where this issue lies I'm reluctant to make such a change - I'd be more worried about breaking something on a server OS.

I'm sorry I'm can't progress this, and the best I can do here is document it accordingly. I'm going to update the readme shortly to refer to this issue as known. I appreciate that's probably not the ideal for you, but I have to weigh up the likely impact of a completely changing the implementation out, for what is likely the only Samsung implementation.

Well understood. Thanks for your efforts and time.

I don't have a relevant device either, but it looks like undefined behavior due to the type casting in blkcpy might be to blame. If that is the case, passing -fno-strict-aliasing to the compiler should fix it.

I'm sorry I'm can't progress this, and the best I can do here is document it accordingly. I'm going to update the readme shortly to refer to this issue as known. I appreciate that's probably not the ideal for you, but I have to weigh up the likely impact of a completely changing the implementation out, for what is likely the only Samsung implementation.

I know this is an old issue but it's not only Samsung devices that are affected. It seems to be the case for every arm64 device. A bunch of my users got affected and were unable to receover their cryptocurrency as it relies on this lib for scrypt: https://www.reddit.com/r/lunary/comments/7nuyv3/wallet_backups_cannot_be_decrypted_by_anything/?utm_content=title&utm_medium=hot&utm_source=reddit&utm_name=lunary
I'm no expert on that matter but the interesting thing is that my local builds were not affected and the arm64 binary worked fine. Only builds completed by the fdroid server seem to have this issue. Something in their environment messed up the arm64 binaries, unfortunately I don't have any specifics on their environment. So it may be a compiler bug?