besser82/libxcrypt

Feature Request: Linux Kernel Crypto API User Space Interface support.

Closed this issue · 9 comments

https://www.kernel.org/doc/html/v4.14/crypto/userspace-if.html

I don't know how much of libxcrypt/libcrypt's functionality is already exposed by the Linux Kernel Crypto API. But I do know that all the SHA algorithms are in the Linux kernel with hardware acceleration support, while the SHA implementations here do not use any of the newer hardware-acceleration instructions.
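For readers unfamiliar with the user space interface linked above, here is a minimal sketch of what reaching a kernel hash from user space can look like. It uses Python's `socket.AF_ALG` support rather than libxcrypt or libkcapi, so it only illustrates the interface and is not a proposed implementation. The function names `kernel_sha256` and `sha256_with_fallback` are hypothetical, and the fallback to `hashlib` is an assumption added so the sketch still runs on systems without AF_ALG.

```python
import hashlib
import socket


def kernel_sha256(data: bytes) -> bytes:
    """Hash `data` with the kernel's "sha256" transform over an AF_ALG socket.

    Intended for small inputs only; larger inputs would likely need
    chunked sends with socket.MSG_MORE (not shown here).
    """
    with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
        # Select the algorithm type and name exposed by the kernel.
        alg.bind(("hash", "sha256"))
        # accept() yields the operation socket that data is written to.
        op, _ = alg.accept()
        with op:
            op.sendall(data)
            return op.recv(32)  # a SHA-256 digest is 32 bytes


def sha256_with_fallback(data: bytes) -> bytes:
    """Try the kernel interface first, else use hashlib (assumed fallback)."""
    try:
        return kernel_sha256(data)
    except (AttributeError, OSError):
        # AF_ALG is Linux-only and may be disabled or blocked in sandboxes.
        return hashlib.sha256(data).digest()
```

Calling `sha256_with_fallback(b"abc")` should give the same digest whichever path is taken, which is the property a wrapper of this kind would rely on.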

libcrypt is being deprecated from glibc due to lack of maintenance.

This project could become another OpenSSL: without enough personnel to perform proper mathematical certification that everything works, and without enough personnel to take advantage of hardware acceleration on different architectures.

This is why I am suggesting looking at the Linux Kernel Crypto API, and possibly other operating systems' crypto APIs that are well maintained and provide hardware acceleration, and making a libcrypt wrapper on top of those that can be used as an option, and maybe as a route to pick up more developers.

When something has been deprecated, duplicating it with more features done in exactly the same way risks walking into the same issues that caused it to be deprecated in the first place.

Thank you for your recommendation! =)

This one will be on the roadmap for the v5.0 release (maybe based on libkcapi). Currently we are doing the final testing for releasing v4.0 and pioneering libxcrypt into the first Linux distribution.

zackw commented

I want to point out that the cryptographic primitives used in libxcrypt are modified DES, SHA-256, SHA-512, and blowfish; it may not work to use "normal" implementations (maintained by someone else or not). Also, password hashes are deliberately made slow, so hardware acceleration is actively undesirable.
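To make "deliberately made slow" concrete, here is a small sketch of a hash with a tunable work factor. It uses PBKDF2 from Python's `hashlib` purely as a stand-in; PBKDF2 is not one of the hashes discussed in this thread, and the helper names `stretched_hash` and `seconds_per_guess` are hypothetical. The point is only that the cost is part of the hash itself, so anyone computing it has to pay that cost for every guess.

```python
import hashlib
import time


def stretched_hash(password: bytes, salt: bytes, rounds: int) -> bytes:
    """Derive a 32-byte hash; `rounds` is the tunable work factor."""
    return hashlib.pbkdf2_hmac("sha256", password, salt, rounds)


def seconds_per_guess(rounds: int) -> float:
    """Measure how long one hash computation takes at a given work factor."""
    start = time.perf_counter()
    stretched_hash(b"correct horse", b"example-salt", rounds)
    return time.perf_counter() - start


if __name__ == "__main__":
    # More rounds means more work per guess, for defender and attacker alike.
    for rounds in (1_000, 100_000):
        print(rounds, "rounds:", seconds_per_guess(rounds), "seconds")
```

The work factor also changes the output, so it has to be stored alongside the salt for later verification.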

The maintainability concerns are heard and valid (says the guy who hasn't been able even to respond to email about this project for two months) but the primary maintenance burden for this code is not the crypto itself, but keeping up with new algorithms - which requires design brain, not coding brain. Deciding whether to incorporate e.g. scrypt or argon is harder than coding it up, and that wouldn't change if someone else provided the implementation (as, indeed, people have).

What would make the kernel crypto API, specifically, interesting as a substrate for this project, IMO, is if it enabled better data protection for the cleartext password. In principle, the input drivers could feed the cleartext password directly into some sort of secure enclave from which only the hashed password would emerge. That's difficult on many levels but still a desirable goal.

One thing to remember: the Linux Kernel Crypto API has to review everything they use. If you can work with the bigger projects and possibly share more code, that will expand your pool of personnel to review stuff.

For some things, the crypt function running faster would not matter. But even where the crypt function is required to be slow externally, being slow internally is bad.

"Also, password hashes are deliberately made slow, so hardware acceleration is actively undesirable."
zackw, I know the logic, but it does not really hold up any more.
http://openwall.info/wiki/john/GPU
Those attacking will implement whatever you have using GPUs and other hardware acceleration. If you want password hash processing to be slow, put a sleep in. A sleep in the code makes far more sense, as the OS can reallocate the CPU time to something else. This form of sleeping could be another very good reason to go kernel side, so the kernel scheduler can say "hey, this is processing a password, let's go run something else on another thread" to randomise how much time it takes.

A deliberately slow method eats power without any justification; there are other ways of being slow to the application without wasting CPU power, which in effect is wasting electricity.

The idea of being deliberately slow in the crypt function goes back to when pthread and the like were in usage and people were not producing accelerated forms any more. It's not like those attempting to crack most hashes are going to limit themselves to what libcrypt/libxcrypt provides.

This is like digital rights management logic, where it ends up hurting the valid consumer while the thief has all the nice advantages, including better performance/playback.

So you do need to think, not copy the historic ways exactly. I see no valid reason not to use the fastest possible form of processing for password hashes, with a fixed hard-coded stall so that the return time from the crypt function is totally independent of the processing. While the application is stalled waiting for a timer event, the CPU can be running a different thread, giving better CPU utilisation than a deliberately slow algorithm.

Coming back to this topic after quite a while…

I've thought for a long time about how to handle things and the way to go for libxcrypt… From my POV it would be an advantage to drop all internal implementations of low-level hashing from libxcrypt and implement a dispatcher, which proxies them from either libkcapi (Linux Kernel Userspace Crypto API), {Open,Libre}SSL, libnss or Nettle. The choice of one of those backends would be made during configure.
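A rough sketch of the dispatcher idea, written in Python for brevity rather than in libxcrypt's C: callers ask for a digest by name and a table routes the call to whichever backend was selected. Everything here is hypothetical, including the names `BACKENDS`, `SELECTED_BACKEND` and `digest`, and `hashlib` stands in for a real backend such as libkcapi or Nettle. A module-level constant stands in for the configure-time choice.

```python
import hashlib
from typing import Callable, Dict

# A backend takes an algorithm name and data, and returns the raw digest.
DigestFn = Callable[[str, bytes], bytes]


def _hashlib_backend(algorithm: str, data: bytes) -> bytes:
    """Stand-in backend built on Python's hashlib."""
    return hashlib.new(algorithm, data).digest()


# Table of available backends; a real build would register only those
# that were enabled when the library was configured.
BACKENDS: Dict[str, DigestFn] = {"hashlib": _hashlib_backend}

# Stands in for the choice made during configure.
SELECTED_BACKEND = "hashlib"


def digest(algorithm: str, data: bytes, backend: str = SELECTED_BACKEND) -> bytes:
    """Proxy a low-level hash request to the selected backend."""
    try:
        backend_fn = BACKENDS[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None
    return backend_fn(algorithm, data)
```

With this shape, `digest("sha256", b"abc")` returns the same bytes regardless of which backend is registered under the selected name, as long as the backend implements the standard algorithm.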

Going this way would require less man-power and make it easier to implement new hashing algorithms.

Thinking about FIPS: This would also get libxcrypt out of the requirement to be FIPS certified itself, when it is used in such an environment.

What do you think about this approach?

zackw commented

@besser82 My big concern with any scheme to outsource crypto primitives to any external library is that I'm not convinced it will be worth the effort, for two reasons. First, we could at best expect to get rid of the alg-*.c files -- but that's less than half the library, and it's the half that needs the least maintenance. Second, the newer hashes, the ones we actually want people using -- bcrypt, scrypt, yescrypt, argon, etc (in the happy future where we get lots of new hashes) -- they are built around bespoke crypto primitives that we won't be able to outsource. For instance, bcrypt is built around a modified blowfish, which means the implementation of blowfish in openssl's libcrypto, or any other such library, would be useless to us.

I'm with @zackw on this.

I'm just a bystander as far as the libxcrypt project is concerned, but I think this issue was created based on a misunderstanding of how modern CPUs tend to make accelerated crypto instructions available (directly to userspace as well, so e.g. OpenSSL's libcrypto would be more relevant than kernel APIs) and of "slow" password hashing. Thus, I think this issue should be closed to avoid further distraction.

[It could sort of make sense to switch sha256crypt to use latest/future x86 CPUs hardware SHA-256 instructions, but sha256crypt is a poor choice for new passwords anyway, for many reasons.]
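The point about "slow" password hashing can be illustrated with a small sketch. It compares a verifier that uses a fast hash plus a fixed stall, as suggested earlier in the thread, with an offline attacker who already holds the stored hash. The names `fast_hash`, `verify_with_stall`, `offline_guess` and `STALL_SECONDS` are hypothetical, and plain salted SHA-256 is used only as an example of a fast hash. The attacker runs their own loop, so the stall never applies to them.

```python
import hashlib
import hmac
import time
from typing import Iterable, Optional

STALL_SECONDS = 0.05  # hypothetical fixed stall added by the verifier


def fast_hash(password: bytes, salt: bytes) -> bytes:
    """A deliberately fast hash: one salted SHA-256 call."""
    return hashlib.sha256(salt + password).digest()


def verify_with_stall(password: bytes, salt: bytes, stored: bytes) -> bool:
    """Legitimate login path: fast hash, then sleep before answering."""
    ok = hmac.compare_digest(fast_hash(password, salt), stored)
    time.sleep(STALL_SECONDS)  # only callers of this function pay this cost
    return ok


def offline_guess(candidates: Iterable[bytes], salt: bytes,
                  stored: bytes) -> Optional[bytes]:
    """Attacker with a leaked hash: same fast hash, no stall at all."""
    for candidate in candidates:
        if hmac.compare_digest(fast_hash(candidate, salt), stored):
            return candidate
    return None
```

The stall limits online guessing through the login path, but against a leaked hash database only the cost built into the hash itself slows each guess.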

I am going to go ahead and close this, for the reasons I stated above -- the hashing methods that can directly benefit from existing hand-optimized and/or hardware-accelerated cryptographic primitives are the obsolete hashing methods, which we do not want to spend any more engineering effort on than absolutely necessary.

The hashing methods that we actually want people using (at present, yescrypt is the most important one) could benefit from assembly-level optimization; for example, yescrypt already uses x86-64 vector instructions, when available, to accelerate its use of the Salsa20 round function, and might benefit from similar code to accelerate its use of the SHA256 round function. But these are the round functions, not the complete, standardized Salsa20 and SHA256. So any such optimization would have to be done custom for libxcrypt. OpenSSL's EVP API (for example) would be completely useless to yescrypt.
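To illustrate the difference between a round function and a complete, standardized cipher, here is the Salsa20 quarter-round as defined in the Salsa20 specification, written as a plain Python sketch. It is a building block of the Salsa20 round function, not the cipher, and it is not how yescrypt implements it (the comment above mentions vector instructions). The helper name `rotl32` is my own.

```python
MASK32 = 0xFFFFFFFF  # all arithmetic is on 32-bit words


def rotl32(value: int, shift: int) -> int:
    """Rotate a 32-bit word left by `shift` bits."""
    return ((value << shift) & MASK32) | (value >> (32 - shift))


def quarterround(y0: int, y1: int, y2: int, y3: int):
    """Salsa20 quarter-round: four add-rotate-xor steps on four words."""
    z1 = y1 ^ rotl32((y0 + y3) & MASK32, 7)
    z2 = y2 ^ rotl32((z1 + y0) & MASK32, 9)
    z3 = y3 ^ rotl32((z2 + z1) & MASK32, 13)
    z0 = y0 ^ rotl32((z3 + z2) & MASK32, 18)
    return z0, z1, z2, z3
```

A general-purpose library interface that exposes only the finished cipher gives no access to this inner step, which is why optimizing it has to happen inside the code that uses it.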

I agree with @zackw about the overall reasoning and about closing this issue. Some specific detail about yescrypt is wrong, but not in a way affecting the decision-making here.

Just for others lurking in here and maybe learning things:

yescrypt spends almost all of its time in its pwxform component, where it in fact "already uses x86-64 vector instructions, when available, to accelerate". Its usage of the Salsa20 round function corresponds to between 1.5% (with libxcrypt current defaults) and 6% of intermediate data it processes, so accelerating it is not so important, but nevertheless that is already done too. Its usage of SHA-256 corresponds to under 1% of running time, and that is not currently accelerated. This is complete and standard SHA-256, so library code could be used there, but the overall gain would be negligible - not worth the complexity and risks of supporting more than one implementation.

Thanks for the correction; I misunderstood the code as using reduced round SHA256.