hashbang/os

Automated Kernel CVE Patching

SkewedZeppelin opened this issue · 10 comments

ypid commented

I am not sure if applying patches for just CVEs is needed. To my knowledge device kernels are based on long-term Linux kernel versions, for example 4.9. Ref: https://www.kernel.org/

Why can’t we just apply the kernel patch releases? Consider that this project has a very narrow scope of device support.

@ypid
Even for devices that are updated to the latest patchlevel, there are still a number of patches that can be applied, because some fixes are:

  • not mainlined yet
  • not pushed to stable yet
  • AOSP/Qualcomm specific and not included yet

Also the patch database includes the incremental patches so in some cases it can update straight to those for you.
https://github.com/Divested-Mobile/kernel_patches/tree/master/0001-LinuxIncrementals

Finally the patch database includes a number of hardening patches from GrapheneOS that can be applied.

ypid commented

As you mention GrapheneOS, have you reported this there as well? My goal is to use the kernel repos from GrapheneOS as upstream, so your issue would be a better fit for GrapheneOS in my eyes.

@ypid
I do not think @thestinger is interested in this method of patching.
Which is understandable.
See GrapheneOS-Archive/kernel_google_marlin#1 from October 2019

I did make this primarily with older/unsupported devices in mind.

ypid commented

> I did make this primarily with older/unsupported devices in mind.

Even though I don’t like the short (3 years) support policy of Google devices, neither I nor https://github.com/lrvick is interested in supporting older/unsupported devices. Same as GrapheneOS.

I am not involved in kernel development to keep up with applying those patches. But I will keep this issue open for the other Hashbang devs to take a look. What I am interested in is the Android build system and further automating and securing that. I see you have experience with that (ref: https://divestos.org/index.php?page=devices&base=LineageOS). If you want to help us with this, I would appreciate it very much :)

That is understandable.
Thank you for taking the time to respond.

For fun: https://gist.github.com/SkewedZeppelin/6041d368d9b27aaa8ac79b58230605e1

The main issue with it in this form is that it cannot be realistically reviewed. I think the approach in https://github.com/nathanchance/stable-patches is the best way to approach the patches included in the stable releases. That should be the base for this. It's still very difficult to review and reviewing patches without the full context or understanding of the code is likely to lead to many issues slipping past. The issues created by this would often be worse than what gets addressed by it. It requires having a team of people focused on doing this for a small set of kernels. It needs adequate time and care devoted to it, along with handling time sensitive migrations to new releases. Since most kernel vulnerabilities do not receive a CVE, focusing only on those is a mistake, and we should be sharing as much of the work as possible with the others that are working on it. This starts with applying the stable kernels faster than the pace they're applied upstream (or the delay needs to be resolved upstream) and then working from there.

In order to accept a massive set of patches, it has to be possible to review what has been done and to reproduce it. There has to be a commitment to maintaining it and dealing with the fallout. That includes time sensitive release cycles where it needs to be ported over and made compatible with new upstream kernel releases, otherwise it's just going to be dropped to move forward.

Separately from that, we cannot release GrapheneOS for devices where there are serious known vulnerabilities and no path to fixing them. The path to longer support is not misleading users into thinking we are looking after their security when we can't and are putting them at serious risk. Using an end-of-life device with a very incomplete set of patches applied gives a false sense of security. It doesn't matter to an attacker if many of the issues are fixed if the vulnerabilities they use are still present. We're more likely to address this by adding an annoying warning that cannot be disabled when a device has fallen behind on the patch level. Longer support for devices requires devices offering it.

We used to have all of the stable kernel patches promptly applied, and we had a lot more hardening. We used to have the entire PaX patches applied to our kernels. Others have not stepped up to fill in for me doing all of this myself. There are currently hardening features such as slab canaries which have been ported to the current kernels but uncover issues which have not yet been resolved. Cannot really expect to restore all of the past hardening if we cannot resolve a few minor issues blocking small standalone features from being used. Similarly, there are issues with incompatibilities between certain upstream hardening features like KPTI and AOSP hardening work like ShadowCallStack which need to be resolved promptly. What we need is for people to step up to start doing this work, rather than dumping a huge workload in our lap with no realistic way to merge any of it.

If you're unaware, GrapheneOS is under a substantial sustained attack from James Donaldson / Copperhead. This has taken away a lot of my time and money and has destroyed my productivity. Other GrapheneOS developers and supporters are also being targeted, and there is an ongoing effort to cause substantial harm to the project. This has resulted in not being able to do everything we used to do. Our changes are much less extensive than in the past due to this. That doesn't just apply to the kernel. A lot of projects are languishing without much active development or improvements. A project like hardened_malloc can survive without active improvements happening, but it's not nearly as good as it could be if the resources were there to develop it further. Projects involving forks of upstream code are a different story.

Changes in forks cannot be done without having the resources to maintain them. They'll just end up disappearing over time. What we need is a script for cherry-picking the stable kernel patches one-by-one with conflict resolutions. We need to have people doing careful review and testing of it, addressing any of the issues that come up even when it's very difficult to deal with them. A good starting point would be addressing incompatibilities / upstream bugs which have regressed hardening such as slab canaries, KPTI, etc. Those are broken because of stable kernel updates from kernel.org and Qualcomm. Those updates introduce problems and since they're not testing with our hardening, we're likely to uncover things not caught by their much more extensive testing.
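For illustration only, here is a minimal sketch of what such a one-by-one cherry-pick script could look like. It is not an existing GrapheneOS or DivestOS tool. The function names (`git`, `list_stable_commits`, `cherry_pick_series`) and the tag names in the usage note are hypothetical. The sketch assumes the stable tags are already fetched into the device kernel repository, and it stops at the first failure so that a person can resolve and review the conflict instead of skipping it.

```python
import subprocess
from typing import Callable, List, Optional, Sequence, Tuple

# A runner takes git arguments and a working directory and returns
# (exit code, combined output). Making it injectable keeps the logic testable.
Runner = Callable[[Sequence[str], str], Tuple[int, str]]


def git(args: Sequence[str], cwd: str) -> Tuple[int, str]:
    """Run a git command and return (exit code, stdout + stderr)."""
    proc = subprocess.run(["git", *args], cwd=cwd, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def list_stable_commits(old_tag: str, new_tag: str, cwd: str = ".",
                        run: Runner = git) -> List[str]:
    """List the non-merge commits in old_tag..new_tag, oldest first."""
    code, out = run(["rev-list", "--reverse", "--no-merges",
                     f"{old_tag}..{new_tag}"], cwd)
    if code != 0:
        raise RuntimeError(out)
    return out.split()


def cherry_pick_series(commits: Sequence[str], cwd: str = ".",
                       run: Runner = git) -> Tuple[List[str], Optional[str]]:
    """Cherry-pick commits one at a time.

    Returns (applied, stopped_at). stopped_at is the first commit that did
    not apply cleanly, or None if all applied. The tree is left in the
    failed state on purpose so the conflict can be resolved by hand.
    A non-zero exit also covers non-conflict failures, such as a commit
    that is already present and would become empty.
    """
    applied: List[str] = []
    for sha in commits:
        # -x records the original commit hash in the new commit message,
        # which helps reviewers trace each change back to upstream.
        code, _ = run(["cherry-pick", "-x", sha], cwd)
        if code != 0:
            return applied, sha
        applied.append(sha)
    return applied, None
```

Usage would look something like `cherry_pick_series(list_stable_commits("v4.9.1", "v4.9.2"))`, where the tags are only examples. Stopping on the first failure is a deliberate choice here: silently skipping a commit would make later commits apply against a tree that upstream never tested.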

Look at our recent migration to Android 11. We've had it in Beta for a while and we're only just pushing it out as a Stable channel release today. Most of the SELinux policy hardening is not ported. 2 important kernel hardening features were lost for the Pixel 4 due to incompatibilities with other hardening / changes in AOSP. That's with 2 full time developers including myself along with about 6 other developers putting in substantial time. Our hands are full already. We need more help if we're going to expand the scope of the project to what it was before. I'm not able to put in the time I was before and I'm not as productive as I was able to be. If the non-technical issues hurting the project were largely addressed, things could get much better, but it won't change that a lot of help is needed and it really can't be relying so much on one person in the first place to be sustainable.

When we have the choice between doing a release without changes like this or holding back the upstream security updates, we're going to choose to drop it. That's why we don't have these kinds of changes anymore. There was not a team to put in the substantial time to keep them there. We don't want a pull request doing it. We need a long-term commitment from a team of people with adequate resources to do it right, and an approach that's relatively safe and sustainable. Existing serious issues need to be addressed first. The first thing that should probably be done is fixing the memory corruption uncovered by slab canaries followed by fixing the side channel mitigations. That stuff was broken by the stable merges by AOSP. We really can't just blindly apply a ton of patches we haven't reviewed and don't really understand. It will take time to get this done and it will be an ongoing project to keep it going. I don't really see a reason to rush into it. I want it done right, and with the people in place to support it over the long term. I won't take a short term approach to it. GrapheneOS is still very much in the process of being revived and until we've fully dealt with James Donaldson / Copperhead, our focus (especially mine) cannot be entirely on the technical side of things. Help is desperately needed, and with a lot more than just the technical side of it.

So, what I want to see is someone fixing those existing problems, and building the trust needed to believe them when they say they are going to help maintain a huge project. I think this needs more than one person anyway.

Look at our old changelogs and you can see that applying all the stable kernel changes is something we used to do:

https://github.com/AndroidHardeningArchive/legacy_changes/blob/master/oreo/changes.2018.04.19.04

We'd then compare our work to the work done by Nathan Chancellor and look into why we sometimes ended up with different approaches. It was nice to at least have a 2nd opinion on it, and additional testing / review. It led to a lot of additional work due to the problems it caused and we really didn't have the resources to deal with it. Most of the security fixes don't get CVEs. A lot of security fixes without CVEs are applied in the stable branches, but at the same time, the upstream review and testing is very poor. We can't afford to introduce issues like ext4 data corruption, etc. These issues are not uncommon with those patches. It's largely 1 person (Greg KH) mostly blindly applying tons of changes. Greg KH is good at it, but he's human. We pretty much need to apply more resources to it than they do themselves upstream in order to promptly ship it. Google has the resources to deal with it but they haven't done it yet. They tie it to longer release cycles (quarterly maintenance releases, major releases) because it is so problematic. Certainly something we can do, and we're much more willing to take risks to fix security issues faster, but within reason.

Also, if we're just going to cherry-pick certain things, like going through only issues with CVEs, then I don't see a reason to include patches to code we're not building, etc. Can make a script to determine which files are actually used in the build and then omit patches outside that scope so that more time can be spent reviewing the ones that are actually relevant. Similarly, can manually omit ones for unused architectures, etc. If that seems to create more work, rather than less, then I think the issue is the changes are not reviewed / understood. I don't think blindly applying changes is a good idea. Lots of the upstream patches are broken and have follow-up commits fixing them or completing them, often not marked as such, which is one of the issues.
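As a rough sketch of the filtering idea above (an illustration, not an existing tool): the helper names `touched_paths`, `built_sources` and `is_relevant` are hypothetical. The heuristic assumes that an object file `foo.o` in the kernel build output means `foo.c` or `foo.S` was compiled. Anything that is not a `.c` or `.S` file (headers, Kconfig, Makefiles) is conservatively kept for review.

```python
import os
import re
from typing import Set

# Matches the "diff --git a/<old> b/<new>" header of each file in a git patch.
# Assumes paths contain no spaces, which holds for the kernel tree.
_DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$", re.MULTILINE)


def touched_paths(patch_text: str) -> Set[str]:
    """Return every path named in the patch's 'diff --git' headers."""
    paths: Set[str] = set()
    for old, new in _DIFF_HEADER.findall(patch_text):
        paths.update((old, new))
    return paths


def built_sources(build_dir: str) -> Set[str]:
    """Guess which sources were compiled by looking for .o files.

    For each foo.o under build_dir, both foo.c and foo.S are reported,
    so this over-approximates rather than risking a missed file.
    """
    built: Set[str] = set()
    for root, _dirs, files in os.walk(build_dir):
        for name in files:
            if name.endswith(".o"):
                stem = os.path.relpath(os.path.join(root, name[:-2]), build_dir)
                stem = stem.replace(os.sep, "/")
                built.update((stem + ".c", stem + ".S"))
    return built


def is_relevant(patch_text: str, built: Set[str]) -> bool:
    """Return False only when every touched file is a source not in the build."""
    paths = touched_paths(patch_text)
    if not paths:
        return True  # could not parse the patch: keep it for manual review
    for path in paths:
        if not path.endswith((".c", ".S")):
            return True  # headers, Kconfig, Makefiles: keep conservatively
        if path in built:
            return True
    return False
```

With this, a patch that only touches `arch/mips/...` would be set aside for a kernel that never built those objects, while anything touching a header still goes to review. A more precise version could read the dependency lists the kernel build writes next to each object, but that is beyond this sketch.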

@thestinger
Thank you for the extremely detailed response.
I am well aware of the history of GrapheneOS.
Maybe you don't remember but you helped me port the old CyanogenMod 12 based CopperheadOS to the OnePlus One with PaX.
https://divestos.org/images/screenshots/CopperheadOS-bacon.png
https://github.com/Divested-Mobile/DivestOS-Build/blob/c0083c15193868d07cb6715dc418d515dfa2ad48/Patches/OLD/bacon/Kernel-All/ch-12.1/21.patch

Solving this problem at the roots is definitely the way to go.
Collaboration of all of these various projects could be improved, as we all can only put in so much time.
Issues without CVEs are a big issue.
CVE patches that are broken (as grsecurity loves to always point out) are another big issue and me blindly propagating them to devices isn't smart.
But I guess I am desperate for these old devices?
I will try to respond to some of your other comments later.

https://news.ycombinator.com/item?id=24590627