gwillem/magento-malware-scanner

mwscan should be able to import other rulesets (ie NBS, Magemojo)

Closed this issue · 7 comments

I've been searching through my GitHub repo history, and it turns out that over the years I've cloned way more Magento security projects than I remembered. Some are rulesets, some are more focused on hardening than on malware identification. Is your focus in this issue just on being able to import other Magento malware ruleset definitions?

I'm just getting active now so I figured I'd check before I start adding to scope creep on Day 1 ;)

Thanks for your enthusiasm! My focus is:

  1. Collect Magento-specific, high-quality rules/whitelists to detect malware instances. There are many other malware signature repos out there, but it seems that as malware projects grow in scope, their accuracy decreases. So ideally, you would use my rules to accurately identify Magento malware and use other rulesets to identify "suspicious" code. NBS has some great heuristic rules that identify lots of new stuff (along with false positives).

  2. Provide an efficient scanner for use in hosting environments. Because we target Magento, we can heavily optimize normal Yara behavior, i.e. only scan specific files, include a reasonably sized whitelist, validate new signatures against all Magento / public extension releases, etc.
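To make the second point a bit more concrete, here is a minimal sketch of the "only scan specific files, skip whitelisted ones" idea. It is not mwscan's actual code: the extension list, the function names, and the choice of SHA-1 as the whitelist key are all assumptions made for illustration.

```python
import hashlib
import os

# Hypothetical set of extensions worth scanning; not taken from mwscan.
SCAN_EXTENSIONS = {".php", ".phtml", ".js", ".html"}


def sha1_of(path):
    """Return the hex SHA-1 digest of a file's contents."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_scan(root, whitelist):
    """Yield paths under root that have a scannable extension and whose
    hash is not in the whitelist (a set of known-clean SHA-1 digests)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() not in SCAN_EXTENSIONS:
                continue  # skip images, archives and other non-code files
            path = os.path.join(dirpath, name)
            if sha1_of(path) in whitelist:
                continue  # unmodified file from a known release
            yield path
```

Only the paths that survive both filters would then be handed to the (more expensive) signature matching, which is where most of the speedup would come from under these assumptions.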

What do you think?

I like the approach a lot. Actually, I think the best possible outcome is to build this as a very focused tool and then perhaps later assemble a collection of related utilities as a bundle or toolkit. But one tool at a time, with a focus on the ones we can most directly use for our (or our customers') benefit.

I guess I just wanted to confirm, because NBS's ruleset is really NOT specific to Magento, right? It's a PHP scanner...at least that's how they describe it, and I haven't had time to dig into the internals yet.

I ask because I get that since Magento's built in PHP then technically any PHP attack vector could target it and we should start there...but it's also using JavaScript so theoretically that's another malware platform that scanning should consider for potential compromised targets. Then there's the frameworks in use...which are growing in M2 (Knockout.js for example...)

I don't think it's worth trying to do them all, and like you said, there are either going to be known areas of weakness we need to be vigilant against or a known data payload that we need to defend against exfiltration (credit card and user profile data), so that can help narrow the target surface.

I guess unless something really weird gets added to M2, the active languages really are PHP and JavaScript, so it may NOT be a bad idea to consider those the real platforms we are scanning for malware, and it just so happens that Magento is the reference implementation...

Thanks for dissecting the target landscape! IMHO, if we broaden the focus to PHP (and JS), it would become less useful for Magento, because the accuracy would drop. As an illustration, that's what NBS has done with their malware scanner. It is "a suspicious PHP" scanner. It explicitly doesn't want to identify individual malware instances but uses heuristics instead (and has only 15 rules). In other words, it trades accuracy for finding more new malware strains. This is useful in certain cases but requires lots of manual interpretation to weed out the false positives (currently 79 FPs for a clean Magento 2.0.6 install).

OTOH, if we (pragmatically) focus on Magento (loosely defined by: every malware we ever encounter on a Magento installation in the wild) that narrows the scope and improves (future) accuracy. So that would further support narrowing the focus.

In that regard I think this scanner and NBS's are complementary: this scanner could be used for automated scanning and quarantining (99.9% accuracy), while NBS's can be used for manual scans and to detect new malware.

This issue was implemented by #37.
But perhaps we should continue this discussion :)

I just ran through your PR and then decided to dig into the NBS issues and the threads you were in over there. I have been trying to get my head wrapped around the tradeoffs between signature-based and heuristic detection for a while now. I see advantages and disadvantages in both. I tend to lean more toward the perspective you've articulated re: favoring known issues, especially since my experience has largely been similar to yours, in that I have found little amazing innovation in the compromises I've run across.

Given Magento's reach, the number of deployments, and the level of technical sophistication of most store owners / admins running them...combined with the attractiveness of the targets...I see more relatively small variations on a core set of attack signatures. The sad fact is that it's almost free-market incentivized: the attackers tend not to NEED to be terribly creative when you still have 50k openly vulnerable shoplift installations, right?

It seems like your approach here is the right one: add the ability to import other Yara rules, but don't run them with the core ruleset by default. That makes it an option, which may be good, and leaves it to admins to decide if it's worth adding heuristics to the scans (though I agree with you that I'd like to see your performance patch merged in before I'd run it regularly on any installation I manage myself).
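Roughly, the opt-in behavior I have in mind looks like the sketch below. This is only an illustration of the idea, not how #37 implements it; the function name, the ruleset names, and the `.yar` paths are all made up.

```python
def select_rulesets(core, extras, enabled=()):
    """Return the core rulesets plus only the extras named in `enabled`.

    `core` and `extras` map a ruleset name to a rule file path.
    Extras are never included unless explicitly requested.
    """
    unknown = [name for name in enabled if name not in extras]
    if unknown:
        raise ValueError("unknown ruleset(s): " + ", ".join(unknown))
    selected = dict(core)  # copy so the caller's mapping is not mutated
    for name in enabled:
        selected[name] = extras[name]
    return selected


# Hypothetical ruleset names and paths, for illustration only.
CORE = {"magento": "rules/magento.yar"}
EXTRAS = {"nbs": "rules/nbs.yar", "magemojo": "rules/magemojo.yar"}

default_scan = select_rulesets(CORE, EXTRAS)            # core rules only
heuristic_scan = select_rulesets(CORE, EXTRAS, ["nbs"])  # admin opted in
```

A name-to-path mapping like this is also the shape that yara-python's `yara.compile(filepaths=...)` accepts, so the result could presumably be compiled directly, but I haven't checked how that fits with the existing code.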

I would add that it might be helpful to write up some wiki content on this and a few other topics, especially if we'd like to add more optional features, as that would help others decide if/when to use them without blowing up the issue threads. I'm easily a much better writer than coder, so that's one area where I can probably add some immediate value if you think it would be worthwhile.

Does that work with your initial assumptions?

@beejhuff, are you actually proposing to write some docs? :D 🎉 That would be much appreciated! Well, you could basically copy paste your comments in this issue. I'll get back to you on DM to discuss some roadmap considerations.