Deep Learning for Android Malware Defenses

This is an updated survey of deep learning-based Android malware defenses and a continuously updated version of the manuscript "Deep Learning for Android Malware Defenses: A Systematic Literature Review" by Yue Liu, Li Li, Chakkrit Tantithamthavorn and Yepang Liu. This paper has been accepted by ACM Computing Surveys.

To the best of our knowledge, no systematic literature review focusing on deep learning approaches for Android malware defenses exists. In this paper, we conducted a systematic literature review to search for and analyze how deep learning approaches have been applied to malware defenses in the Android environment. As a result, we identified a total of 132 studies covering the period 2014-2021. Our investigation reveals that, while most of these studies focus on DL-based Android malware detection, 53 primary studies (40.1 percent) design defense approaches for other scenarios. This review also discusses research trends, research focuses, challenges, and future research directions in DL-based Android malware defenses.

Please kindly cite this paper if it helps your research:

@article{liu2022deep,
	author = {Liu, Yue and Tantithamthavorn, Chakkrit and Li, Li and Liu, Yepang},
	title = {Deep Learning for Android Malware Defenses: A Systematic Literature Review},
	year = {2022},
	issue_date = {August 2023},
	publisher = {Association for Computing Machinery},
	address = {New York, NY, USA},
	volume = {55},
	number = {8},
	issn = {0360-0300},
	url = {https://doi.org/10.1145/3544968},
	doi = {10.1145/3544968},
	journal={ACM Computing Surveys},
	month = {dec},
	articleno = {153},
	numpages = {36},
}

You are welcome to update our review list!

Fork this repository, add the paper, and merge it back via a pull request;
or email us.

If you see a project or link here that is no longer maintained or is not a good fit, please submit a pull request to improve this document. Thank you!

Systematic review process and paper lists

We collected primary studies related to DL-based Android malware defenses from a variety of sources (IEEE, ACM Digital Library, Springer, ScienceDirect, Wiley Online Library, Google Scholar and Web of Knowledge). Only studies related to deep learning-based Android malware defenses were considered for further review; in addition, we applied quality appraisal criteria to retain high-quality studies. The complete list of exclusion criteria and quality appraisal criteria is available at this page. This process yielded 132 relevant primary studies.

Paper structure

We uploaded our completed paper lists to Google Drive with detailed reviewed information.

(Review paper lists)

Our paper is structured as below:

Malware Defenses Objectives
- binary malware classification
- malware family attribution
- repackaged/fake app detection
- adversarial learning attacks and protections
- malware evolution detection and defense
- malicious behavior analysis
APK Characterization
- Program analysis approaches (static analysis, dynamic analysis, hybrid analysis)
- Feature categories (permission, API calls, filtered intents, app component, url, string, hardware component, app metadata, system call, dynamic activities, program graph, opcode, bytecode, java code)
- Feature encoding approaches (categorical, text-based, graph-based, image-based, hybrid)
Deep Learning Techniques
- Learning paradigms (supervised, supervised & unsupervised, unsupervised, reinforcement learning)
- Deep learning models (Multilayer Perceptrons, Convolutional Neural Networks, Recurrent Neural Networks, Deep Belief Networks, Autoencoders, Generative Adversarial Networks, Graph Neural Networks, Attention-based neural networks, Deep Reinforcement Learning, Transformers, Hybrid models)
- Model explanation
Deployment
- Off-device, Distributed, On-device
Performance evaluation
- Dataset
- Evaluation approaches
- Evaluation metrics
- Availability
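
As a minimal illustration of the categorical feature encoding listed above, the sketch below turns per-app feature sets (for example, requested permissions) into fixed-length 0/1 vectors of the kind an MLP could consume. The helper names and the two permission lists are hypothetical; the surveyed studies differ in how they select, filter, and weight features.

```python
from typing import Iterable, List, Sequence


def build_vocabulary(apps: Iterable[Iterable[str]]) -> List[str]:
    """Collect every distinct feature string (e.g. a permission or API name) across apps."""
    vocab = set()
    for features in apps:
        vocab.update(features)
    return sorted(vocab)  # sorted so the vector layout is deterministic


def encode_binary(features: Iterable[str], vocabulary: Sequence[str]) -> List[int]:
    """Return a 0/1 vector: position i is 1 if vocabulary[i] occurs in this app.

    Features absent from the vocabulary (e.g. unseen at training time) are dropped.
    """
    present = set(features)
    return [1 if feature in present else 0 for feature in vocabulary]


# Hypothetical feature sets, as they might be extracted from two apps' manifests.
app_a = ["android.permission.INTERNET", "android.permission.SEND_SMS"]
app_b = ["android.permission.INTERNET", "android.permission.CAMERA"]

vocab = build_vocabulary([app_a, app_b])
print(encode_binary(app_a, vocab))  # → [0, 1, 1]
```

A binary vector like this discards ordering and frequency; text-based, graph-based, and image-based encodings keep more structure at the cost of larger inputs.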

If you are interested in a summary of each subtopic across these 132 primary studies, please read our survey; if you want detailed information on each primary study, see our review table.

Malware data collection

Data sources	Updated	Paper	Details
Drebin	-	NDSS-2014	123,453 benign samples and 5,560 malware samples (176 malware families); 2010-2012 samples
Genome	-	S&P-2012	863 benign and 1,260 malware samples; 2010-2011 samples
Contagio	-	-	11,960 mobile malware samples and 16,800 benign samples until 2018
AMD	-	DIMVA-2017	24,553 malware samples (2010-2016)
AndroZoo	Yes	MSR-2016	A growing collection of Android apps collected from several sources, including the official Google Play market. It currently contains 17,951,878 different APKs.
VirusTotal	Yes	-	Aggregates many antivirus products and online scan engines; it also provides datasets for researchers.
VirusShare	Yes	-	A repository of malware samples for security researchers; it currently contains 44,390,572 malware samples.
CICMalDroid	-	-	More than 17,341 Android samples until 2018.
RmvDroid	-	MSR-2019	9,133 malware samples belonging to 56 malware families
Google Play	Yes	-	The official Android market; PlayDrone is a Google Play crawler.
Third-party markets	Yes	-	HUAWEI, APKpure, MI store, Tencent, 360, Wandoujia, Aptoide, Anzhi, APKmirror, Amazon Appstore, 9APPS
Google Play Malware	No	ICSE-2022	1,238 Android malware from 134 distinct malware families
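
The sample periods in the table matter when splitting data for evaluation. TESSERACT (listed below) argues that, to avoid temporal bias, every training sample should predate every test sample. The sketch below enforces only that one constraint, assuming each sample carries a first-seen date; the `Sample` type and its fields are hypothetical, and TESSERACT's other constraints (such as a realistic malware ratio in the test set) are not shown.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class Sample:
    sha256: str       # hypothetical identifier
    first_seen: date  # assumption: some per-sample timestamp is available
    is_malware: bool


def temporal_split(samples: List[Sample], cutoff: date) -> Tuple[List[Sample], List[Sample]]:
    """Train on samples seen strictly before `cutoff`, test on the rest.

    Unlike a random shuffle-and-split, this never lets "future" samples
    leak into the training set.
    """
    train = [s for s in samples if s.first_seen < cutoff]
    test = [s for s in samples if s.first_seen >= cutoff]
    return train, test


samples = [
    Sample("a1", date(2014, 3, 1), False),
    Sample("b2", date(2015, 7, 9), True),
    Sample("c3", date(2016, 1, 2), True),
    Sample("d4", date(2016, 11, 30), False),
]
train, test = temporal_split(samples, cutoff=date(2016, 1, 1))
print([s.sha256 for s in train], [s.sha256 for s in test])  # → ['a1', 'b2'] ['c3', 'd4']
```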

Anti-virus tools

VirusTotal: Analyzes suspicious files and URLs to detect types of malware and automatically shares them with the security community. [Project link] [Request for research API]
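
Many studies derive ground-truth labels from VirusTotal scan results. The sketch below assumes the VirusTotal API v3 file-report endpoint and its `last_analysis_stats` field; the three-way labeling rule and the `min_detections` threshold are assumptions reflecting one common convention (exact thresholds vary between studies), not something VirusTotal prescribes. Only `label_from_report` runs offline; `fetch_file_report` needs network access and a valid API key.

```python
import json
import urllib.request

VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{}"


def fetch_file_report(sha256: str, api_key: str) -> dict:
    """Download the VirusTotal v3 file report for one hash (needs network and an API key)."""
    request = urllib.request.Request(VT_FILE_URL.format(sha256), headers={"x-apikey": api_key})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def label_from_report(report: dict, min_detections: int) -> str:
    """Label a sample by how many engines flagged it as malicious.

    0 detections -> "benign"; at least `min_detections` -> "malware";
    anything in between -> "uncertain" (often excluded from training data).
    """
    stats = report["data"]["attributes"]["last_analysis_stats"]
    detections = stats.get("malicious", 0)
    if detections == 0:
        return "benign"
    if detections >= min_detections:
        return "malware"
    return "uncertain"


# Hand-written fragment shaped like a v3 response; the numbers are made up.
example_report = {
    "data": {"attributes": {"last_analysis_stats": {"malicious": 7, "suspicious": 1, "undetected": 55}}}
}
print(label_from_report(example_report, min_detections=4))  # → malware
```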

Public tools

Deep learning-based Android malware defense approaches

Deep Android Malware Detection
- Binary Malware Classification; CNN; Opcode Sequence
- in CODASPY '17 [pdf] [Code]
A Multimodal Deep Learning Method for Android Malware Detection Using Various Features
- Binary Malware Classification; CNN; Multiple features (string, method opcode, method API, shared library function opcode, permission, app component, environmental feature)
- in TIFS 2018, [pdf] [Code]
Detecting Android malware using Long Short-term Memory (LSTM)
- Binary Malware Classification; LSTM; Permissions, dynamic behaviour
- in Journal of Intelligent & Fuzzy Systems, 2018, [pdf][Code]
TESSERACT: Eliminating experimental bias in malware classification across space and time
- Malware Evolution Detection and Defense; MLP; Drebin's features
- in USENIX Security Symposium, 2019, [pdf][Code]
DeepIntent: Deep Icon-Behavior Learning for Detecting Intention-Behavior Discrepancy in Mobile Apps
- Malicious Behavior Analysis; CNN, RNN, AE; App metadata
- in CCS'19, [pdf] [Code]
An Android mutation malware detection based on deep learning using visualization of importance from codes
- Binary Malware Classification; CNN; Java code
- in Microelectronics Reliability, 2019, [pdf] [Code]
Familial Clustering for Weakly-Labeled Android Malware Using Hybrid Representation Learning
- Malware family attribution; MLP; Java Code, App components, action, Requested Permission, Hardware, instrumentation classes, requested API, package name, version, referenced libraries.
- in TIFS 2019, [pdf] [Code]
Android Malware Detection Based on System Calls Analysis and CNN Classification
- Binary malware classification; CNN; System Call
- in IEEE Wireless Communications and Networking Conference Workshop (WCNCW), 2019, [pdf] [Code]
Adversarial Deep Ensemble: Evasion Attacks and Defenses for Malware Detection
- Adversarial Learning Attacks and Protections; MLP
- in TIFS 2020, [pdf][Code]
Evaluating explanation methods for deep learning in security
- Binary malware classification; MLP, CNN
- in EuroS&P'20; [pdf] [code]
Enhancing State-of-the-art Classifiers with API Semantics to Detect Evolved Android Malware
- Malware Evolution Detection and Defense, Binary Malware Detection; MLP
- in CCS'20, [pdf] [code]
A Multi-modal Neural Embeddings Approach for Detecting Mobile Counterfeit Apps: A Case Study on Google Play Store
- Repackaged/Fake App Detection; CNN
- in TMC'20, [pdf] [code]
DENAS: Automated Rule Generation by Knowledge Extraction from Neural Networks
- Binary Malware Detection
- in FSE'20; [pdf] [code]
Experiences of Landing Machine Learning onto Market-Scale Mobile Malware Detection
- Binary Malware Detection
- in EuroSys'20; [pdf] [code]
Hybrid Analysis of Android Apps for Security Vetting using Deep Learning
- Binary Malware Classification; LSTM (Bi-LSTM and Attn-BiLSTM)
- in IEEE Conference on Communications and Network Security (CNS), 2020 [pdf][Code]
Understanding Privacy Awareness in Android App Descriptions Using Deep Learning
- Malicious Behavior Analysis; CNN
- in ACM Conference on Data and Application Security and Privacy, 2020, [pdf][Code]
Combining multi-features with a neural joint model for Android malware detection
- Binary Malware Detection, Malware Family Identification; RNN, CNN
- in Journal of Intelligent & Fuzzy Systems, 2020, [pdf] [Code]
Experimental comparison of features and classifiers for Android malware detection
- Binary Malware Classification; MLP, CNN, RNN
- in International Conference on Mobile Software Engineering and Systems, 2020, [pdf][Code]
A Framework for Enhancing Deep Neural Networks Against Adversarial Malware
- Adversarial Learning Attacks and Protections; AE, MLP
- in IEEE Transactions on Network Science and Engineering, 2021 [pdf][Code]
Towards an interpretable deep learning model for mobile malware detection and family identification
- Malware Family Identification; CNN
- in Computers & Security 2021 [pdf][Code]
Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers
- Adversarial Learning Attacks and Protections; MLP
- in USENIX Security Symposium 2021 [pdf][Code]
CADE: Detecting and Explaining Concept Drift Samples for Security Applications
- Malware Evolution Detection and Defense; AE
- in USENIX Security Symposium 2021 [pdf][Code]
DexRay: A Simple, yet Effective Deep Learning Approach to Android Malware Detection Based on Image Representation of Bytecode
- Binary Malware Detection; CNN
- in Deployable Machine Learning for Security Defense 2021 [pdf][Code]
Can We Leverage Predictive Uncertainty to Detect Dataset Shift and Adversarial Examples in Android Malware Detection?
- Malware Evolution Detection and Defense, Adversarial Learning Attacks and Protections; MLP, CNN, RNN
- in ACSAC'21 [pdf][Code]
Why an Android App is Classified as Malware? Towards Malware Classification Interpretation
- Malware Detection; attention-based neural networks
- in TOSEM 2021, [pdf][Code]
Heterogeneous Temporal Graph Transformer: An Intelligent System for Evolving Android Malware Detection
- Malware Evolution Detection and Defense, Binary Malware Detection; transformers, GNN
- in SIGKDD'21 [pdf][Code]
Robust Android Malware Detection against Adversarial Example Attacks
- Adversarial Learning Attacks and Protections; Hybrid
- in WWW'21 [pdf][Code]
PetaDroid: Adaptive Android Malware Detection Using Deep Learning
- Binary Malware Detection; Hybrid
- in Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA) 2021 [pdf][Code]
Structural Attack against Graph Based Android Malware Detection
- Adversarial Learning Attacks and Protections; DRL
- in CCS'21 [pdf][Code]
Continuous Learning for Android Malware Detection
- in USENIX Security'23 [pdf][Code]
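
DexRay (listed above) represents an app's bytecode as an image, one instance of the image-based feature encoding. The sketch below shows the general idea only: each byte of a DEX file becomes one grayscale pixel. The function name and default length are hypothetical, and zero-padding/truncation is a simplification standing in for whatever resizing the paper actually applies.

```python
from typing import List

DEX_MAGIC = b"dex\n035\x00"  # one valid DEX header magic, used only for the demo


def bytes_to_grayscale_vector(data: bytes, length: int = 128 * 128) -> List[float]:
    """Turn raw bytecode into a fixed-length grayscale "image".

    Each byte (0-255) becomes one pixel scaled to [0, 1]; longer inputs are
    truncated and shorter ones zero-padded so every app has the same shape.
    """
    pixels = [b / 255.0 for b in data[:length]]
    pixels.extend([0.0] * (length - len(pixels)))
    return pixels


image = bytes_to_grayscale_vector(DEX_MAGIC, length=16)
print(len(image))  # → 16
```

In practice the input would be the bytes of an app's `classes.dex` files, and the resulting vector (or a 2D reshape of it) would be fed to a CNN.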

Machine learning-based tools

Adversarial Deep Learning for Robust Detection of Binary Encoded Malware, in IEEE Security and Privacy Workshops (SPW), 2018, Adversarial deep learning, [code]
DroidCC: Android malware detection using deep learning; contains Android malware samples, papers, tools, etc.
MADLIRA: Malware detection using learning and information retrieval for Android
android-malware-detection: Android Malware Detection Using Machine Learning Classifiers (using permissions requested by apps)
MLDroid/drebin: Drebin - NDSS 2014 Re-implementation
MaMadroid: Implementation of MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models in NDSS 2017
Lessons Learnt on Reproducibility in Machine Learning Based Android Malware Detection, in Empirical Software Engineering (EMSE), 2021; reproduction of Drebin, MaMaDroid, MalScan, DroidCat, RevealDroid [code]
Lightweight, Obfuscation-Resilient Detection and Family Identification of Android Malware, in TOSEM 2018, [code]

Program analysis tools

Apktool: A tool for reverse engineering Android apk files [link]
Androguard: Reverse engineering, Malware and goodware static analysis of Android applications ... and more [link]
FlowDroid: FlowDroid statically computes data flows in Android apps and Java programs. [link]
Monkey: The UI/Application Exerciser Monkey, a command-line program that runs on an Android emulator or device and generates pseudo-random streams of user events, such as clicks, touches, and gestures, as well as some system-level events, to stress-test apps. [link]
DroidBox: Dynamic analysis of Android apps [link]
DroidBot: A lightweight test input generator for Android. Similar to Monkey, but with more intelligence and cool features. [link]
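
Apktool decodes an app's DEX bytecode into smali text, which is one way to obtain the opcode sequences used by several studies above (e.g., Deep Android Malware Detection). The sketch below is a hypothetical helper, not part of any listed tool: it pulls the opcode mnemonic from each smali instruction line and counts opcode n-grams. It assumes baksmali-style output and handles only a few data blocks (annotations, array and switch payloads); labels, comments, and other directives are skipped.

```python
from collections import Counter
from typing import Iterable, List, Optional

# Directive blocks whose bodies hold data (annotation values, switch/array
# payloads) rather than instructions, so their lines must not count as opcodes.
DATA_BLOCKS = (".annotation", ".array-data", ".packed-switch", ".sparse-switch")


def smali_opcodes(lines: Iterable[str]) -> List[str]:
    """Return the opcode mnemonic of every instruction line in smali text."""
    opcodes: List[str] = []
    end_marker: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if end_marker is not None:  # inside a data block: wait for its ".end ..." line
            if line.startswith(end_marker):
                end_marker = None
            continue
        if not line or line[0] in ".:#":  # blank line, directive, label, or comment
            for block in DATA_BLOCKS:
                if line.startswith(block):
                    end_marker = ".end " + block[1:]
                    break
            continue
        opcodes.append(line.split()[0])  # "invoke-virtual {v0}, ..." -> "invoke-virtual"
    return opcodes


def opcode_ngrams(opcodes: List[str], n: int = 2) -> Counter:
    """Count contiguous opcode n-grams (a simple text-based feature encoding)."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


example_smali = """
.method public onCreate(Landroid/os/Bundle;)V
    .locals 1
    .annotation build Landroid/annotation/SuppressLint;
        value = {"NewApi"}
    .end annotation

    invoke-super {p0, p1}, Landroid/app/Activity;->onCreate(Landroid/os/Bundle;)V
    const-string v0, "hello"
    # a comment
    :cond_0
    return-void
.end method
"""
ops = smali_opcodes(example_smali.splitlines())
print(ops)  # → ['invoke-super', 'const-string', 'return-void']
```

Some studies instead feed the raw opcode sequence to a CNN or RNN; the n-gram counts are just one compact alternative.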

Supplementary materials

Deep learning

Research Papers

Deep learning - LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. Nature, 2015, [pdf]
Deep learning - Goodfellow, Ian, et al. MIT Press, 2016, [pdf1][pdf2]
Deep learning in neural networks: An overview - Schmidhuber, Jürgen. Neural Networks, 2015, [pdf]

Online Tutorials and Repositories

Awesome - Most Cited Deep Learning Papers - [Project link]
Deep Learning Papers Reading Roadmap - [Project link]
Top Deep Learning Projects - [Project link]
Tracking Progress in Natural Language Processing - [Project link]
Deep Learning Tutorial - by Haozan Liang, Chinese only, continuously maintained and updated, [Project link]

Tools: TensorFlow, Keras, scikit-learn, PyTorch

Android Malware Analysis