This is an updated survey of deep learning-based Android malware defenses, a continuously updated version of the manuscript "Deep Learning for Android Malware Defenses: A Systematic Literature Review" by Yue Liu, Li Li, Chakkrit Tantithamthavorn and Yepang Liu. The paper has been accepted by ACM Computing Surveys.
To the best of our knowledge, no systematic literature review focusing on deep learning approaches for Android malware defenses exists. In this paper, we conducted a systematic literature review to search and analyze how deep learning approaches have been applied in the context of malware defenses in the Android environment. As a result, a total of 132 studies covering the period 2014-2021 were identified. Our investigation reveals that, while the majority of these studies focus on DL-based Android malware detection, 53 primary studies (40.1 percent) design defense approaches for other scenarios. This review also discusses research trends, research focuses, challenges, and future research directions in DL-based Android malware defenses.
Please kindly cite this paper if it helps your research:
@article{liu2022deep,
author = {Liu, Yue and Tantithamthavorn, Chakkrit and Li, Li and Liu, Yepang},
title = {Deep Learning for Android Malware Defenses: A Systematic Literature Review},
year = {2022},
issue_date = {August 2023},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {55},
number = {8},
issn = {0360-0300},
url = {https://doi.org/10.1145/3544968},
doi = {10.1145/3544968},
journal={ACM Computing Surveys},
month = {dec},
articleno = {153},
numpages = {36},
}
You are welcome to update our review list:
- fork this repository, add your entry, and open a pull request to merge it back;
- or email us.
If you see a project or link here that is no longer maintained or is not a good fit, please submit a pull request to improve this document. Thank you!
- Systematic review process
- Paper structure
- Malware data collection
- Public malware defense tools
- Supplementary materials
- Recent Publications (Updating)
We collected primary studies related to DL-based Android malware defenses from a variety of sources (IEEE, ACM Digital Library, Springer, Science Direct, Wiley Online Library, Google Scholar and Web of Knowledge). Only studies related to deep learning-based Android malware defenses were considered for further review; in addition, we proposed quality appraisal criteria to retain high-quality studies. The complete list of exclusion criteria and quality appraisal criteria is available at this page. After that, we obtained 132 relevant primary studies.
We uploaded our complete paper list to Google Drive with detailed review information. Our paper is structured as follows:
- Malware Defenses Objectives
- binary malware classification
- malware family attribution
- repackaged/fake app detection
- adversarial learning attacks and protections
- malware evolution detection and defense
- malicious behavior analysis
- APK Characterization
- Program analysis approaches (static analysis, dynamic analysis, hybrid analysis)
- Feature categories (permission, API calls, filtered intents, app component, URL, string, hardware component, app metadata, system call, dynamic activities, program graph, opcode, bytecode, Java code)
- Feature encoding approaches (categorical, text-based, graph-based, image-based, hybrid)
- Deep Learning Techniques
- Learning paradigms (supervised, supervised & unsupervised, unsupervised, reinforcement learning)
- Deep learning models (Multilayer Perceptrons, Convolutional Neural Networks, Recurrent Neural Networks, Deep Belief Networks, Autoencoders, Generative Adversarial Networks, Graph Neural Networks, Attention-based neural networks, Deep Reinforcement Learning, Transformers, Hybrid models)
- Model explanation
- Deployment
- Off-device, Distributed, On-device
- Performance evaluation
- Dataset
- Evaluation approaches
- Evaluation metrics
- Availability
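
As a rough illustration of the categorical feature encoding listed above, the sketch below turns the permissions requested by an app into a fixed-length binary vector. The four-entry vocabulary and the `encode_permissions` helper are hypothetical and chosen only for illustration; the primary studies use much larger feature sets and differ in the details.

```python
from typing import Iterable, List

# Hypothetical, tiny vocabulary; real studies use far larger feature sets.
PERMISSION_VOCAB = [
    "android.permission.INTERNET",
    "android.permission.READ_SMS",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
]


def encode_permissions(
    requested: Iterable[str], vocab: List[str] = PERMISSION_VOCAB
) -> List[int]:
    """Return a binary vector: 1 if the app requests the permission, else 0."""
    requested_set = set(requested)
    return [1 if perm in requested_set else 0 for perm in vocab]


# Example app requesting two of the four permissions in the vocabulary.
app_permissions = ["android.permission.INTERNET", "android.permission.SEND_SMS"]
vector = encode_permissions(app_permissions)
print(vector)
```

Permissions outside the vocabulary are simply ignored here, which keeps the vector length fixed so it can be fed to a model such as a multilayer perceptron.
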
If you are interested in the summary of each subtopic for these 132 primary studies, you can read our survey for more information. If you want to check detailed information for each primary study, you can read our review table.
Data source | Updated | Paper | Details |
---|---|---|---|
Drebin | - | NDSS-2014 | 123,453 benign samples and 5,560 malware samples (179 malware families); 2010-2012 samples |
Genome | - | S&P-2012 | 863 benign and 1260 malware; 2010-2011 samples |
Contagio | - | - | It consists of 11,960 mobile malware samples and 16,800 benign samples collected until 2018 |
AMD | - | DIMVA-2017 | 24553 malware (2010-2016) |
AndroZoo | Yes | MSR-2016 | AndroZoo is a growing collection of Android Applications collected from several sources, including the official Google Play app market. It currently contains 17,951,878 different APKs. |
VirusTotal | Yes | - | VirusTotal aggregates many antivirus products and online scan engines. It also provides datasets for researchers |
VirusShare | Yes | - | VirusShare is a repository of malware samples for security researchers. It currently contains 44,390,572 malware samples. |
CICMalDroid | - | - | It has more than 17,341 Android samples collected until 2018. |
RmvDroid | - | MSR-2019 | 9,133 malware samples, which belong to 56 malware families |
Google Play | Yes | - | Google Play is the official Android market. PlayDrone: a Google Play crawler |
Third-party markets | Yes | - | HUAWEI, APKpure, MI store, Tencent, 360, Wandoujia, Aptoide, Anzhi, APKmirror, Amazon Appstore, 9APPS |
Google Play Malware | No | ICSE-2022 | 1,238 Android malware from 134 distinct malware families |
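
The class sizes in the table matter when reading evaluation metrics. As a small worked example, the sketch below uses the Drebin counts listed above and a hypothetical detector that labels every app as benign; it is only meant to show why accuracy alone can mislead on imbalanced data, not to reproduce any result from the survey.

```python
# Sample counts as listed for Drebin in the table above.
BENIGN = 123_453
MALWARE = 5_560

total = BENIGN + MALWARE
malware_ratio = MALWARE / total

# Hypothetical trivial detector that labels every app as benign.
true_negatives, false_negatives = BENIGN, MALWARE
true_positives, false_positives = 0, 0

accuracy = (true_positives + true_negatives) / total
recall = true_positives / (true_positives + false_negatives)

print(f"malware ratio: {malware_ratio:.3f}")
print(f"accuracy of all-benign baseline: {accuracy:.3f}")
print(f"malware recall of all-benign baseline: {recall:.3f}")
```

The baseline reaches high accuracy while detecting no malware at all, which is why metrics such as recall, precision and F1 are usually reported alongside accuracy.
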
- VirusTotal: Analyze suspicious files and URLs to detect types of malware, automatically share them with the security community. [Project link] [Request for research API]
- Deep Android Malware Detection
- A Multimodal Deep Learning Method for Android Malware Detection Using Various Features
- Detecting Android malware using Long Short-term Memory (LSTM)
- TESSERACT: Eliminating experimental bias in malware classification across space and time
- DeepIntent: Deep Icon-Behavior Learning for Detecting Intention-Behavior Discrepancy in Mobile Apps
- An Android mutation malware detection based on deep learning using visualization of importance from codes
- Familial Clustering for Weakly-Labeled Android Malware Using Hybrid Representation Learning
- Android Malware Detection Based on System Calls Analysis and CNN Classification
- Adversarial Deep Ensemble: Evasion Attacks and Defenses for Malware Detection
- Evaluating explanation methods for deep learning in security
- Enhancing State-of-the-art Classifiers with API Semantics to Detect Evolved Android Malware
- A Multi-modal Neural Embeddings Approach for Detecting Mobile Counterfeit Apps: A Case Study on Google Play Store
- DENAS: Automated Rule Generation by Knowledge Extraction from Neural Networks
- Experiences of Landing Machine Learning onto Market-Scale Mobile Malware Detection
- Hybrid Analysis of Android Apps for Security Vetting using Deep Learning
- Understanding Privacy Awareness in Android App Descriptions Using Deep Learning
- Combining multi-features with a neural joint model for Android malware detection
- Experimental comparison of features and classifiers for Android malware detection
- A Framework for Enhancing Deep Neural Networks Against Adversarial Malware
- Towards an interpretable deep learning model for mobile malware detection and family identification
- Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers
- CADE: Detecting and Explaining Concept Drift Samples for Security Applications
- DexRay: A Simple, yet Effective Deep Learning Approach to Android Malware Detection Based on Image Representation of Bytecode
- Can We Leverage Predictive Uncertainty to Detect Dataset Shift and Adversarial Examples in Android Malware Detection?
- Why an Android App is Classified as Malware? Towards Malware Classification Interpretation
- Heterogeneous Temporal Graph Transformer: An Intelligent System for Evolving Android Malware Detection
- Robust Android Malware Detection against Adversarial Example Attacks
- PetaDroid: Adaptive Android Malware Detection Using Deep Learning
- Structural Attack against Graph Based Android Malware Detection
- Continuous Learning for Android Malware Detection
- Adversarial Deep Learning for Robust Detection of Binary Encoded Malware, in IEEE Security and Privacy Workshops (SPW), 2018, Adversarial deep learning, [code]
- DroidCC: Android malware detection using deep learning, contains android malware samples, papers, tools etc;
- MADLIRA: Malware detection using learning and information retrieval for Android
- android-malware-detection: Android Malware Detection Using Machine Learning Classifiers (using permissions requested by apps)
- MLDroid/drebin: Drebin - NDSS 2014 Re-implementation
- MaMadroid: Implementation of MaMaDroid: Detecting Android Malware by Building Markov Chains of Behavioral Models in NDSS 2017
- Lessons Learnt on Reproducibility in Machine Learning Based Android Malware Detection, in Empirical Software Engineering (EMSE), 2021, reproduction of Drebin, MaMadroid, Malscan, Droidcat, Revealdroid [code]
- Lightweight, Obfuscation-Resilient Detection and Family Identification of Android Malware, in TOSEM 2018, [code]
- Apktool: A tool for reverse engineering Android apk files [link]
- Androguard: Reverse engineering, Malware and goodware static analysis of Android applications ... and more [link]
- FlowDroid: FlowDroid statically computes data flows in Android apps and Java programs. [link]
- Monkey: An open source security tool for testing a data center's resiliency to perimeter breaches and internal server infection. The Monkey uses various methods to self propagate across a data center and reports success to a centralized Monkey Island server. [link]
- DroidBox: Dynamic analysis of Android apps [link]
- DroidBot: A lightweight test input generator for Android. Similar to Monkey, but with more intelligence and cool features. [link]
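
As a hedged sketch of how the output of such tools can feed a feature extractor: Apktool decodes the binary `AndroidManifest.xml` into readable XML, and the requested permissions can then be listed with Python's standard library. The `SAMPLE_MANIFEST` content and the `list_permissions` helper below are hypothetical and stand in for a real decoded manifest.

```python
import xml.etree.ElementTree as ET
from typing import List

# XML namespace used for android:* attributes in Android manifests.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical decoded manifest, standing in for a real Apktool output file.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.SEND_SMS"/>
    <application android:label="Example"/>
</manifest>
"""


def list_permissions(manifest_xml: str) -> List[str]:
    """Return the android:name of every <uses-permission> element."""
    root = ET.fromstring(manifest_xml)
    name_attr = f"{{{ANDROID_NS}}}name"
    return [
        element.get(name_attr)
        for element in root.iter("uses-permission")
        if element.get(name_attr)
    ]


print(list_permissions(SAMPLE_MANIFEST))
```

For a real app, the string passed to `list_permissions` would be the text of the decoded manifest file rather than the inline sample.
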
Research Papers
- Deep learning - LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. Nature, 2015, [pdf]
- Deep learning - Goodfellow, Ian, et al. MIT press, 2016, [pdf1][pdf2]
- Deep learning in neural networks: An overview - Schmidhuber, Jürgen. Neural networks, 2015, [pdf]
Online Tutorials and Repositories
- Awesome - Most Cited Deep Learning Papers - [Project link]
- Deep Learning Papers Reading Roadmap - [Project link]
- Top Deep Learning Projects - [Project link]
- Tracking Progress in Natural Language Processing - [Project link]
- Deep Learning Tutorial - by Haozan Liang, Chinese version only, continuously maintained and updated, [Project link]
Tools: TensorFlow, Keras, scikit-learn, PyTorch
Research Papers
- Android security: a survey of issues, malware penetration, and defenses - Faruki P, Bharmal A, Laxmi V, et al. IEEE communications surveys & tutorials, 2014, [pdf]
- A taxonomy and qualitative comparison of program analysis techniques for security assessment of android software - Sadeghi A, Bagheri H, Garcia J, et al. IEEE Transactions on Software Engineering, 2016, [pdf]
- The Evolution of Android Malware and Android Analysis Techniques - Tam K, Feizollah A, Anuar N B, et al. ACM Computing Surveys (CSUR), 2017, [pdf]
- Static analysis of android apps: A systematic literature review - Li L, Bissyandé T F, Papadakis M, et al. Information and Software Technology, 2017, [pdf] [Project link]
- A Survey on Malware Detection Using Data Mining Techniques - Ye Y, Li T, Adjeroh D, et al. ACM Computing Surveys (CSUR), 2017, [pdf]
- A survey on various threats and current state of security in android platform - Bhat P, Dutta K. ACM Computing Surveys (CSUR), 2019, [pdf]
- A survey of Android malware detection with deep neural models - Qiu J, Zhang J, Luo W, et al. ACM Computing Surveys (CSUR), 2020, [pdf]
- Deep Learning for Zero-day Malware Detection and Classification: A Survey - Deldar F, Abadi M. ACM Computing Surveys (CSUR), 2023, [pdf]
Recent relevant studies (Last update: 2023-02, we welcome our fellow researchers to update recent works)
- Are Machine Learning Models for Malware Detection Ready for Prime Time?; in IEEE Security & Privacy Magazine, 2023
- Efficient Query-Based Attack against ML-Based Android Malware Detection under Zero Knowledge Setting; in Proc. of ACM Conference on Computer and Communications Security (CCS), 2023; Code
- RPAL-Recovering Malware Classifiers from Data Poisoning using Active Learning; in Proc. of ACM Conference on Computer and Communications Security (CCS), 2023;
- Enhancing Malware Detection for Android Apps: Detecting Fine-granularity Malicious Components; in 38th IEEE/ACM International Conference on Automated Software Engineering (ASE); 2023;
- Continuous Learning for Android Malware Detection; in USENIX Security, 2023; [Code]
- Humans vs. Machines in Malware Classification; in USENIX Security, 2023
- Bad Snakes: Understanding and Improving Python Package Index Malware Scanning; in International Conference on Software Engineering (ICSE), 2023
- RGDroid: Detecting Android Malware with Graph Convolutional Networks against Structural Attack; in IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2023
- Adversarial Training for Raw-Binary Malware Classifiers; in USENIX Security, 2023
- Black-box Adversarial Example Attack towards FCG Based Android Malware Detection under Incomplete Feature Information; in USENIX Security, 2023
- One Size Does not Fit All: Quantifying the Risk of Malicious App Encounters for Different Android User Profiles; in USENIX Security, 2023
- Post-GDPR Threat Hunting on Android Phones: Dissecting OS-level Safeguards of User-unresettable Identifiers; in Proc. Network and Distributed Systems Security Symposium (NDSS), 2023;
- Guided Retraining to Enhance the Detection of Difficult Android Malware; in The ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), 2023;
- DeUEDroid: Detecting Underground Economy Apps Based on UTG Similarity; in The ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), 2023; code
- API2vec: Learning Representations of API Sequences for Malware Detection; in The ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), 2023; code
- Jigsaw Puzzle: Selective Backdoor Attack to Subvert Malware Classifiers; in IEEE Symposium on Security and Privacy (S&P), 2023;
- Understanding the (In) Security of Cross-side Face Verification Systems in Mobile Apps: A System Perspective; in IEEE Symposium on Security and Privacy (S&P), 2023;
- From Grim Reality to Practical Solution: Malware Classification in Real-World Noise; in IEEE Symposium on Security and Privacy (S&P), 2023;
- Disguising Attacks with Explanation-Aware Backdoors; in IEEE Symposium on Security and Privacy (S&P), 2023;
- Android, notify me when it is time to go phishing; in IEEE European Symposium on Security and Privacy (EuroS&P), 2023
- Is It Overkill? Analyzing Feature-Space Concept Drift in Malware Detectors; in IEEE S&P on Deep Learning Security and Privacy Workshop, 2023
- PAD: Towards Principled Adversarial Malware Detection Against Evasion Attacks; in IEEE Transactions on Dependable and Secure Computing (TDSC), 2023
- Comprehensive Android Malware Detection Based on Federated Learning Architecture; in IEEE Transactions on Information Forensics and Security (TIFS), 2023
- Experimental comparison of features, analyses, and classifiers for Android malware detection; in Empirical Software Engineering, 2023
- A Large-scale Temporal Measurement of Android Malicious Apps: Persistence, Migration, and Lessons Learned; in 31st USENIX Security Symposium (USENIX Security 22), 2022
- The Droid is in the Details: Environment-aware Evasion of Android Sandboxes; in Proc. Network and Distributed Systems Security Symposium (NDSS), 2022; code
- MalWhiteout: Reducing Label Errors in Android Malware Detection; in 37th IEEE/ACM International Conference on Automated Software Engineering (ASE); 2022; code
- Explainable AI for Android Malware Detection: Towards Understanding Why the Models Perform So Well?; in International Symposium on Software Reliability Engineering (ISSRE), 2022
- TaintBench: Automatic real-world malware benchmarking of Android taint analyses; in Empirical Software Engineering, 2022
- A Deep Dive Inside DREBIN: An Explorative Analysis beyond Android Malware Detection Scores; in ACM Transactions on Privacy and Security, 2022
- Rotten Apples Spoil the Bunch: An Anatomy of Google Play Malware; in International Conference on Software Engineering (ICSE), 2022, online
- Debiasing Android Malware Datasets: How Can I Trust Your Results If Your Dataset Is Biased?; in IEEE Transactions on Information Forensics and Security (TIFS), 2022
- Eight Years of Rider Measurement in the Android Malware Ecosystem; in IEEE Transactions on Dependable and Secure Computing (TDSC), 2022
- AndroOBFS: Time-tagged Obfuscated Android Malware Dataset with Family Information; in IEEE/ACM 19th International Conference on Mining Software Repositories (MSR), 2022
- On the Relativity of Time: Implications and Challenges of Data Drift on Long-Term Effective Android Malware Detection; in Computers & Security, 2022
- Contrastive Learning for Robust Android Malware Familial Classification; in IEEE Transactions on Dependable and Secure Computing (TDSC), 2022
- Understanding Real-world Threats to Deep Learning Models in Android Apps; in ACM SIGSAC Conference on Computer and Communications Security (CCS), 2022
- Exposing the Rat in the Tunnel: Using Traffic Analysis for Tor-based Malware Detection; in ACM SIGSAC Conference on Computer and Communications Security (CCS), 2022; code
- Detecting and Measuring Misconfigured Manifests in Android Apps; in ACM SIGSAC Conference on Computer and Communications Security (CCS), 2022
- DitDetector: Bimodal Learning based on Deceptive Image and Text for Macro Malware Detection; in the 38th Annual Computer Security Applications Conference (ACSAC), 2022
- View from Above: Exploring the Malware Ecosystem from the Upper DNS Hierarchy; in the 38th Annual Computer Security Applications Conference (ACSAC), 2022
- Make Data Reliable: An Explanation-powered Cleaning on Malware Dataset Against Backdoor Poisoning Attacks; in the 38th Annual Computer Security Applications Conference (ACSAC), 2022
- SAUSAGE: Security Analysis of Unix domain Socket usAGE in Android; in IEEE European Symposium on Security and Privacy (EuroS&P), 2022
- MEGDroid: A model-driven event generation framework for dynamic android malware analysis; Information and Software Technology, 2021
- GDroid: Android Malware Detection and Classification with Graph Convolutional Network; Computers & Security, 2021
- Op2Vec: An Opcode Embedding Technique and Dataset Design for End-to-End Detection of Android Malware; arXiv preprint arXiv:2104.04798, 2021
- Towards an interpretable deep learning model for mobile malware detection and family identification; Computers & Security, 2021
- NATICUSdroid: A malware detection framework for Android using native and custom permissions; Journal of Information Security and Applications, 2021
- Mimosa: Reducing malware analysis overhead with coverings; arXiv preprint arXiv:2101.07328, 2021.
- IoTMalware: Android IoT Malware Detection based on Deep Neural Network and Blockchain Technology; arXiv preprint, 2021.
- Formal Equivalence Checking for Mobile Malware Detection and Family Classification; IEEE Transactions on Software Engineering (2021).
- A privacy and security analysis of early-deployed COVID-19 contact tracing Android apps; Empirical Software Engineering, 2021, 26(3): 1-51.
- Understanding worldwide private information collection on android; arXiv preprint, 2021.
- Systematic Mutation-Based Evaluation of the Soundness of Security-Focused Android Static Analysis Techniques; ACM Transactions on Privacy and Security (TOPS), 2021
- Malware Detection employed by Visualization and Deep Neural Network; Computers & Security, 2021
- Malware Detection and Analysis: Challenges and Research Opportunities; arXiv preprint, 2021.
- Towards interpreting ML-based automated malware detection models: a survey; arXiv preprint, 2021.
- A Novel Few-Shot Malware Classification Approach for Unknown Family Recognition with Multi-Prototype Modeling; Computers & Security, 2021
- Obfuscation-Resilient Executable Payload Extraction From Packed Malware; USENIX Security, 2021
- Marked for Disruption: Tracing the Evolution of Malware Delivery Operations Targeted for Takedown; arXiv preprint, 2021.
- Lessons Learnt on Reproducibility in Machine Learning Based Android Malware Detection; in Empirical Software Engineering (EMSE), 2021
- A Novel Android Malware Detection Method Based on Visible User Interface; in IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), 2021