Our architecture

1. Feature Extraction

Features are extracted from each APK with AndroPyTool.

2. Preprocessing

  • Remove unnecessary characters and strings, such as double quotes
  • Remove unnecessary keys: hash values (MD5, SHA1, SHA256), filename, …
  • Remove the VirusTotal report
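
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, assuming each extracted report is a JSON document; the key names in `DROP_KEYS` and the helpers `strip_report` and `preprocess` are hypothetical, since the exact report schema is not given here.

```python
import json

# Hypothetical key names (lowercase); adjust them to the real report schema.
DROP_KEYS = {"md5", "sha1", "sha256", "filename", "virustotal"}


def strip_report(node, drop_keys=DROP_KEYS):
    """Recursively drop unwanted keys and remove double quotes from strings."""
    if isinstance(node, dict):
        return {
            key: strip_report(value, drop_keys)
            for key, value in node.items()
            if key.lower() not in drop_keys  # case-insensitive key match
        }
    if isinstance(node, list):
        return [strip_report(item, drop_keys) for item in node]
    if isinstance(node, str):
        return node.replace('"', "")
    return node  # numbers, booleans, and None pass through unchanged


def preprocess(raw_json: str) -> dict:
    """Parse one report and return its cleaned form."""
    return strip_report(json.loads(raw_json))
```

Matching keys case-insensitively is a design choice made for the sketch, so that variants such as `MD5` and `md5` are both removed.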

3. Our model

3.1 Embedding layer

Reference: Bengio, Yoshua, Réjean Ducharme, and Pascal Vincent. "A neural probabilistic language model." Advances in neural information processing systems 13 (2000).
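
The layer's configuration is not specified here, so the following is only a sketch of the general idea from the cited paper: an embedding layer is a lookup table that maps each token ID to a dense vector. The class `Embedding`, the helper `build_vocab`, the `<unk>` token, and the initialization range are all assumptions made for illustration; in a real model the table would be trained with the rest of the network.

```python
import random


def build_vocab(tokens):
    """Assign an integer ID to each distinct token; ID 0 is reserved for unknowns."""
    vocab = {"<unk>": 0}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab


class Embedding:
    """Lookup table holding one vector of length `dim` per token ID."""

    def __init__(self, vocab_size, dim, seed=0):
        rng = random.Random(seed)
        # Small random initial values; training would update these weights.
        self.table = [
            [rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(vocab_size)
        ]

    def __call__(self, token_ids):
        """Return the vector for each ID, in input order."""
        return [self.table[i] for i in token_ids]
```

For example, with `vocab = build_vocab(features)` and `emb = Embedding(len(vocab), 8)`, calling `emb([vocab.get(f, 0) for f in features])` yields one 8-dimensional vector per feature token, with unseen tokens mapped to the `<unk>` vector.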

Evaluation

Environment

Name     Specification
Service  Google Colab (Pro)
GPU      T4
RAM      64 GB (recommended)

Accuracy and error rates (%)

Model\Metric   Accuracy  Precision  Recall  F1-Score  FPR     FNR
SimpleCNN-GRU  79.19     78.28      99.86   87.76     100.00  0.13
Standard CNN   73.86     77.36      94.18   84.94     99.50   5.81
CNN-BiLSTM     65.72     79.94      75.06   77.42     68.00   24.93
Our model      99.02     100.00     98.75   99.37     0.00    1.24
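
For reference, the six metrics in this table follow from the four confusion-matrix counts by their standard definitions. The sketch below assumes malware is the positive class; the function name `classification_metrics` is hypothetical, and this is not necessarily the evaluation script used for the reported numbers.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return the standard binary-classification metrics as percentages.

    tp, fp, tn, fn are confusion-matrix counts, with malware assumed
    to be the positive class.
    """

    def ratio(numerator, denominator):
        # Guard against empty classes to avoid division by zero.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    metrics = {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "fpr": ratio(fp, fp + tn),  # benign samples flagged as malware
        "fnr": ratio(fn, fn + tp),  # malware samples missed
    }
    return {name: 100.0 * value for name, value in metrics.items()}
```

Under these definitions recall and FNR always sum to 100, and an FPR of 100 means every benign sample was classified as malware.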

Training time and model size

Model\Metric   Training time  Model size (MB)
SimpleCNN-GRU  15 min 28 sec  12.4
Standard CNN   14 min 46 sec  14.5
CNN-BiLSTM     24 min 59 sec  7.23
Our model      42 min 39 sec  552

Demo Section

See Android Detection Website (video + source code)