/apk_sdk_analysis

APK reverse to get sdk features & data mining

Primary LanguagePythonGNU General Public License v3.0GPL-3.0

Introduction

Data mining for APK by cluster and machine learning. Currently APK package features extraction has been completed.

Code structure

code design

The basic code design philosophy is 3 layers components:

  • low layer: Low level functions to contact with Androguard for single feature extraction.
  • med layer: Feature extraction control with config parsing and parallel computation.
  • first layer: Application layer. Work with user input and result output.

code files

File Description layer
api_list.p sensitive and system API list none
apkparser.py provide basic functions to get all features low
feature_extractor.py feature extraction process control multi-processing med
main.py start of the project first
pattern_parser.py parse pattern file and generate api_list.p none
test.apk apk file for unit test none
test.py unit test none

unit test

Run command below to test the basic feature extraction functions at class apkparser.

python test.py

Currently 18 test cases have been provided.

Extracted features

  • package names at apk
  • class count at each package
  • package depth
  • package name to array by FeatureHasher
  • class size
  • interface count for class
  • method count for class
  • virtual method count for class
  • variable count for class
  • instance variable count for class
  • static variable count for class
  • access flag sum for class
  • class name to array by FeatureHasher
  • super class name to array by FeatureHasher
  • native function count for class
  • method para count for class
  • method access info array for class