This project attempts to determine whether applications are malware. To do this, it uses logs of certain API calls recorded by a custom ROM.
A simple clustering approach is applied in a single dimension. The metric is a score based on the likelihood that a given API call appears in the known malware logs.
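The README does not spell out how the score is computed. As an illustration only, the Python sketch below (the project itself is MATLAB) makes three assumptions. Each log has been reduced to a set of API call names. A call's likelihood is the fraction of known-malware logs that contain it. An application's score is the mean likelihood of its distinct calls. The function names `call_likelihoods` and `score` and the call names `callA`, `callB`, `callC` are hypothetical.

```python
from collections import Counter


def call_likelihoods(malware_logs):
    """Fraction of known-malware logs in which each API call appears.

    Assumption: malware_logs is a list of sets, one set of call names per log.
    """
    counts = Counter()
    for calls in malware_logs:
        counts.update(calls)  # a set, so each call is counted at most once per log
    n = len(malware_logs)
    return {call: c / n for call, c in counts.items()}


def score(app_calls, likelihoods):
    """Mean likelihood over the app's distinct calls (assumed scoring rule).

    Calls never seen in malware logs contribute 0.0; an empty set scores 0.0.
    """
    if not app_calls:
        return 0.0
    return sum(likelihoods.get(c, 0.0) for c in app_calls) / len(app_calls)
```

For example, with malware logs `{"callA", "callB"}` and `{"callA"}`, `callA` gets likelihood 1.0 and `callB` gets 0.5. An application making `callA` and `callC` then scores 0.5.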
Log files must use the CSV format produced by the utility provided to the class.
Place logs for known samples in the following directories:
./samples/malware -- for known malware logs
./samples/clean -- for legitimate application logs
Place logs for unknown samples in the following directory for analysis:
./samples/unknown -- for not yet analyzed applications
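A minimal sketch of collecting the labeled samples from this layout, written in Python rather than the project's MATLAB. It assumes each log is a `*.csv` file and that API call names appear as plain cell values. The column layout of the class utility's CSV output is not described here, so `parse_log` simply gathers every non-empty cell. `SAMPLE_DIRS`, `parse_log`, and `load_samples` are hypothetical names.

```python
import csv
from pathlib import Path

# Directory for each label, relative to the project root (from this README).
SAMPLE_DIRS = {
    "malware": "samples/malware",
    "clean": "samples/clean",
    "unknown": "samples/unknown",
}


def parse_log(lines):
    """Return the set of non-empty, stripped cell values from CSV text lines.

    Assumption: API call names appear as plain cell values somewhere in each row.
    """
    return {cell.strip() for row in csv.reader(lines) for cell in row if cell.strip()}


def load_samples(root="."):
    """Map each label to {file name: set of cell values} for every *.csv log."""
    samples = {}
    for label, rel in SAMPLE_DIRS.items():
        directory = Path(root) / rel
        logs = {}
        if directory.is_dir():  # a missing directory simply yields no samples
            for path in sorted(directory.glob("*.csv")):
                with open(path, newline="") as f:
                    logs[path.name] = parse_log(f)
        samples[label] = logs
    return samples
```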
Place the list of keywords in the following directory:
./library
The keyword list contains one term per line. Do not leave a blank line at the end of the file, or it will be interpreted as a keyword.
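The warning above matters for any reader that splits the file on every newline character: a trailing newline leaves an empty string as the final term. The Python sketch below illustrates the pitfall and one way to tolerate it. It is not how "malware_analysis.m" reads the file, and both function names are hypothetical.

```python
def naive_keywords(text):
    """Split on every '\\n'; a trailing newline yields an empty last keyword."""
    return text.split("\n")


def robust_keywords(text):
    """One keyword per line, skipping blank lines and surrounding whitespace.

    splitlines() also handles Windows '\\r\\n' line endings.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]
```

With `"callA\ncallB\n"` as input, `naive_keywords` returns `["callA", "callB", ""]`, while `robust_keywords` returns `["callA", "callB"]`.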
If any file names are changed, update the configuration section of "malware_analysis.m" to match.
The classification method can be changed in the configuration section of "malware_analysis.m".
To run the program, run the "malware_analysis.m" file. Unless the configuration in "malware_analysis.m" has been changed, the results of the run are plotted and exported to the "report.txt" file.
Two clustering methods are available, selected by changing the method variable in "malware_analysis.m":
KMEANS -- use kmeans clustering
THRESH -- use a simple threshold
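The two options can be illustrated on a list of per-application scores. This Python sketch is not the project's MATLAB code. `kmeans_1d` is a plain two-cluster k-means (Lloyd's iteration) with centroids initialized at the minimum and maximum scores. `threshold` labels a score as malware when it meets a cutoff `t`. The sketch assumes higher scores are more malware-like, which follows from the score being a likelihood over malware logs. The function names, the initialization, and the cutoff are all assumptions.

```python
def _assign(scores, lo, hi):
    # Label 1 if a score is closer to the high centroid; ties go to cluster 0.
    return [1 if abs(s - hi) < abs(s - lo) else 0 for s in scores]


def kmeans_1d(scores, iters=100):
    """Two-cluster k-means on scalar scores; returns (labels, (lo, hi)).

    Assumption: label 1 (the higher-centroid cluster) is the malware-like one.
    """
    lo, hi = min(scores), max(scores)  # initial centroids at the extremes
    for _ in range(iters):
        labels = _assign(scores, lo, hi)
        group0 = [s for s, l in zip(scores, labels) if l == 0]
        group1 = [s for s, l in zip(scores, labels) if l == 1]
        # Move each centroid to its cluster mean; keep it if the cluster is empty.
        new_lo = sum(group0) / len(group0) if group0 else lo
        new_hi = sum(group1) / len(group1) if group1 else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # converged: assignments no longer change the centroids
        lo, hi = new_lo, new_hi
    return _assign(scores, lo, hi), (lo, hi)


def threshold(scores, t):
    """Label a score 1 (assumed malware) when it is at or above the cutoff t."""
    return [1 if s >= t else 0 for s in scores]
```

Initializing at the extremes keeps the one-dimensional case deterministic. The threshold needs no iteration but requires choosing `t` by hand, whereas k-means places the boundary between the two cluster means automatically.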