Machine Learning and Binary Visualization for Classifying Files in Network Traffic

We propose a tool that extracts unencrypted network data from packet capture files and uses binary visualization techniques, namely entropy, natural translation, and the Hilbert curve, to create images from the data. These images are then used as a training set for the machine learning process. We showcase the classification of these images and evaluate the accuracy of various models. We conclude with a summary of our findings.

Enormous numbers of new malware samples are discovered every day: “Symantec discovered more than 430 million new unique pieces of malware in 2015, up 36 percent from the year before” (Symantec, 2016), an average of more than a million per day. The sheer amount of data makes these samples difficult to analyze with traditional tools, which rely on the known structure of the data to determine classes or families of malware. There is therefore a need for tools and techniques that can analyze the data in network traffic without knowing its structure.

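The entropy and Hilbert-curve visualizations mentioned above can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the tool's actual implementation: each pixel along a Hilbert curve filling a 2**order by 2**order square is shaded by the Shannon entropy of a byte window around the corresponding offset in an extracted payload. The function names (`hilbert_d2xy`, `normalized_entropy`, `entropy_hilbert_image`), the 256-byte window, the proportional sampling, and the grayscale mapping are all hypothetical choices made for illustration; the natural translation scheme is not shown.

```python
import math
from collections import Counter


def hilbert_d2xy(order: int, d: int) -> tuple[int, int]:
    """Map position d along a Hilbert curve filling a 2**order x 2**order grid to (x, y)."""
    x = y = 0
    t = d
    s = 1
    side = 1 << order
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the current quadrant so consecutive sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def normalized_entropy(block: bytes) -> float:
    """Shannon entropy of a byte block, scaled to [0, 1] by the maximum possible for its length."""
    if len(block) < 2:
        return 0.0
    total = len(block)
    counts = Counter(block)
    bits = sum(-(c / total) * math.log2(c / total) for c in counts.values())
    max_bits = math.log2(min(total, 256))  # at most 8 bits per byte
    return min(1.0, bits / max_bits)


def entropy_hilbert_image(data: bytes, order: int = 6, window: int = 256) -> list[list[int]]:
    """Hypothetical renderer: grayscale square where dark = low entropy, bright = high entropy."""
    side = 1 << order
    cells = side * side
    image = [[0] * side for _ in range(side)]
    if not data:
        return image
    for d in range(cells):
        # Sample the stream proportionally so the whole payload fits in the square.
        offset = d * len(data) // cells
        start = max(0, offset - window // 2)
        level = normalized_entropy(data[start:start + window])
        x, y = hilbert_d2xy(order, d)
        image[y][x] = round(255 * level)
    return image
```

For example, `entropy_hilbert_image(payload, order=6)`, where `payload` is a hypothetical bytes object recovered from a packet capture, returns a 64 x 64 grid of values from 0 to 255 that could be saved as an image and used as training input. A Hilbert curve is a common choice here because bytes that are close together in the stream stay close together in the image, which preserves locality better than a row-by-row scan.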