Bidirectional Context-Sensitive DGA Detection using DistilBERT

Malware uses domain generation algorithms (DGAs) to generate pseudo-random domain names and thereby evade detection. To defend against DGA traffic, security researchers have to discover and understand the algorithm by reverse engineering malware samples, and then register the generated domains in a DNS blacklist. Even though this list has to be updated frequently, malware authors readily circumvent it. An alternative approach is to classify domains with deep learning techniques. Recent work in DGA detection has leveraged architectures such as convolutional neural networks (CNNs) and long short-term memory networks (LSTMs) for this task. However, these classifiers perform inconsistently; in particular, they struggle with wordlist-based DGA families. We propose a novel model for detecting DGA domains based on DistilBERT, a distilled version of Bidirectional Encoder Representations from Transformers (BERT). The word embeddings are pre-trained on a large unrelated corpus to learn bidirectional contextual embeddings for words. The pre-trained parameters then allow for short training durations on DGA domains, while the language knowledge stored in the representation yields high performance even with a small training dataset. We show that our model outperforms existing techniques on DGA classification while requiring less training time. The experiments in this paper are run on open datasets, and the models' source code is provided to reproduce the results.
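To make the distinction between pseudo-random and wordlist-based DGAs concrete, the following is a minimal illustrative sketch in Python. It is not taken from the paper or from any real malware family: the seed string, the word list, the top-level domains, and the function names `char_dga` and `wordlist_dga` are all assumptions chosen for illustration. It only shows why both sides can derive the same domains from a shared seed and date, and why wordlist-based output looks far more like natural language than character-level output.

```python
import hashlib
import random
from datetime import date
from typing import List

# Hypothetical word list, for illustration only.
WORDS = ["river", "stone", "cloud", "light", "paper", "green", "market", "house"]


def _seeded_rng(seed: str, day: date) -> random.Random:
    """Derive a deterministic RNG from a shared seed and the current date."""
    digest = hashlib.sha256(f"{seed}:{day.isoformat()}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def char_dga(seed: str, day: date, count: int = 3, length: int = 12) -> List[str]:
    """Toy character-level DGA: random lowercase letters plus a fixed TLD."""
    rng = _seeded_rng(seed, day)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    return [
        "".join(rng.choice(alphabet) for _ in range(length)) + ".com"
        for _ in range(count)
    ]


def wordlist_dga(seed: str, day: date, count: int = 3, words_per_domain: int = 2) -> List[str]:
    """Toy wordlist-based DGA: concatenated dictionary words plus a fixed TLD."""
    rng = _seeded_rng(seed, day)
    return [
        "".join(rng.choice(WORDS) for _ in range(words_per_domain)) + ".net"
        for _ in range(count)
    ]


if __name__ == "__main__":
    today = date(2021, 1, 1)
    print(char_dga("example-seed", today))      # random-looking labels
    print(wordlist_dga("example-seed", today))  # natural-looking labels
```

Because the generator is seeded only by values that the malware and its operator both know, the two can agree on the same domains without communicating. Domains of the second kind are built from ordinary words, which is one plausible reason why classifiers that rely on character statistics find them harder to separate from benign names.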

akdir/BachelorThesis
