Data preparation
The Java project CodeAnalysis transforms Java classes and methods extracted from source files into unified representation records. Both classes and methods are represented as a list of methods. Method_0 is introduced to capture the class context: it holds all fields of the surrounding class.
The dataset was adopted from http://essere.disco.unimib.it/wiki/research/mlcsd
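A minimal sketch of how the unified representation described above could be modelled. The class, field, and method names here (`UnifiedRepresentation`, `MethodRecord`, `fromClass`, `fromMethod`) are hypothetical and are not taken from the CodeAnalysis project. The sketch also assumes that Method_0 simply joins the field declarations into one text block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: all names are assumptions, not the CodeAnalysis API.
public class UnifiedRepresentation {

    /** One entry of the unified list: a method, or the class context at index 0. */
    public static final class MethodRecord {
        public final String name;
        public final String source;

        public MethodRecord(String name, String source) {
            this.name = name;
            this.source = source;
        }
    }

    /** Represents a class as Method_0 (its fields) followed by its methods. */
    public static List<MethodRecord> fromClass(List<String> fields, List<MethodRecord> methods) {
        List<MethodRecord> records = new ArrayList<>();
        // Assumption: Method_0 carries the field declarations as plain text.
        records.add(new MethodRecord("Method_0", String.join("\n", fields)));
        records.addAll(methods);
        return Collections.unmodifiableList(records);
    }

    /** Represents a single method as Method_0 (surrounding class fields) plus that method. */
    public static List<MethodRecord> fromMethod(List<String> surroundingFields, MethodRecord method) {
        return fromClass(surroundingFields, Collections.singletonList(method));
    }

    public static void main(String[] args) {
        List<MethodRecord> records = fromMethod(
                Arrays.asList("private int count;", "private String label;"),
                new MethodRecord("increment", "void increment() { count++; }"));
        for (MethodRecord r : records) {
            System.out.println(r.name + ":\n" + r.source);
        }
    }
}
```

With this shape, a class-level sample and a method-level sample share the same record type, so the same feature extractor can consume both.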
Classifier
A feed-forward neural network is trained on features extracted from the processed dataset using either an LSTM (https://github.com/D-a-r-e-k/Source-Code-Modelling) or Code2Vec (https://github.com/tech-srl/code2vec).
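To illustrate the classification step, the sketch below shows the forward pass of a small feed-forward network over a feature vector such as one produced by the LSTM or Code2Vec. It is not the project's actual model. The single hidden layer, the ReLU and sigmoid activations, the binary smelly/not-smelly output, and the class name `FeedForwardSketch` are all assumptions, and training is omitted.

```java
// Illustrative only: architecture and names are assumptions, training is omitted.
public class FeedForwardSketch {
    private final double[][] hiddenWeights; // [hiddenUnits][inputSize]
    private final double[] hiddenBias;      // [hiddenUnits]
    private final double[] outputWeights;   // [hiddenUnits]
    private final double outputBias;

    public FeedForwardSketch(double[][] hiddenWeights, double[] hiddenBias,
                             double[] outputWeights, double outputBias) {
        this.hiddenWeights = hiddenWeights;
        this.hiddenBias = hiddenBias;
        this.outputWeights = outputWeights;
        this.outputBias = outputBias;
    }

    /** Returns a value in (0, 1), read here as the probability of a code smell. */
    public double predict(double[] features) {
        double logit = outputBias;
        for (int h = 0; h < hiddenWeights.length; h++) {
            double activation = hiddenBias[h];
            for (int i = 0; i < features.length; i++) {
                activation += hiddenWeights[h][i] * features[i];
            }
            // ReLU on the hidden unit, then weighted into the output logit.
            logit += outputWeights[h] * Math.max(0.0, activation);
        }
        // Sigmoid squashes the logit into a probability.
        return 1.0 / (1.0 + Math.exp(-logit));
    }

    public static void main(String[] args) {
        FeedForwardSketch net = new FeedForwardSketch(
                new double[][] {{1.0, -1.0}, {0.5, 0.5}},
                new double[] {0.0, 0.0},
                new double[] {1.0, -1.0},
                0.0);
        // The feature vector stands in for an embedding of one method or class.
        System.out.println(net.predict(new double[] {2.0, 1.0}));
    }
}
```

In practice the weights would be learned from the labelled dataset and the input size would match the dimension of the extracted features.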
Theoretical work
This work is based on the Master's thesis "Code smells detection using automatic source code features learning". A summary (in Lithuanian) is published in the Vilnius University Proceedings „Lietuvos magistrantų informatikos ir IT tyrimai“ (http://www.journals.vu.lt/proceedings/issue/view/1129, page 11).