This repository has been migrated to the https://github.com/giganticode organization and is no longer supported.
This is a project for a master's thesis titled "Supporting logging activities by mining software repositories".
The general goal is to use a large number of projects from GitHub to build a model that takes source code as input and suggests different kinds of logging-related information (e.g. where in the code to put a logging statement, the text of the logging statement, the log level, etc.).
We use the dataset from "Mining source code repositories at massive scale using language modeling" (M. Allamanis, C. Sutton).
Statistics about dataset TBA
Data gathering in more detail
In this step, the data is prepared for the language modelling step (tokenization, reduction of vocabulary size).
Data preprocessing in more detail
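To make the two preprocessing operations named above more concrete, here is a minimal Python sketch of tokenization followed by vocabulary reduction. It is not the code used in this repository: the function names (`tokenize`, `split_identifier`, `preprocess`, `reduce_vocabulary`), the identifier-splitting strategy, the `min_count` threshold and the `<unk>` placeholder are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical tokenizer: identifiers, numbers, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")


def tokenize(code):
    """Split source code into identifiers, numbers and punctuation characters."""
    return TOKEN_RE.findall(code)


def split_identifier(token):
    """Split snake_case and camelCase identifiers into lowercase subwords.

    Splitting identifiers is one assumed way to shrink the vocabulary:
    `getUserCount` and `userCount` then share the subwords `user` and `count`.
    """
    parts = []
    for chunk in token.split("_"):
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    # Punctuation yields no subwords, so it is kept unchanged.
    return [p.lower() for p in parts] or [token]


def preprocess(code):
    """Tokenize code and split every token into subwords."""
    tokens = []
    for tok in tokenize(code):
        tokens.extend(split_identifier(tok))
    return tokens


def reduce_vocabulary(token_lists, min_count=2, unk="<unk>"):
    """Replace tokens seen fewer than `min_count` times with `unk`."""
    counts = Counter(t for toks in token_lists for t in toks)
    vocab = {t for t, c in counts.items() if c >= min_count}
    reduced = [[t if t in vocab else unk for t in toks] for toks in token_lists]
    return reduced, vocab
```

For example, `preprocess("int userCount = getUserCount();")` splits both identifiers into the shared subwords `user` and `count`, which then survive a `min_count=2` cut-off while the remaining tokens are mapped to `<unk>`.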
Training language models with different architectures and parameters; analysing and comparing the performance of the resulting models.
Language modelling in more detail
Based on a pretrained language model, we build classifiers that are trained to predict the correct position of log statements in the code, as well as their level, text, and variables.
Building the classifier in more detail.
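As an illustration of the prediction targets described above, the following Python sketch shows one possible way to derive labelled examples (position and level) from Java code that already contains log statements. It is an assumption about how such training data could be built, not the pipeline used in this project: the `extract_examples` function, the `LOG_CALL_RE` pattern and the receiver names `log`/`logger` are hypothetical, and only single-line log statements are handled.

```python
import re

# Assumed pattern: calls such as `logger.info(...)` or `LOG.warn(...)`.
LOG_CALL_RE = re.compile(
    r"^\s*(?:log|logger)\.(trace|debug|info|warn|error|fatal)\s*\(",
    re.IGNORECASE,
)


def extract_examples(java_source):
    """Strip single-line log statements and record where they were.

    Returns (context, labels), where `context` is the list of remaining
    code lines and each label is (index, level): the log statement stood
    immediately before `context[index]` and had the given log level.
    """
    context, labels = [], []
    for line in java_source.splitlines():
        match = LOG_CALL_RE.match(line)
        if match:
            labels.append((len(context), match.group(1).lower()))
        else:
            context.append(line)
    return context, labels


source = """void f() {
    LOG.info("start");
    run();
    logger.error("failed", e);
}"""
context, labels = extract_examples(source)
```

Here `context` holds the method without its log statements, and `labels` records that an `info` statement belongs before line index 1 and an `error` statement before line index 2. A classifier could then be trained to recover these labels from `context` alone.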
The plugin supports developers by helping with logging decisions.
Building the IntelliJ plugin in more detail.
- log-recommender-cli: a command-line tool for managing datasets and for parsing, preprocessing, etc.
TBA