This project contains the code for preprocessing the code-docstring-corpus dataset.
### Usage
Export the function call graph as an adjacency list or as JSON:
method-extraction.py [plain/json] path/to/code-docstring-corpus
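
To illustrate the difference between the two output modes, here is a minimal sketch. The example graph, the helper names `to_plain` and `to_json`, and the exact line layout are assumptions made for illustration; the actual output of `method-extraction.py` may differ.

```python
import json

# Hypothetical call graph: caller -> list of callees (not taken from the corpus).
call_graph = {
    "main": ["load_data", "train"],
    "train": ["load_data"],
    "load_data": [],
}

def to_plain(graph):
    """Adjacency list: one line per caller, followed by its callees."""
    return "\n".join(" ".join([caller, *callees]) for caller, callees in graph.items())

def to_json(graph):
    """The same graph serialized as a JSON object."""
    return json.dumps(graph, sort_keys=True)

plain_text = to_plain(call_graph)
json_text = to_json(call_graph)
print(plain_text)
print(json_text)
```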
Export code docstrings as TF-IDF vectors:
tfidf.py path/to/code-docstring-corpus
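
As a rough sketch of what a TF-IDF vector is, the snippet below weights each term by its frequency in a docstring times `log(N / df)`. This is one common textbook definition, assumed here for illustration; `tfidf.py` may use a library with different tokenization or smoothing. The example docstrings and the function name `tfidf_vectors` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical docstrings standing in for entries of the corpus.
docstrings = [
    "return the sum of two numbers",
    "return the product of two numbers",
    "open a file and read lines",
]

def tfidf_vectors(docs):
    """Return (vocabulary, vectors) using tf * log(N / df) weighting."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for doc in tokenized for term in doc})
    n_docs = len(tokenized)
    # Document frequency: number of docstrings containing each term.
    df = Counter(term for doc in tokenized for term in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid dividing by zero for an empty docstring
        vectors.append(
            [counts[term] / total * math.log(n_docs / df[term]) for term in vocab]
        )
    return vocab, vectors

vocab, vectors = tfidf_vectors(docstrings)
```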
Train an LDA model for the functions specified by the input file. Descriptions of functions with the same name are concatenated.
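
The concatenation step can be sketched as follows. The `(name, docstring)` record layout, the space separator, and the helper name `merge_by_name` are assumptions for illustration; the LDA training itself is not shown.

```python
from collections import defaultdict

# Hypothetical (function name, docstring) pairs.
records = [
    ("parse", "Parse a config file."),
    ("save", "Write data to disk."),
    ("parse", "Parse command line arguments."),
]

def merge_by_name(pairs):
    """Join the descriptions of all functions that share a name."""
    grouped = defaultdict(list)
    for name, doc in pairs:
        grouped[name].append(doc)
    # One document per function name, in first-seen order.
    return {name: " ".join(docs) for name, docs in grouped.items()}

merged = merge_by_name(records)
```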
Calculate the Spearman rank correlation coefficient between two types of embeddings.
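
For reference, the Spearman coefficient is the Pearson correlation of the ranks of two score lists. The sketch below assumes the two lists hold similarity scores for the same function pairs under each embedding; that pairing, the example values, and the function names are assumptions, not taken from this project.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman coefficient: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical similarity scores for the same four function pairs.
sims_a = [0.9, 0.1, 0.5, 0.7]
sims_b = [0.8, 0.3, 0.2, 0.6]
rho = spearman(sims_a, sims_b)
```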
Filter LDA embeddings using an external file. The file provides the index order used for embeddings created with a different method.
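
A minimal sketch of the filtering step, assuming the external file lists one function name per line and the embeddings are keyed by function name. Both the file layout and the names used here are hypothetical.

```python
# Hypothetical LDA embeddings keyed by function name.
lda_embeddings = {
    "parse": [0.7, 0.3],
    "save": [0.1, 0.9],
    "load": [0.5, 0.5],
}

# Stand-in for the lines of the external file that defines the index order.
index_order = ["save", "parse"]

def filter_and_reorder(embeddings, order):
    """Keep only the listed names, in the order the external file gives."""
    return [embeddings[name] for name in order if name in embeddings]

aligned = filter_and_reorder(lda_embeddings, index_order)
```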