DLTPy

Deep Learning Type Inference of Python Function Signatures using their Natural Language Context

DLTPy makes type predictions based on comments, on the semantic elements of the function name and argument names, and on the semantic elements of identifiers in the return expressions. Using the natural language of these different elements, we have trained a classifier that predicts types. We use a recurrent neural network (RNN) with a Long Short-Term Memory (LSTM) architecture.
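To illustrate what "semantic elements" of an identifier can mean, here is a minimal sketch that splits a function or argument name into lowercase words. The helper `split_identifier` is hypothetical and is not taken from the DLTPy code base; the project's actual preprocessing may differ.

```python
import re
from typing import List


def split_identifier(name: str) -> List[str]:
    """Split a snake_case or camelCase identifier into lowercase words.

    Hypothetical helper for illustration only (not DLTPy's implementation).
    """
    words: List[str] = []
    # Break on underscores first, then on case boundaries inside each part.
    for part in name.split("_"):
        # Alternatives: an acronym run ("HTTP"), a word ("Response", "get"),
        # or a run of digits ("2").
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]
```

For example, `split_identifier("get_user_name")` yields the words `get`, `user`, and `name`, which can then be treated as natural language input.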

Read our paper for the full details.

Components

`preprocessing/` Preprocessing Pipeline (a-d)

Downloads projects, extracts comments and types, and writes one CSV file per project containing all of its functions.

Start using:

$ python preprocessing/pipeline.py

Optional arguments:

  -h, --help            show this help message and exit
  --projects_file PROJECTS_FILE
                        json file containing GitHub projects
  --limit LIMIT         limit the number of projects for which the pipeline
                        should run
  --jobs JOBS           number of jobs to use for pipeline.
  --output_dir OUTPUT_DIR
                        output dir for the pipeline
  --start START         start position within projects list
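As a rough idea of the kind of extraction this stage performs, the sketch below uses Python's standard `ast` module to collect the docstring, argument names, and type annotations of every function in a source string. The function `extract_functions` and the dictionary keys are assumptions made for this example, not the pipeline's actual code or CSV columns. It relies on `ast.unparse`, which requires Python 3.9 or newer.

```python
import ast
from typing import Any, Dict, List


def extract_functions(source: str) -> List[Dict[str, Any]]:
    """Collect name, docstring, and type annotations for each function.

    Illustrative sketch only; the keys below are assumed, not DLTPy's schema.
    """
    rows = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args.args  # positional arguments only, for brevity
            rows.append({
                "name": node.name,
                "docstring": ast.get_docstring(node),
                "arg_names": [a.arg for a in args],
                # Unannotated arguments are recorded as None.
                "arg_types": [
                    ast.unparse(a.annotation) if a.annotation else None
                    for a in args
                ],
                "return_type": ast.unparse(node.returns) if node.returns else None,
            })
    return rows
```

Each resulting dictionary corresponds to one function and could be written out as one row of a per-project CSV file.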

`input-preparation/` Input Preparation (e-f)

input-preparation/generate_df.py can be used to combine all the separate csv files per project into one big file while applying filtering.
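The combining step can be pictured as follows. This is a minimal sketch using only the standard `csv` module, assuming that all per-project files share the same header; `combine_csv_files`, its parameters, and the filter are hypothetical and do not reflect how `generate_df.py` is actually written.

```python
import csv
from typing import Callable, Dict, Iterable


def combine_csv_files(
    paths: Iterable[str],
    out_path: str,
    keep_row: Callable[[Dict[str, str]], bool],
) -> int:
    """Merge CSV files into one, keeping only rows accepted by keep_row.

    Assumes every input file has the same columns. Returns the number of
    rows written. Illustrative sketch only.
    """
    kept = 0
    writer = None
    with open(out_path, "w", newline="", encoding="utf-8") as out_file:
        for path in paths:
            with open(path, newline="", encoding="utf-8") as in_file:
                reader = csv.DictReader(in_file)
                if reader.fieldnames is None:
                    continue  # skip empty files
                if writer is None:
                    # Reuse the header of the first non-empty file.
                    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames)
                    writer.writeheader()
                for row in reader:
                    if keep_row(row):
                        writer.writerow(row)
                        kept += 1
    return kept
```

A filter such as `lambda row: row["return_type"] != ""` would keep only functions with a known return type, where `return_type` is an assumed column name.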

input-preparation/df_to_vec.py can be used to convert this generated csv to vectors.

input-preparation/embedder.py can be used to train word embeddings for input-preparation/df_to_vec.py.
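Conceptually, converting text to vectors with trained word embeddings amounts to a lookup followed by padding or truncation to a fixed length, so that every input has the same shape for the RNN. The sketch below shows that idea with a plain dictionary standing in for the trained embeddings; `tokens_to_vectors`, the zero-vector handling of unknown words, and the padding scheme are assumptions for illustration and may not match `df_to_vec.py`.

```python
from typing import Dict, List, Sequence


def tokens_to_vectors(
    tokens: Sequence[str],
    embeddings: Dict[str, Sequence[float]],
    length: int,
    dim: int,
) -> List[List[float]]:
    """Map tokens to embedding vectors, padded or truncated to `length`.

    Unknown tokens and padding positions become zero vectors (an assumption
    made for this sketch).
    """
    zero = [0.0] * dim
    # Look up at most `length` tokens; fall back to zeros for unknown words.
    vectors = [list(embeddings.get(t, zero)) for t in tokens[:length]]
    # Pad with zero vectors until the sequence has exactly `length` entries.
    vectors.extend(list(zero) for _ in range(length - len(vectors)))
    return vectors
```

In practice the dictionary would be replaced by the embedding model trained with `input-preparation/embedder.py`.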

`learning/` Learning (g)

The different RNN models we evaluated can be found in learning/learn.py.

Testing

$ pytest

License

The MIT License (MIT). Please see the license file for more information.
