code2vec
A PyTorch implementation of "code2vec: Learning Distributed Representations of Code".
Requirements
- Python 3.6+
- PyTorch 0.4.1+
- scikit-learn
- tensorboardX (optional)
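A quick way to confirm the requirements above before training is to ask Python whether each package can be located. The following is a minimal sketch, not part of this repository: the helper names `missing_modules` and `check_environment` are hypothetical, and the script only checks that the modules exist, not that PyTorch is at least 0.4.1.

```python
import importlib.util
import sys

# Import names for the packages listed above; scikit-learn is imported as "sklearn".
REQUIRED = ["torch", "sklearn"]
OPTIONAL = ["tensorboardX"]


def missing_modules(names):
    """Return the module names that cannot be located by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment():
    """Collect readable problems; an empty list means nothing was found missing."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python 3.6+ is required")
    for name in missing_modules(REQUIRED):
        problems.append("missing required module: " + name)
    for name in missing_modules(OPTIONAL):
        problems.append("missing optional module: " + name)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

A missing tensorboardX is reported but should not block training, since it is listed as optional.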
Usage
Train with "dataset":
python main.py --lr 0.01 --corpus_path ./dataset/corpus.txt --path_idx_path ./dataset/path_idxs.txt --terminal_idx_path ./dataset/terminal_idxs.txt --model_path ./output --vectors_path ./output/code.vec --terminal_embed_size 100 --path_embed_size 100 --encode_size 100 --max_epoch 40 --random_seed 1 --dropout_prob 0.25
Train with the large "top11_dataset":
- top 11 dataset: http://groups.inf.ed.ac.uk/cup/codeattention/
Concatenate the split corpus files:
cat ./top11_dataset/splitted_corpus.* > ./top11_dataset/corpus.txt
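On systems without `cat` (for example a plain Windows shell), the same concatenation can be done from Python. This is a minimal sketch under two assumptions: the helper name `concatenate_parts` is hypothetical, and the parts are joined in sorted file-name order, which may differ from the shell's glob order under some locales.

```python
import glob
import shutil


def concatenate_parts(pattern, output_path):
    """Append every file matching pattern to output_path, in sorted name order."""
    # Resolve the part list before opening the output file for writing.
    parts = sorted(glob.glob(pattern))
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream in chunks so a large corpus is never held in memory.
                shutil.copyfileobj(src, out)
    return parts
```

Calling `concatenate_parts("./top11_dataset/splitted_corpus.*", "./top11_dataset/corpus.txt")` from the repository root would mirror the command above.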
Train the model:
python main.py --batch_size 1024 --lr 0.01 --corpus_path ./top11_dataset/corpus.txt --path_idx_path ./top11_dataset/path_idxs.txt --terminal_idx_path ./top11_dataset/terminal_idxs.txt --model_path ./output --vectors_path ./output/code.vec --terminal_embed_size 100 --path_embed_size 100 --encode_size 100 --max_epoch 20 --random_seed 1 --dropout_prob 0.25
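The two training commands differ only in the dataset directory, `--max_epoch`, and the extra `--batch_size` flag. When scripting several runs, it can help to build the argument list in one place. The sketch below is illustrative only: `build_train_command` is a hypothetical helper, and every flag and value is copied from the commands above rather than from the parser in main.py.

```python
def build_train_command(dataset_dir, max_epoch, batch_size=None):
    """Return the main.py invocation as an argument list for a dataset directory."""
    cmd = ["python", "main.py"]
    # --batch_size is only passed explicitly for the large dataset above.
    if batch_size is not None:
        cmd += ["--batch_size", str(batch_size)]
    cmd += [
        "--lr", "0.01",
        "--corpus_path", dataset_dir + "/corpus.txt",
        "--path_idx_path", dataset_dir + "/path_idxs.txt",
        "--terminal_idx_path", dataset_dir + "/terminal_idxs.txt",
        "--model_path", "./output",
        "--vectors_path", "./output/code.vec",
        "--terminal_embed_size", "100",
        "--path_embed_size", "100",
        "--encode_size", "100",
        "--max_epoch", str(max_epoch),
        "--random_seed", "1",
        "--dropout_prob", "0.25",
    ]
    return cmd


small_run = build_train_command("./dataset", max_epoch=40)
large_run = build_train_command("./top11_dataset", max_epoch=20, batch_size=1024)
```

Either list can be passed to `subprocess.run(cmd, check=True)` from the repository root to start training.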
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.