This repository contains source code necessary to reproduce the model in the paper:
"PSCS: A Path-based Neural Model for SemanticCode Search"
- Python 3.6
- Pytorch 1.0.1
- NumPy
We use the CodeSearchNet as our dataset.
Download the Java dataset following the instruction of CodeSearchNet, and move it to the directory ./data/
.
The data exists in ./data
is the example data.
The whole process of our model, including data processing and model training, is in file pipeline.sh
.
Before running, edit the file pipeline.sh
following the instructions below.
Run the pipeline.sh
file:
sudo bash pipeline.sh
Note: the vocabularies for natural language of train set and test set are not shared. However, the vocabularies for path need to be shared to fit the production environment. For a code retrieval system, the code snippets to be searched are already known. So we don't need to worry about the information leakage problem.
To evaluate the performance of a trained model:
Run the test.py
file:
python test.py --test_epoch 100