The provided dataset and your own training dataset can use two different directory formats. Pay attention to how the files are laid out for each format!
This repository implements a Siamese network, which is often used to measure the similarity of two input code images. The backbone feature extraction network used in this repository is VGG16.
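As a rough illustration of the Siamese idea (two inputs passed through the same weights, then compared), here is a toy sketch in plain Python. It is not this repository's VGG16 model: the `embed` and `similarity` functions, the single linear layer, and the L1-distance head are all assumptions chosen to keep the example small.

```python
import math


def embed(x, weights):
    """Shared embedding: one linear layer plus ReLU (a stand-in for a CNN backbone)."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def similarity(x1, x2, embed_weights, head_weights, head_bias=0.0):
    """Score two inputs in (0, 1) using the SAME embedding weights for both."""
    f1 = embed(x1, embed_weights)
    f2 = embed(x2, embed_weights)
    # Element-wise L1 distance between the two feature vectors (assumed head).
    l1 = [abs(a - b) for a, b in zip(f1, f2)]
    logit = sum(w * d for w, d in zip(head_weights, l1)) + head_bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical hand-picked weights: larger feature distance lowers the score.
embed_weights = [[1.0, 0.0], [0.0, 1.0]]
head_weights = [-1.0, -1.0]

same = similarity([1.0, 0.0], [1.0, 0.0], embed_weights, head_weights, head_bias=2.0)
diff = similarity([1.0, 0.0], [0.0, 1.0], embed_weights, head_weights, head_bias=2.0)
print(same > diff)
```

Because both inputs go through the same `embed_weights`, identical inputs always produce a zero distance, so the score for them depends only on the bias of the comparison head.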
Configure the environment according to requirements.txt.
pip install -r requirements.txt
Download the vgg16-397923af.pth pretrained weights.
Link: https://pan.baidu.com/s/14SFoKX6xTDPx2XG9rcUTDQ Extraction code: 44en
- Download clone fragments from https://github.com/clonebench/BigCloneBench to your_txt_path. Each code fragment corresponds to a txt file.
- Cluster files by clone relationship.
- Set your_txt_path in codeVis.py to that directory.
- Run codeVis.py to generate the code images.
- Follow the training steps below.
- Set model_path in siamese.py to the trained weights file. Trained weights are saved in the logs folder.
_defaults = {
"model_path": 'model_data/vgg.pth',
……
}
- Run predict.py and enter an image path when prompted, for example:
your_img/xxx.png
To train your own model, arrange the dataset in the following format:
- images_background
    - character01
        - 0709_01.png
        - 0709_02.png
        - ……
    - character02
    - character03
    - ……
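One possible way to get from clone relationships to per-class folders like these is to merge clone pairs into clusters with union-find, then give each cluster its own folder name. This is only a sketch under assumptions: the input is taken to be a list of `(file_a, file_b)` clone pairs, and the function names and the `clusterNNNN` folder naming are hypothetical, not taken from codeVis.py.

```python
from collections import defaultdict


def cluster_clone_pairs(pairs):
    """Group files so that any two files linked by a chain of clone pairs share a cluster."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(list)
    for item in parent:
        clusters[find(item)].append(item)
    return sorted(sorted(members) for members in clusters.values())


def assign_class_folders(clusters):
    """Map each file to a class folder name (hypothetical naming scheme)."""
    return {
        name: f"cluster{index:04d}"
        for index, members in enumerate(clusters, start=1)
        for name in members
    }


# Assumed input: clone pairs between txt fragments.
pairs = [("a.txt", "b.txt"), ("b.txt", "c.txt"), ("d.txt", "e.txt")]
clusters = cluster_clone_pairs(pairs)
folders = assign_class_folders(clusters)
print(clusters)
print(folders["c.txt"])
```

Files that appear in no clone pair do not show up in the result; whether they should get singleton folders depends on how the dataset is meant to be used.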
The training steps are:
- Arrange the dataset in the format above and place it in the dataset folder.
- Set train_own_data in train.py to True.
- Run train.py to start training.
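Before starting a run, it can help to check that the folder layout is readable and to see what kind of image pairs it yields. The sketch below scans a root folder of class subfolders and samples same-class and different-class pairs. It is an illustration only: `scan_dataset`, `sample_pair`, and the accepted extensions are assumptions, and train.py may load and pair images differently.

```python
import os
import random

IMAGE_EXTS = (".png", ".jpg", ".jpeg")  # assumed set of image extensions


def scan_dataset(root):
    """Map each class folder under root to its sorted list of image paths."""
    classes = {}
    for name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, name)
        if not os.path.isdir(class_dir):
            continue
        images = sorted(
            os.path.join(class_dir, f)
            for f in os.listdir(class_dir)
            if f.lower().endswith(IMAGE_EXTS)
        )
        if images:
            classes[name] = images
    return classes


def sample_pair(classes, same, rng=random):
    """Return (path_a, path_b, label) with label 1 for same class, 0 otherwise."""
    if same:
        candidates = [c for c, imgs in classes.items() if len(imgs) >= 2]
        if not candidates:
            raise ValueError("need a class with at least two images")
        cls = rng.choice(candidates)
        path_a, path_b = rng.sample(classes[cls], 2)
        return path_a, path_b, 1
    if len(classes) < 2:
        raise ValueError("need at least two classes")
    cls_a, cls_b = rng.sample(sorted(classes), 2)
    return rng.choice(classes[cls_a]), rng.choice(classes[cls_b]), 0
```

A call such as `scan_dataset("dataset/images_background")` would be typical if the layout above sits directly inside the dataset folder; that exact path is an assumption.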
For more on Siamese neural networks, see https://github.com/tensorfreitas/Siamese-Networks-for-One-Shot-Learning