- CCLearner_Feature -- Generate feature data for training the model
- CCLearner_Test -- Detect clone pairs using the trained models
- CCLearner_Train -- Train the models
- Recall_Query -- SQL scripts for calculating recall rates of different types of clones
- Run -- JAR files and dependencies for easy mode
- CCLearner.conf -- Configuration file of CCLearner
- Ubuntu 14.04, Java 8
$ tar -xvzf era_bigclonebench.sql.tar.gz
$ tar -xvzf era_bcb_sample.tar.gz
$ sudo apt-get update
$ sudo apt-get install postgresql postgresql-contrib
# Change user
$ sudo -i -u postgres
# Run PostgreSQL console
$ psql
# Create dependent roles for BigCloneBench
postgres=# CREATE ROLE postgresql;
postgres=# CREATE ROLE bigclonebench;
# Data dump
postgres=# \i /home/cclearner/Desktop/CCLearner/era_bigclonebench.sql
# Create another user, with superuser rights, for working with the database
postgres=# CREATE USER cclearner WITH PASSWORD 'cclearner';
postgres=# ALTER ROLE cclearner SUPERUSER;
$ sudo apt-get install pgadmin3
To run all the experiments in our paper, the following parameters in CCLearner.conf can be changed. For the first seven (the path parameters), replace the paths with your own username and directory.
- source.file.path
- output.dir
- feature.file.path
- model.file.path
- pos.file.path
- sim.file.path
- clones.file.path
- feature.num
- feature.name
- training.iteration
- training.input.num
- training.hidden.num (changing this also requires modifying the source code in CCLearner_Train)
- testing.folder (users can reduce the number of testing folders to save time)
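Before running anything, it can help to print the seven path parameters and confirm they point at your own directories. The helper below is only a sketch: it assumes each parameter name starts a line in CCLearner.conf, which this README does not state, and `show_path_params` is a hypothetical name, not part of CCLearner.

```shell
# Print the path-related settings from the configuration file.
# Assumption: every parameter name begins its own line in the file.
show_path_params() {
  # $1 = path to the configuration file (defaults to ./CCLearner.conf)
  grep -E '^[[:space:]]*(source\.file\.path|output\.dir|feature\.file\.path|model\.file\.path|pos\.file\.path|sim\.file\.path|clones\.file\.path)' "${1:-CCLearner.conf}"
}
```

Calling `show_path_params CCLearner.conf` should list only the path entries, so any line still containing someone else's username stands out.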
With the default or a modified configuration file in place, go to the Run folder and execute the following commands in order:
java -jar CCLearner_Feature.jar
java -jar CCLearner_Train.jar
java -jar CCLearner_Test.jar (may take some time)
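Since each step depends on the output of the previous one, the three commands can be wrapped so that a failure stops the run early. This is a minimal sketch, not part of the CCLearner distribution: `run_pipeline` and the `JAVA` override variable are hypothetical names introduced here for illustration.

```shell
# Run feature extraction, training, and testing in order; stop at the first failure.
run_pipeline() {
  # $1 = path to the Run folder (defaults to ./Run)
  run_dir="${1:-Run}"
  for stage in Feature Train Test; do
    jar="$run_dir/CCLearner_${stage}.jar"
    if [ ! -f "$jar" ]; then
      echo "missing $jar" >&2
      return 1
    fi
    echo "running CCLearner_${stage}.jar"
    # JAVA is a hypothetical override hook; it defaults to the java on PATH.
    # The subshell keeps the caller's working directory unchanged.
    (cd "$run_dir" && "${JAVA:-java}" -jar "CCLearner_${stage}.jar") || return 1
  done
}
```

For example, `run_pipeline /path/to/CCLearner/Run` runs all three steps from inside the Run folder, matching the manual commands above.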
To change the dataset, adjust further parameters, or modify the source code, open the CCLearner_Feature, CCLearner_Train, and CCLearner_Test projects, then rebuild and rerun them.
The table "tools_clones" in PostgreSQL is the target for data import. We recommend using pgAdmin to truncate the table and import the CSV file into the database.
- Double-click the server's name to connect to the server and database.
- Right-click "tools_clones" and click "truncate".
- Right-click "tools_clones" and click "import..." (Choose Filename; Format - "csv"; Encoding - "UTF8").
In pgAdmin, click the SQL icon in the top menu, open one query file from the Recall_Query folder, and execute the query.
The numbers of true clones of each type in BigCloneBench used for testing are T1(2,383), T2(671), VST3(873), ST3(5,365), MT3(31,413), WT3/4(1,540,513).
Recall Rate = Query Result / number of true clones of the corresponding type
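The division above can be scripted so the totals do not need to be retyped for each clone type. This is a small sketch using the totals listed in this README; `recall` is a hypothetical helper name, and the query result passed to it is whatever number the Recall_Query script returned.

```shell
# Compute the recall rate for one clone type from a query result.
recall() {
  # $1 = clone type (T1, T2, VST3, ST3, MT3, WT3/4)
  # $2 = query result for that type
  case "$1" in
    T1)    total=2383 ;;
    T2)    total=671 ;;
    VST3)  total=873 ;;
    ST3)   total=5365 ;;
    MT3)   total=31413 ;;
    WT3/4) total=1540513 ;;
    *) echo "unknown clone type: $1" >&2; return 1 ;;
  esac
  # awk performs the floating-point division; four decimal places are printed.
  awk -v hit="$2" -v total="$total" 'BEGIN { printf "%.4f\n", hit / total }'
}
```

As an illustration with a made-up query result (not a measured one), `recall T2 600` prints 0.8942, that is, 600 / 671.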