The dataset has 4 folders: t1, t2, t3, and x, which hold type-1, type-2, type-3, and false clones, respectively. Each folder contains a number of code fragments.

Candidate clone pairs are identified by filename. For example, in the t1 folder, 2_frag1.java and 2_frag2.java make up the second clone pair in that folder. The two code fragments of a pair are distinguished by the tags frag1 and frag2 in the filename.
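The naming scheme above can be turned into a small loader. The following is a minimal sketch, not part of the dataset's own tooling: the function names `group_clone_pairs` and `load_dataset` are hypothetical, and the filename pattern is assumed from the single example `2_frag1.java` (a number, `_frag`, then 1 or 2, with a `.java` extension).

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed pattern, inferred from the example "2_frag1.java".
PAIR_PATTERN = re.compile(r"^(\d+)_frag([12])\.java$")

# Folder names as described above: type-1, type-2, type-3, and false clones.
FOLDERS = ("t1", "t2", "t3", "x")


def group_clone_pairs(filenames):
    """Map each pair number to a (frag1 filename, frag2 filename) tuple."""
    fragments = defaultdict(dict)
    for name in filenames:
        match = PAIR_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed naming scheme
        pair_id = int(match.group(1))
        frag_no = int(match.group(2))
        fragments[pair_id][frag_no] = name
    # Keep only pairs for which both fragments are present.
    return {
        pair_id: (frags[1], frags[2])
        for pair_id, frags in sorted(fragments.items())
        if 1 in frags and 2 in frags
    }


def load_dataset(root):
    """Return {folder name: {pair number: (frag1, frag2)}} for existing folders."""
    root = Path(root)
    pairs_by_folder = {}
    for folder in FOLDERS:
        directory = root / folder
        if directory.is_dir():
            names = [p.name for p in directory.iterdir() if p.is_file()]
            pairs_by_folder[folder] = group_clone_pairs(names)
    return pairs_by_folder
```

Calling `load_dataset("path/to/dataset")` would then give one dictionary of pairs per folder; pairs with a missing fragment are silently dropped in this sketch, which may or may not be the behaviour you want.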

A sample output:

[image: sample output]