Code Clone Detection: MLN Project

Code Clone Detection

Setup

  • Dockerfile <3

Files

The important notebooks are as follows:

  • notebooks/clone_detection_baseline.ipynb: Uses an LSTM with code2vec (0.86), fasttext (0.82), or random (0.83) embeddings for the task.
  • notebooks/model_play-seasme.ipynb: Uses a Siamese network with a base model of GraphConv+TopKPooling and node attributes assigned using code2vec (0.56) or fasttext (0.90).
  • notebooks/model_play.ipynb: Uses GraphConv+TopKPooling with code2vec (0.84) or fasttext (0.90).
  • notebooks/dgl_model_play.ipynb: Uses only GraphConv with code2vec (0.56).
  • notebooks/data_preprocssing_main.ipynb: Tries different kinds of processing on the AST graph, builds the vocab, and trains fasttext embeddings.

The other notebooks contain experiments that we weren't able to run successfully due to one or more errors.
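
To make the Siamese GraphConv+TopKPooling idea concrete, here is a minimal pure-Python sketch, not the code from the notebooks. It assumes scalar weights instead of learned weight matrices, a ReLU after the convolution, a mean readout, and cosine similarity as the comparison. All function and parameter names (`graph_conv`, `top_k_pool`, `encode`, `siamese_similarity`, `w_self`, `w_neigh`, `p`, `ratio`) are hypothetical. The key point is that both graphs pass through the same encoder with the same parameters.

```python
import math

def graph_conv(x, edges, w_self, w_neigh):
    """Simplified graph convolution: mix each node's features with the sum
    of its neighbours' features (undirected edges), then apply ReLU."""
    dim = len(x[0])
    agg = [[0.0] * dim for _ in x]
    for u, v in edges:
        for d in range(dim):
            agg[u][d] += x[v][d]
            agg[v][d] += x[u][d]
    return [[max(0.0, w_self * xi[d] + w_neigh * ai[d]) for d in range(dim)]
            for xi, ai in zip(x, agg)]

def top_k_pool(x, edges, p, ratio=0.5):
    """Top-k pooling: score nodes by projection onto p (assumed non-zero),
    keep the top ceil(ratio * n) nodes, gate their features by tanh(score)."""
    norm = math.sqrt(sum(c * c for c in p))
    scores = [sum(a * b for a, b in zip(xi, p)) / norm for xi in x]
    k = max(1, math.ceil(ratio * len(x)))
    keep = sorted(range(len(x)), key=lambda i: scores[i], reverse=True)[:k]
    remap = {old: new for new, old in enumerate(keep)}
    pooled_x = [[c * math.tanh(scores[i]) for c in x[i]] for i in keep]
    # Keep only edges whose endpoints both survived, with re-indexed ids.
    pooled_edges = [(remap[u], remap[v]) for u, v in edges
                    if u in remap and v in remap]
    return pooled_x, pooled_edges

def encode(graph, params):
    """Shared encoder: conv -> top-k pooling -> mean readout."""
    x, edges = graph
    x = graph_conv(x, edges, params["w_self"], params["w_neigh"])
    x, edges = top_k_pool(x, edges, params["p"], params["ratio"])
    dim = len(x[0])
    return [sum(row[d] for row in x) / len(x) for d in range(dim)]

def cosine(a, b):
    na = math.sqrt(sum(c * c for c in a))
    nb = math.sqrt(sum(c * c for c in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(p * q for p, q in zip(a, b)) / (na * nb)

def siamese_similarity(g1, g2, params):
    """Siamese comparison: the SAME params encode both graphs."""
    return cosine(encode(g1, params), encode(g2, params))

# Toy usage: a graph is (node_features, edge_list); values are made up.
params = {"w_self": 1.0, "w_neigh": 0.5, "p": [1.0, 1.0], "ratio": 0.5}
graph_a = ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [(0, 1), (1, 2)])
graph_b = ([[1.0, 0.0], [1.0, 1.0]], [(0, 1)])
print(siamese_similarity(graph_a, graph_b, params))
```

In a real model the node features would come from code2vec or fasttext embeddings and the weights would be learned; the sketch only shows the data flow.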

A list of important code files:

  • src/code_parser.py: Parses Java code given as a string into an AST, converts the AST into a networkx graph, and combines the graphs.
  • src/dataset.py: Builds several kinds of torch_geometric datasets.
  • src/data_prep.py: Data preprocessing and data split script.
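
As a rough illustration of the AST-to-graph step, here is a self-contained sketch that is not taken from src/code_parser.py. It makes three loud assumptions: Python's built-in `ast` module stands in for a Java parser, plain dicts and lists stand in for a networkx graph, and "combining" is read as a disjoint union of two graphs. The names `ast_to_graph` and `combine_graphs` are hypothetical.

```python
import ast

def ast_to_graph(source):
    """Parse source and return (nodes, edges).

    nodes maps an integer id to the AST node type name;
    edges is a list of (parent_id, child_id) pairs.
    """
    tree = ast.parse(source)
    nodes, edges = {}, []

    def visit(node, parent_id):
        node_id = len(nodes)              # ids are assigned in visit order
        nodes[node_id] = type(node).__name__
        if parent_id is not None:
            edges.append((parent_id, node_id))
        for child in ast.iter_child_nodes(node):
            visit(child, node_id)

    visit(tree, None)
    return nodes, edges

def combine_graphs(g1, g2):
    """Disjoint union: shift the ids of the second graph past the first."""
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    offset = len(nodes1)
    nodes = dict(nodes1)
    nodes.update({i + offset: label for i, label in nodes2.items()})
    edges = list(edges1) + [(u + offset, v + offset) for u, v in edges2]
    return nodes, edges

# Toy usage on two tiny snippets.
g_a = ast_to_graph("x = 1")
g_b = ast_to_graph("y = x + 2")
pair_nodes, pair_edges = combine_graphs(g_a, g_b)
print(len(pair_nodes), "nodes,", len(pair_edges), "edges")
```

Each node label could then be mapped to an embedding vector (for example from fasttext) to serve as the node attribute of a graph dataset.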

References