Code Duplication Project
This Python 3 script takes two repositories and checks for "code clones," or code which has likely been taken from elsewhere, between the two. The primary algorithm (Iodine) used comes from Lee et al. (2018) - Tree-Pattern-Based Clone Detection with High Precision and Recall. There are four main types of code clones:
- Type 1: exact copies
- Type 2: copies with renamed elements (ex. variables)
- Type 3: copies that have been slightly modified
- Type 4: "semantic" copies (code that is not copied, but does the same thing)
This code currently only checks for Type 1 clones. However, the majority of Type 2 clones are also detected successfully as ~80% matches. Hopefully, proper support for Type 2 and 3 clones will be added in a foreseeable future.
Getting Started
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
Prerequisites
What things you need to install the software
- Python 3.7+ (Download Link)
- pip 19+ (typically already installed with Python)
- Git (Download Link)
Installation
- Clone the repository
git clone github.com/calebdehaan/codeDuplicationParser.git
- Enter the repository directory
cd codeDuplicationParser
- Install dependencies
pip3 install -r requirements.txt
- Run the program
python3 -m cli [args]
Alternatively you can run the program in a Python virtual environment
./code-duplication.sh
Dependencies
Python packages required for the tool to run
gitpython
bitstring
fastlog
windows-curses
(Windows only, required byfastlog
)flask
(for web UI)easy-postgres
(for web UI's database)pytest
(for unit tests)
Algorithms
- Iodine - The most complex one, therefore also the slowest. Performs a very thorough analysis and should be able to find (nearly) all clones.
- Chlorine - Performs a relatively simple string-based analysis and therefore is somewhat faster than Iodine.
- Oxygen - Very simple algorithm based on string comparison. By far the fastest algorithm if you only care about perfect code duplicates (100% type 1 clones).
Built With
- Python 3.7.3 - The Python version used
- GitPython - Used to pull git repositories
Authors
- Caleb DeHaan - Initial work - Github
- Denton Wood - Initial work - Github
- Stephanie Alvord - Initial work - Github
- Schaeffer Duncan - Initial work - Github
- Ivo Meixner - Initial work - Github