This project implements a plagiarism checker based on a Convolutional Neural Network (CNN). It analyzes pairs of questions and predicts whether they are duplicates. The CNN model is trained on the Quora Question Pair dataset from Kaggle.
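As a rough illustration of how a question pair could be turned into fixed-length integer sequences before reaching a CNN, here is a minimal sketch. The function names (`tokenize`, `build_vocab`, `encode`), the reserved `PAD`/`UNK` indices, and the `max_len` value are assumptions made for this example; they are not taken from `clean.py` or `model.py`.

```python
import re

# Reserved indices (assumed convention for this sketch, not the project's own)
PAD, UNK = 0, 1


def tokenize(question):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", question.lower())


def build_vocab(questions):
    """Assign each distinct token an integer id, starting after PAD and UNK."""
    vocab = {}
    for question in questions:
        for token in tokenize(question):
            vocab.setdefault(token, len(vocab) + 2)
    return vocab


def encode(question, vocab, max_len=8):
    """Map tokens to ids, truncate to max_len, and right-pad with PAD."""
    ids = [vocab.get(token, UNK) for token in tokenize(question)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


pair = ("How do I learn Python?", "What is the best way to learn Python?")
vocab = build_vocab(pair)
encoded = [encode(question, vocab) for question in pair]
print(encoded)
```

Both questions end up with the same length, and shared words such as "learn" and "python" map to the same ids in each sequence.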
Download the Quora Question Pair dataset from Kaggle and place it in the 'data' folder within your project directory.
- /plagiarism_checker
  - /data
    - quora_question_pair_dataset.csv
  - /src
    - clean.py
    - model.py
    - utility.py
  - main.py
  - README.md
  - requirements.txt
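Before running anything, it can help to confirm that the dataset is where the layout above expects it. The sketch below is one way to do that with the standard library only. The function name `check_dataset` is hypothetical, and the column names `question1`, `question2`, and `is_duplicate` are an assumption based on the Kaggle release; adjust them if your copy of the file differs.

```python
import csv
from pathlib import Path

# Path follows the project layout; the expected columns are an assumption
DATASET = Path("data") / "quora_question_pair_dataset.csv"
EXPECTED_COLUMNS = {"question1", "question2", "is_duplicate"}


def check_dataset(path=DATASET, expected=EXPECTED_COLUMNS):
    """Return a list of problems; an empty list means the file looks usable."""
    path = Path(path)
    if not path.is_file():
        return [f"missing file: {path}"]
    with path.open(newline="", encoding="utf-8") as handle:
        # Only the header row is read, so large files are checked quickly
        header = next(csv.reader(handle), [])
    missing = sorted(set(expected) - set(header))
    return [f"missing column: {name}" for name in missing]


if __name__ == "__main__":
    problems = check_dataset()
    for problem in problems:
        print(problem)
    if not problems:
        print("dataset looks OK")
```

Run it from the project root so that the relative `data` path resolves correctly.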
Make sure you have Python installed. You can download it from the official Python website.
Install the required dependencies using:
pip install -r requirements.txt
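To confirm that the install succeeded, a small check along the following lines may be useful. It is only a sketch: the helper names `requirement_names` and `missing_packages` are hypothetical, and the parser assumes simple name-based lines such as `numpy==1.24`, not URLs or editable installs.

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(text):
    """Extract bare package names from requirements.txt-style text."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or -e
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that have no installed distribution."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    requirements = Path("requirements.txt")
    if requirements.is_file():
        names = requirement_names(requirements.read_text(encoding="utf-8"))
        print(missing_packages(names) or "all requirements installed")
```

This only checks that each package is present; it does not verify that the installed versions satisfy the pinned constraints.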
- Clone the repository:
  git clone https://github.com/vikasharma005/plag_checker.git
- Navigate to the project directory:
  cd plag_checker
- Install dependencies:
  pip install -r requirements.txt
- Download the Quora Question Pair dataset from Kaggle and place it in the 'data' folder.
Run the plagiarism checker using the following command:
python main.py
This project is licensed under the MIT License - see the LICENSE file for details.
- Word2Vec model: GoogleNews-vectors-negative300.bin.gz
- Quora Question Pair dataset: Kaggle