Plagiarism Checker with CNN

Overview

This project implements a plagiarism checker using a Convolutional Neural Network (CNN). It analyzes pairs of questions and predicts whether the two questions are duplicates. The CNN model is trained on the Quora Question Pair dataset from Kaggle.
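
As a rough illustration of the input side of such a model, the sketch below turns a question pair into two fixed-length sequences of token ids, which is the kind of input a CNN over text typically consumes. It is not taken from this repository: the names `tokenize`, `build_vocab`, `encode`, `PAD_ID`, `UNK_ID`, and `max_len` are hypothetical, and the actual preprocessing in `src/clean.py` may differ.

```python
import re

PAD_ID = 0  # reserved for padding (assumption, not from the repository)
UNK_ID = 1  # reserved for out-of-vocabulary tokens (assumption)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(questions):
    """Assign an integer id to every distinct token, starting after the reserved ids."""
    vocab = {}
    for question in questions:
        for token in tokenize(question):
            vocab.setdefault(token, len(vocab) + 2)
    return vocab


def encode(question, vocab, max_len=10):
    """Map a question to exactly max_len ids, truncating or right-padding as needed."""
    ids = [vocab.get(token, UNK_ID) for token in tokenize(question)][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


q1 = "How do I learn Python?"
q2 = "What is the best way to learn Python?"
vocab = build_vocab([q1, q2])
pair = (encode(q1, vocab), encode(q2, vocab))
```

Both sequences in `pair` have the same length, so they can be stacked into batches and passed to an embedding layer followed by convolution layers.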

- Data
- Project Structure
- Requirements
- Installation
- Usage
- License
- Acknowledgments

Data

Download the Quora Question Pair dataset from Kaggle and place it in the 'data' folder within your project directory. The project structure below lists the file as quora_question_pair_dataset.csv.
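
Once the file is in place, it can be read with the standard library. The following is a minimal sketch, not the repository's own loader: the function name `load_pairs` is hypothetical, and the column names `question1`, `question2`, and `is_duplicate` are assumed to match the Kaggle CSV, so check them against the header of your download.

```python
import csv


def load_pairs(path):
    """Read (question1, question2, label) tuples from the dataset CSV."""
    pairs = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Skip rows with a missing question or label.
            if not (row.get("question1") and row.get("question2") and row.get("is_duplicate")):
                continue
            pairs.append((row["question1"], row["question2"], int(row["is_duplicate"])))
    return pairs


# Example call (assumes the file name from the project structure):
#   pairs = load_pairs("data/quora_question_pair_dataset.csv")
```

Each returned tuple holds the two question strings and an integer label, where 1 marks a duplicate pair under the assumed column layout.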

Project Structure

- /plagiarism_checker
  - /data
    - quora_question_pair_dataset.csv
  - /src
    - clean.py
    - model.py
    - utility.py
    - main.py
  - README.md
  - requirements.txt

Requirements

Make sure you have Python installed. You can download it from the official Python website.

Install the required dependencies using:

pip install -r requirements.txt

Installation

Clone the repository:

git clone https://github.com/vikasharma005/plag_checker.git

Navigate to the project directory:
```
cd plag_checker
```
Install dependencies:
```
pip install -r requirements.txt
```
Download the Quora Question Pair dataset from Kaggle and place it in the 'data' folder.

Usage

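Run the plagiarism checker with the following command. The project structure above places main.py in the src folder, so run the command from there or adjust the path accordingly: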

python main.py

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Word2Vec model: GoogleNews-vectors-negative300.bin.gz
Quora Question Pair dataset: Kaggle
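
For readers unfamiliar with how pretrained Word2Vec vectors are typically used, here is a hedged sketch of an embedding lookup. It is not the repository's code: `embed_tokens` and `toy_vectors` are hypothetical names, the zero-vector fallback for unknown words is one common choice among several, and the gensim call in the comment is shown only as one possible way to load the file.

```python
def embed_tokens(tokens, vectors, dim=300):
    """Return one vector per token, using a zero vector for unknown tokens."""
    zero = [0.0] * dim
    return [list(vectors[token]) if token in vectors else list(zero) for token in tokens]


# With gensim installed, the real vectors could be loaded like this (not run here):
#   from gensim.models import KeyedVectors
#   vectors = KeyedVectors.load_word2vec_format(
#       "GoogleNews-vectors-negative300.bin.gz", binary=True)
# A tiny stand-in dictionary keeps this sketch self-contained.
toy_vectors = {"learn": [0.1, 0.2, 0.3], "python": [0.4, 0.5, 0.6]}
matrix = embed_tokens(["learn", "cobol", "python"], toy_vectors, dim=3)
```

The GoogleNews vectors are 300-dimensional, which is why `dim` defaults to 300; the toy example uses 3 dimensions only to stay readable.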
