Document Similarity Matching

Overview

This project implements a system for matching and categorizing invoices based on their content and structure. It extracts text from PDF invoices, preprocesses the text, extracts relevant features, and calculates similarity scores to identify the most similar invoice from a database.
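The pipeline described above (preprocess the text, build features, score similarity, pick the closest invoice) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It assumes TF-IDF features and cosine similarity, which the overview does not confirm. It uses only the standard library in place of nltk and scikit-learn so that it runs without extra installs. The PDF extraction step is left out, so plain strings stand in for text pulled from invoices. All function names and the sample data are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; nltk ships a fuller one).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def preprocess(text):
    """Lowercase, split into alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(token_lists):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF: terms found in every document keep a small weight.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_text, database):
    """Return (name, score) of the database entry closest to query_text."""
    names = list(database)
    token_lists = [preprocess(database[n]) for n in names]
    token_lists.append(preprocess(query_text))  # query goes last
    vectors = tfidf_vectors(token_lists)
    query_vec = vectors[-1]
    scores = [(n, cosine(query_vec, vec)) for n, vec in zip(names, vectors)]
    return max(scores, key=lambda pair: pair[1])


# Hypothetical stand-ins for text already extracted from PDF invoices.
database = {
    "acme_001": "Invoice number 1001 Acme Corp total due 250.00 USD",
    "globex_17": "Globex receipt for consulting services, hours billed 12",
}
name, score = most_similar(
    "Invoice number 1002 Acme Corp total due 300.00 USD", database
)
print(name, round(score, 3))
```

With scikit-learn installed, TfidfVectorizer and cosine_similarity could replace the hand-written tfidf_vectors and cosine helpers. The structure of most_similar would stay the same.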

Project Structure

Requirements

Ensure you have Python 3.x installed. You will also need the following Python libraries:

pdfplumber
nltk
scikit-learn

These can be installed from the requirements.txt file by running pip install -r requirements.txt in the project directory.

Installation

Clone the Repository:

git clone <repository_url>
cd document_similarity_matching
