Similar Document Template Matching Algorithm

Doc Matching - A Similar Document Template Matching ML Model to detect fraudulent documents for insurance claims
Pre - Smart India Hackathon '23 VJTI - Team TechnoSrats

Table of Contents

Description
Links
Tech Stack
Project Setup
Usage
Team Members

📝Description

Fraudulent transactions and invoices are a serious problem in the financial services and insurance industries; KPMG reported over a billion dollars in losses due to fraudulent transactions. Thousands of man-hours are lost each year to tedious manual checking of invoices and documents to confirm their validity. Standard information common to most insurance-related documents also needs to be extracted. With the advent of advanced computer vision and object detection models, automating these odious tasks has become possible.

The key features are:

Detect and extract standardised fields of information from important documents, such as:
- Invoice number
- Total amount
- Personal details of the claimant
Check for common markers of fake invoices, such as:
- Minor changes to details of the invoice, like changing the colour of a logo, the date of issue or the name of the claimant
- Grammatical errors
- Changed positions of the tables or service provider details of the invoice
Flag fraudulent and suspicious documents with red and amber colours respectively on the Dashboard
Detect and group patterns in existing and new documents, and present the related templates and patterns visually in a clustering graph
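
As a rough illustration of the "minor changes" check and the red/amber flagging, the sketch below compares the extracted fields of a submitted invoice against a reference copy and maps the number of changed fields to a flag. The field names, the sample values and the rule (one change is amber, two or more is red) are assumptions made for this example, not the project's actual logic.

```python
from typing import Dict, List, Optional


def changed_fields(reference: Dict[str, str], submitted: Dict[str, str]) -> List[str]:
    """Return the names of fields whose values differ between two documents."""
    keys = sorted(set(reference) | set(submitted))
    return [key for key in keys if reference.get(key) != submitted.get(key)]


def flag_for(marker_count: int) -> Optional[str]:
    """Map a number of fraud markers to a dashboard colour (hypothetical rule)."""
    if marker_count >= 2:
        return "red"    # treated as fraudulent
    if marker_count == 1:
        return "amber"  # treated as suspicious
    return None         # nothing to flag


# Made-up sample data, for illustration only.
reference = {
    "invoice_number": "INV-001",
    "total_amount": "1200.00",
    "claimant": "A. Kumar",
    "issue_date": "2023-01-05",
}
submitted = dict(reference, issue_date="2023-02-05")

markers = changed_fields(reference, submitted)
print(markers, flag_for(len(markers)))
```

In a real pipeline the marker count would presumably also include the other checks listed above (grammar, layout shifts), which this sketch leaves out.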

Problem Statement ID: SIH1441
Problem Statement Title: Similar Document Template Matching Algorithm from Bajaj Finserv Health Ltd

Flowcharts

🔗Links

Assets

Backend (Hasura and Render)

🤖Tech-Stack

Web Development

NextJS
Material UI

Database

PostgreSQL (using Supabase)

APIs

Hasura GraphQL API (over the Postgres DB)
FastAPI (for the model)
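
For readers who want to talk to the Hasura GraphQL API from a script, here is a minimal sketch that builds (but does not send) a GraphQL request using only the Python standard library. The endpoint URL, the `documents` table and its `id` and `flag` columns are hypothetical; only the `/v1/graphql` path and the `x-hasura-admin-secret` header are standard Hasura conventions.

```python
import json
import urllib.request
from typing import Optional


def build_hasura_request(endpoint: str, admin_secret: str, query: str,
                         variables: Optional[dict] = None) -> urllib.request.Request:
    """Build a POST request carrying a GraphQL query for a Hasura endpoint."""
    payload = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "x-hasura-admin-secret": admin_secret,
        },
        method="POST",
    )


# Hypothetical table and column names; adjust to the real schema.
FLAGGED_DOCUMENTS_QUERY = """
query FlaggedDocuments($flag: String!) {
  documents(where: {flag: {_eq: $flag}}) {
    id
    flag
  }
}
"""

request = build_hasura_request(
    "https://example.hasura.app/v1/graphql",  # placeholder endpoint
    "your hasura admin key",
    FLAGGED_DOCUMENTS_QUERY,
    {"flag": "red"},
)
# urllib.request.urlopen(request) would send it; that call is left out here
# so the example runs without network access.
```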

Machine Learning

TensorFlow (for the deep-learning-based bounding box model)
Scikit-Learn (for NLP-based Named Entity Recognition)
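
One common way to compare bounding boxes, for instance to notice that a table sits in a different place than in a known template, is intersection over union (IoU). The sketch below is a plain-Python illustration of that idea; the box format `(x_min, y_min, x_max, y_max)`, the `layout_shifted` helper and the 0.5 threshold are assumptions for this example and are not taken from the project's model code.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes, in the range 0..1."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def layout_shifted(template_box: Box, detected_box: Box, threshold: float = 0.5) -> bool:
    """Hypothetical helper: treat a region as moved when its IoU with the template is low."""
    return iou(template_box, detected_box) < threshold


print(layout_shifted((0, 0, 2, 2), (1, 1, 3, 3)))
```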

🛠Project Setup

For the web-app

Clone the GitHub repo

$ git clone https://github.com/saRvaGnyA/similar-doc-matching.git

Enter the client directory and install all the required dependencies. Ensure that any globally installed packages like the React CLI, Tailwind CLI, PostCSS CLI or ESLint are uninstalled before proceeding.
```
$ cd client
$ yarn install
```

Setup the .env file for storing the environment variables. A demo file for this is as follows:

NEXT_PUBLIC_HASURA_ADMIN_SECRET = your hasura admin key
NEXT_PUBLIC_SUPABASE_ANON_KEY = your supabase anon key
NEXT_PUBLIC_SUPABASE_URL = your supabase public url

If you are working on Visual Studio Code or WebStorm, it'd be convenient to install the extensions for Prettier and ESLint.

For the model

Clone the GitHub repo

$ git clone https://github.com/saRvaGnyA/similar-doc-matching.git

Create a virtual environment from the Anaconda prompt (install conda first if it is not already installed), then switch to that environment. Let's say the name of the environment is test.
```
$ conda create -n test python=3.8 anaconda
$ conda activate test
```
Look for requirements.txt and install the packages.
```
$ pip install -r requirements.txt
```
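
To confirm that the install worked, a quick check like the one below can be run inside the activated environment. It is only a sketch: the import names `tensorflow`, `sklearn` and `fastapi` are inferred from the Tech Stack section, and the actual contents of requirements.txt may differ.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# Assumed import names, based on the Tech Stack section.
EXPECTED_MODULES = ["tensorflow", "sklearn", "fastapi"]

if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```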

For the FastAPI

Look for the main.py and utils.py files and have them ready. (The packages for FastAPI will already be installed once you have run the pip install command in the section above.)

💻Usage

Once the required setup and installation is completed, you can start developing and running the project.

For the web-app

Go to the client directory and run the dev script to start the development server
```
$ npm run dev
```
Before pushing any commit, make sure to run the lint script and fix any linting errors
```
$ npm run lint
```
If you get an ESLint, Tailwind or PostCSS version conflict error, make a .env file in the client directory with the following contents:
```
SKIP_PREFLIGHT_CHECK = true
```

For the model and for the FastAPI

Navigate to the Model directory. The models for the project are in the gesture_model.tflite file.
Open the Anaconda prompt and switch to the virtual environment that you created (for example, test).
```
$ conda activate test
```
To initiate the server, type the following in the command prompt
```
$ python main.py
```

👩‍💻Team Members