FamilyTreeAPI

Workflow:

The data is prepared from the electoral roll of the constituency in the region. We downloaded the PDF from the Government's one-stop Electoral Roll website; this was a manual process because it requires entering a CAPTCHA.
The downloaded PDF was then converted to an OCR-enabled PDF.
The data extracted via OCR was further processed and cleaned.
The cleaned data was then converted to JSON objects.
The data was then loaded into a MongoDB database.
The API was then built using Node.js and Express.js and hosted on an AWS instance.
The API endpoints were tested using Postman.

We first fetch the places from the places collection in the database and get the place id.
Using the place id, we fetch the electoral roll data of that particular place.
We then search for the person by voter id and find their relations with other people based on the house number in that particular place.
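
The lookup steps above can be sketched in Python as follows. This is only an illustration using in-memory lists instead of MongoDB; the field names (place_id, voter_id, house_no, name) and the sample records are assumptions, not the actual schema used by the API.

```python
from typing import Optional

# Hypothetical in-memory stand-ins for the places and electoral collections.
# All field names and values here are assumptions made for illustration.
PLACES = [{"place_id": 1, "name": "Ward 12"}]
ELECTORALS = [
    {"place_id": 1, "voter_id": "ABC1234567", "name": "Ravi Kumar", "house_no": "14"},
    {"place_id": 1, "voter_id": "ABC1234568", "name": "Sita Kumar", "house_no": "14"},
    {"place_id": 1, "voter_id": "XYZ7654321", "name": "Arun Das", "house_no": "15"},
]


def find_place_id(places, name) -> Optional[int]:
    """Return the id of the place with the given name, or None if absent."""
    for place in places:
        if place["name"] == name:
            return place["place_id"]
    return None


def find_family(electorals, place_id, voter_id):
    """Return (person, relatives) for a voter id within one place.

    Relatives are the other voters in the same place who share the
    person's house number.
    """
    roll = [e for e in electorals if e["place_id"] == place_id]
    person = next((e for e in roll if e["voter_id"] == voter_id), None)
    if person is None:
        return None, []
    relatives = [
        e for e in roll
        if e["house_no"] == person["house_no"] and e["voter_id"] != voter_id
    ]
    return person, relatives


if __name__ == "__main__":
    pid = find_place_id(PLACES, "Ward 12")
    person, relatives = find_family(ELECTORALS, pid, "ABC1234567")
    print(person["name"], [r["name"] for r in relatives])
```

Grouping by house number within a single place keeps the search small, since only one place's roll is scanned per request.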

Finally, we generate a PDF of the person and their relations with family members, as described in the problem statement.

Create a .env file and add these two configuration values:

PORT = <YOUR PORT NUMBER>
MONGO_URL = <YOUR MONGODB URI>
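
Before starting the server, it can help to confirm that both values are present. The following is a minimal sketch of such a check in Python; the helper names parse_env and missing_keys and the example values are hypothetical and are not part of this repository.

```python
# The two keys this README asks for.
REQUIRED_KEYS = ("PORT", "MONGO_URL")


def parse_env(text):
    """Parse simple KEY = VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    # Example values only; substitute your own port and MongoDB URI.
    sample = "PORT = 3000\nMONGO_URL = mongodb://localhost:27017/familytree\n"
    print(missing_keys(parse_env(sample)))
```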

Click on this link to get the sample_data: https://github.com/jhonsnow456/FamilyTreeAPI/tree/main/sample_data
Upload the sample_data for electorals, in JSON format, to your MongoDB electoral collection.
Upload the sample_data for places, in JSON format, to your MongoDB places collection.
Use Node.js version 16 LTS or higher.
Use a package manager such as yarn or npm, as per your choice:

For npm:
```
npm i
```
For yarn:
```
yarn install
```
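
The sample-data upload described earlier can also be scripted. Below is a rough Python sketch using pymongo (a third-party package, installed separately). The function names, the database name, and the file paths passed to upload_sample_data are assumptions; only the electoral and places collection names come from this README. If the sample files use MongoDB Extended JSON (for example `$oid` fields), plain `json.load` will not be enough and `bson.json_util.loads` would be needed instead.

```python
import json


def load_documents(path):
    """Read a JSON file holding either one document or a list of documents."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    return data if isinstance(data, list) else [data]


def upload(collection, documents):
    """Insert documents with insert_many and return how many were inserted."""
    if not documents:
        return 0
    result = collection.insert_many(documents)
    return len(result.inserted_ids)


def upload_sample_data(mongo_url, db_name, electorals_path, places_path):
    """Load both sample files into the electoral and places collections.

    db_name and the two paths are placeholders chosen by the caller.
    """
    from pymongo import MongoClient  # third-party: pip install pymongo

    client = MongoClient(mongo_url)
    try:
        db = client[db_name]
        return {
            "electoral": upload(db["electoral"], load_documents(electorals_path)),
            "places": upload(db["places"], load_documents(places_path)),
        }
    finally:
        client.close()
```

Calling upload_sample_data with your MONGO_URL value, a database name, and the paths to the two downloaded JSON files would populate both collections in one step.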

cd pdf_processing

$ python3.8 -m venv env
$ source env/bin/activate
$ pip3 install -r requirements.txt

Follow the steps in the next section, Data Collection from PDF, for further processing.

Install the CLI tool ocrmypdf to process the PDF. Since we are using a Linux system, run:
```
sudo apt install ocrmypdf
```
If the PDF is protected, install pikepdf with pip3 install pikepdf and use the code below to decrypt the file:
```
import pikepdf

# Open the protected PDF (replace with your own file name).
with pikepdf.open('data2.pdf') as pdf:
    # Saving without encryption settings writes a decrypted copy.
    pdf.save('data_2.pdf')
```
Note: This protection is present because of modern-day scanners.

Run the following command in the terminal to produce the OCR-enabled output PDF:

ocrmypdf -l eng --deskew --title 'data_.pdf' --jobs 2 --output-type pdfa data_2.pdf output.pdf
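
When several PDFs need processing, the OCR step can be driven from Python instead of typing the command each time. The sketch below simply assembles an equivalent ocrmypdf command line and runs it with subprocess; the helper names build_ocr_command and run_ocr are hypothetical, and the flags are the ones used in this README.

```python
import shutil
import subprocess


def build_ocr_command(src, dst, language="eng", jobs=2, title=None):
    """Assemble the ocrmypdf argument list for one input/output pair."""
    cmd = [
        "ocrmypdf",
        "-l", language,
        "--deskew",
        "--jobs", str(jobs),
        "--output-type", "pdfa",
    ]
    if title:
        cmd += ["--title", title]
    cmd += [src, dst]
    return cmd


def run_ocr(src, dst, **options):
    """Run ocrmypdf and raise if it is missing or exits with an error."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    subprocess.run(build_ocr_command(src, dst, **options), check=True)


if __name__ == "__main__":
    # Only prints the command; call run_ocr(...) to actually execute it.
    print(" ".join(build_ocr_command("data_2.pdf", "output.pdf", title="data_.pdf")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.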

Now extract the voter details, clean them, convert the data into JSON format, and load it into the database described above.
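
As an illustration of the extract, clean, and convert step, the sketch below parses one voter entry from OCR text into a JSON-ready dictionary using regular expressions. The text layout in SAMPLE_TEXT, the field names, and the voter id pattern are all assumptions made for this example; the real OCR output will likely need different patterns.

```python
import json
import re

# Hypothetical OCR text for one voter entry; the real layout may differ.
SAMPLE_TEXT = """
ABC1234567
Name : Ravi  Kumar
Father's Name : Mohan Kumar
House Number : 14
Age : 42 Gender : Male
"""

# One pattern per output field (assumed labels and formats).
FIELD_PATTERNS = {
    "voter_id": re.compile(r"\b([A-Z]{3}\d{7})\b"),
    "name": re.compile(r"^Name\s*:\s*(.+)$", re.MULTILINE),
    "house_no": re.compile(r"^House Number\s*:\s*(\S+)", re.MULTILINE),
    "age": re.compile(r"\bAge\s*:\s*(\d+)"),
    "gender": re.compile(r"\bGender\s*:\s*(\w+)"),
}


def parse_entry(text):
    """Extract the known fields from one entry; missing fields become None."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        # Collapse the stray whitespace that OCR often introduces.
        record[field] = " ".join(match.group(1).split()) if match else None
    if record["age"] is not None:
        record["age"] = int(record["age"])
    return record


if __name__ == "__main__":
    print(json.dumps(parse_entry(SAMPLE_TEXT), indent=2))
```

Keeping missing fields as None rather than dropping them makes it easier to spot entries where the OCR output needs manual correction.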

The electoral roll data we currently use is in English; however, the same procedure can be used to extract rolls in other languages.
Text in other languages is then translated into English using the Python library translate.
This lets the approach extend across the length and breadth of our country, increasing the amount of organised data on voters and their relations.
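
The translation step might look roughly like the sketch below. It assumes the translate package (installed with pip, and its default provider needs network access); the helper names translate_fields and make_translator, the field list, and the "hi" language code in the usage note are illustrative assumptions.

```python
def translate_fields(record, fields, translate_fn):
    """Return a copy of record with the given text fields translated.

    translate_fn is any callable that maps a source string to English.
    Non-string or empty values are left unchanged.
    """
    translated = dict(record)
    for field in fields:
        value = translated.get(field)
        if isinstance(value, str) and value.strip():
            translated[field] = translate_fn(value)
    return translated


def make_translator(from_lang, to_lang="en"):
    """Build a translate function backed by the translate library."""
    from translate import Translator  # third-party: pip install translate

    return Translator(from_lang=from_lang, to_lang=to_lang).translate
```

For example, `translate_fields(record, ["name"], make_translator("hi"))` would translate only the name field of a Hindi record and leave ids, ages, and house numbers untouched.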