
A set of APIs that derives family relations from voter data based on house number.

Primary language: JavaScript

FamilyTreeAPI

AWS   mongodb   nodejs   Python

Workflow:

  • The data was prepared from the electoral roll of the constituency in the region. We downloaded the pdf from the Government's one-stop Electoral Roll website; this was a manual process because each download required entering a captcha.
  • The downloaded pdf was then converted to an OCR-enabled pdf.
  • The data extracted through OCR was further processed and cleaned.
  • The cleaned data was then converted to JSON objects.
  • The data was then loaded into a MongoDB database.
  • The API was then built using Node.js and Express.js and hosted on an AWS instance.
  • The API endpoints were tested using Postman.

About APIs

  • We first fetch the places from the places collection in the database and get the place id.
  • Using the place id, we fetch the electoral roll data for that particular place.
  • We then search for the person by voter id and find their relations with other people based on the house number in that place.
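
The three steps above can be sketched in a few lines. The API itself is written in Node.js; the Python below only illustrates the lookup logic on in-memory data. The field names (`_id`, `name`, `voter_id`, `house_no`, `place_id`), the helper functions, and the sample records are all assumptions made for this sketch, not the project's actual schema.

```python
from collections import defaultdict

# Hypothetical sample data; the real collections and field names may differ.
PLACES = [
    {"_id": "p1", "name": "Place One"},
    {"_id": "p2", "name": "Place Two"},
]
VOTERS = [
    {"voter_id": "AAA0000001", "name": "Person A", "house_no": "12", "place_id": "p1"},
    {"voter_id": "AAA0000002", "name": "Person B", "house_no": "12", "place_id": "p1"},
    {"voter_id": "AAA0000003", "name": "Person C", "house_no": "7", "place_id": "p1"},
    {"voter_id": "BBB0000001", "name": "Person D", "house_no": "12", "place_id": "p2"},
]


def get_place_id(places, place_name):
    """Step 1: look up the id of a place by name (None if not found)."""
    for place in places:
        if place["name"] == place_name:
            return place["_id"]
    return None


def get_roll(voters, place_id):
    """Step 2: keep only the voter records that belong to one place."""
    return [v for v in voters if v["place_id"] == place_id]


def find_household(roll, voter_id):
    """Step 3: find a voter and the other voters sharing the same house number."""
    by_house = defaultdict(list)
    for voter in roll:
        by_house[voter["house_no"]].append(voter)
    person = next((v for v in roll if v["voter_id"] == voter_id), None)
    if person is None:
        return None, []
    others = [v for v in by_house[person["house_no"]] if v["voter_id"] != voter_id]
    return person, others


place_id = get_place_id(PLACES, "Place One")
person, family = find_household(get_roll(VOTERS, place_id), "AAA0000001")
print(person["name"], "shares a house with", [v["name"] for v in family])
```

Restricting the search to one place first matters because house numbers are only unique within a place: in the sample data, house 12 exists in both places, but only the voters from the same place are returned.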


Additional Feature:

  • Generating a pdf of the person and their relations with family members, as mentioned in the problem statement.

Project Configuration and setup:

Server setup:

Python setup:

  • Change directory to pdf_processing:
cd pdf_processing
  • Use Python version 3.7 or 3.8.
  • Run the commands below:
$ python3.8 -m venv env
$ source env/bin/activate
$ pip3 install -r requirements.txt
  • Follow the steps in the next section, Data Collection from pdf, for further processing.

Data Collection from pdf

  • Install the CLI tool ocrmypdf to process the pdf. Since we are using a Linux system, run:
    sudo apt install ocrmypdf
    
  • Install pikepdf with pip3 install pikepdf, then use the code below to decrypt the file:
    import pikepdf
    pdf = pikepdf.open('data2.pdf')  # replace with the name of your own protected pdf file
    pdf.save('data_2.pdf')  # writes the decrypted copy
    
    Note: Such protected files are a result of modern-day scanners.
  • Run the following command in the terminal to produce the OCR output pdf file:
    ocrmypdf -l eng --deskew --title 'data_.pdf' --jobs 2 --output-type pdfa data_2.pdf output.pdf
    
  • Now extract the voter details, clean them, convert the data into JSON format, and load it into the MongoDB database mentioned above.
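
As a rough illustration of the last step, the sketch below turns already-extracted OCR text into JSON records. It assumes a hypothetical layout in which each voter appears as a block of `Label: value` lines separated by a blank line; the labels, field names, and helper functions are invented for this example, and the real electoral roll text will likely need different patterns and more cleaning. Getting the text out of output.pdf and loading the result into MongoDB are not shown.

```python
import json
import re

# Hypothetical field labels; adjust the patterns to the real OCR text layout.
FIELD_PATTERNS = {
    "voter_id": re.compile(r"Voter ID\s*:\s*(\S+)", re.IGNORECASE),
    "name": re.compile(r"^\s*Name\s*:\s*(.+)$", re.IGNORECASE | re.MULTILINE),
    "house_no": re.compile(r"House (?:No|Number)\.?\s*:\s*(\S+)", re.IGNORECASE),
}


def parse_voter_block(block):
    """Extract one voter record from a block of text; missing fields become None."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(block)
        record[field] = match.group(1).strip() if match else None
    return record


def parse_roll(text):
    """Split the text on blank lines and parse each non-empty block."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [parse_voter_block(block) for block in blocks if block.strip()]


def to_json(records):
    """Serialise the records as a JSON array, ready to be loaded into a database."""
    return json.dumps(records, indent=2)


# Made-up sample text standing in for the OCR output.
SAMPLE_TEXT = """Voter ID: AAA0000001
Name: Person A
House No: 12

Voter ID: AAA0000002
Name: Person B
House No: 12
"""

print(to_json(parse_roll(SAMPLE_TEXT)))
```

Keeping missing fields as None rather than dropping the record makes it easier to spot OCR failures during the cleaning pass.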

Future Scope

  • The electoral roll data we currently use is in English; however, the same procedure can be applied to extract rolls in other languages.
  • Text in other languages can then be translated into English using the Python library translate.
  • This would extend coverage across the length and breadth of our country and increase the amount of organised data on voters and their relations.