Demo • Problem Statement • Objectives • Solution • Features • Milestones • Enhancements • Installation • Stack
Building a highly accurate OCR solution that takes a manually filled form as input and provides the data in digital form.
The Exact Sciences company processes thousands of forms every week, received via fax. The data in these forms needs to be digitized, but digitizing them manually is time-consuming and prone to human error, so the company would like to automate the process with an OCR system. Exact Sciences already has an OCR system in place, but it is somewhat less accurate and slower than they would like.
• Building an OCR system that is faster and more accurate than the existing OCR system.
• Developing the API in such a way that it can easily be customized and scaled according to the requirements.
As Exact Sciences already has an OCR system in place, we had to build a system that would be more accurate and less time-consuming than the present one. Most of the forms received by the company are handwritten by the user, so we had to keep that constraint in mind too. We were allowed to use a COTS (Commercial Off-the-Shelf) OCR solution, but we had to stay within the budget and not go overboard.
After extensive research and conversations with different COTS OCR vendors, we were impressed with the Nanonets OCR solution. It is simple and straightforward, and it uses CRNN (Convolutional Recurrent Neural Network), DRAM (Deep Recurrent Attention Model), and other architectures to build the OCR detection model. As for pricing, we would be charged by the number of API calls made to the model.
We created a model in Nanonets, but in order to train it we needed a dataset of images, and we were only provided with two files (a sample form and a blank form). To train the model we needed over 150 files, and manually filling these forms would have been very time-consuming.
We built the ESOCR Dataset Generator repo, which contains a script that takes data from the fake JSON, places it over a PSD, and saves the final output file. In this manner, we were easily able to generate around 150 images for the dataset. Once we uploaded the images to Nanonets, we annotated them one by one manually in Nanonets. We then trained the model, and once it was trained, we were able to predict the text from an uploaded image.
We built an API to interact with the Nanonets API: if we send a file to our API, the file is processed by the model and a response is returned. The API then beautifies and stores the response in Firebase and uploads the file to Google Cloud. The API can easily be configured to upload the data to any preferred cloud. As stated in the problem statement, the data received from the OCR should be presented in a digital form, so we started working on building the frontend for the project.
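As a rough illustration, here is a minimal sketch of such a route in Express JS. The Nanonets endpoint shape, model ID, bucket name, and collection name are assumptions for illustration, not the actual ESOCR codebase.

```js
// Minimal sketch: forward an uploaded form to the Nanonets model,
// store the file in a Google Cloud bucket and the prediction in Firestore.
// Endpoint, model id, bucket and collection names are placeholders.
const express = require('express');
const multer = require('multer');
const axios = require('axios');
const FormData = require('form-data');
const { Firestore } = require('@google-cloud/firestore');
const { Storage } = require('@google-cloud/storage');

const app = express();
const upload = multer({ storage: multer.memoryStorage() });
const db = new Firestore();
const bucket = new Storage().bucket('esocr-uploads'); // placeholder bucket name

app.post('/api/forms', upload.single('file'), async (req, res) => {
  try {
    // 1. Forward the uploaded file to the Nanonets model for prediction.
    const form = new FormData();
    form.append('file', req.file.buffer, req.file.originalname);
    const { data } = await axios.post(
      `https://app.nanonets.com/api/v2/OCR/Model/${process.env.NANONETS_MODEL_ID}/LabelFile/`,
      form,
      { headers: form.getHeaders(), auth: { username: process.env.NANONETS_API_KEY, password: '' } }
    );

    // 2. Store the raw file in the cloud bucket and the prediction in Firestore.
    await bucket.file(req.file.originalname).save(req.file.buffer);
    const doc = await db.collection('forms').add({ uploadedFile: req.file.originalname, prediction: data });

    res.json({ id: doc.id, prediction: data });
  } catch (err) {
    res.status(500).json({ message: err.message });
  }
});

app.listen(3000);
```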
Note: Later on, in a call with Exact Sciences, it was clarified that they are mainly looking for a pure API solution. But by then, we had already built the frontend!
Coming to the frontend, we created a dashboard from which the user can upload the scanned files to the API; the files are then processed and the results are presented to the user in a digital form. The user can also update the data from the digital form.
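A minimal sketch of how the dashboard's upload flow could look; the `/api/forms` route and the component structure are illustrative assumptions, not the actual ESOCR frontend code.

```jsx
// Illustrative sketch of a dashboard upload handler (not the actual ESOCR component).
import React, { useState } from 'react';

export default function UploadForm() {
  const [prediction, setPrediction] = useState(null);

  async function handleUpload(event) {
    const body = new FormData();
    body.append('file', event.target.files[0]);

    // POST the scanned form to the ESOCR API; '/api/forms' is a placeholder route.
    const response = await fetch('/api/forms', { method: 'POST', body });
    setPrediction(await response.json());
  }

  return (
    <div>
      <input type="file" onChange={handleUpload} />
      {/* In the real app the prediction is rendered as an editable digital form. */}
      {prediction && <pre>{JSON.stringify(prediction, null, 2)}</pre>}
    </div>
  );
}
```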
One of the main objectives of our system was the ability to detect handwritten text in the form, and we were able to achieve that using our ESOCR system.
In order to achieve this, we had to train our model with different styles of handwritten fonts.
Coming to response time, we were able to process the whole document in approximately 22 seconds for this file. The time may differ based on the quality, size, and type of the file.
Note: The response time can be decreased by hosting the Docker container on our own cloud and providing more processing power.
The front end of ESOCR is very simple and straightforward and **can be customized easily** according to our requirements. All of the processed forms of the user are available in the ESOCR Web App.
We can easily customize the fields and add new fields in Nanonets. Suppose we want to add a field called "email address" to the patient information; we can do that in Nanonets by creating a field called "patient.emailAddress". Below are some sample responses from the OCR system.
{
"message": "Success",
"result": [
{
"message": "Success",
"input": "db0301a5-e2fe-4ada-9c1a-cab2a973db0a.jpg",
"prediction": [
{
"label": "provider.healthCare",
"xmin": 704,
"ymin": 460,
"xmax": 921,
"ymax": 502,
"score": 1,
"ocr_text": "EXOSPACE"
},
{
"label": "provider.name",
"xmin": 429,
"ymin": 559,
"xmax": 563,
"ymax": 602,
"score": 1,
"ocr_text": "Mcgee"
}// Many other fields present Here!!!
],
"page": 0,
"request_file_id": "0450e9a2-df44-4ed0-96b9-b38d831aeefc",
"filepath": "uploadedfiles/4ed6dcd3....../PredictionImages/356437671.jpeg",
"id": "68d096fd-b93e-11ea-8789-b655b7b9b939"
}
]
}
{
"uploadedFile": "sampleForm.jpg",
"prediction": {
"date": "10/04/2019",
"billing": {
"priorAuthorizationCode": "209750134",
"policyNumber": "484150",
"plan": "platinum",
"groupNumber": "153806",
"claimsSubmissionAddress": "565 Llama court, kentucky, 960309",
"primaryInsurance": "waretel",
"policyHolder": {
"dob": "04/08/2016",
"name": "_Velasquez"
}
}// Many other fields present Here!!!
},
"id": "2a989b8c-b93d-11ea-ac49-2afb8b0efd3c"
}
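The second response above is the beautified form of the first: each flat prediction (a dotted label plus its `ocr_text`) is folded into a nested object before being stored. A minimal sketch of how such folding could be implemented (not the exact ESOCR code):

```js
// Fold flat predictions like { label: 'billing.policyHolder.name', ocr_text: 'Velasquez' }
// into a nested object like { billing: { policyHolder: { name: 'Velasquez' } } }.
// Illustrative sketch only.
function beautify(predictions) {
  const result = {};
  for (const { label, ocr_text } of predictions) {
    const keys = label.split('.');
    let node = result;
    keys.slice(0, -1).forEach((key) => {
      node[key] = node[key] || {};
      node = node[key];
    });
    node[keys[keys.length - 1]] = ocr_text;
  }
  return result;
}

// Example: beautify(response.result[0].prediction)
// -> { provider: { healthCare: 'EXOSPACE', name: 'Mcgee' }, ... }
```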
Even if the form or document is slightly misaligned, the system is still able to detect the fields in the form.
At present, extracting one text field from a form costs around $0.0099, based on the plan. If there are 100 fields in a form, it would cost $0.0099 × 100 = $0.99 per document, so processing around 100 documents would cost around $99. This cost covers creating the model, training it, and providing it as an API.
The cost per field may differ based on the plan purchased from Nanonets.
If we process around 1000 documents per day, that comes to roughly $990 per day at this rate, which would cost us a huge amount of money over time. However, there is a solution to this issue. We had a call with the Nanonets sales team and explained all of our requirements; a customized solution, with radio-button support included, would cost us approximately $499. They would be willing to provide the whole model & API as a Docker container, which can be hosted on any preferred cloud.
Since all of the processed data is available in Firebase, we can update the data and fix any wrong predictions for a form directly in the ESOCR Web App.
We can make multiple requests to the API to process multiple forms simultaneously, which increases the overall throughput of the app and decreases the time required to process multiple files.
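As a sketch, several forms could be sent in parallel from a Node script like the one below; the endpoint URL and the `file` field name are assumptions:

```js
// Process multiple forms concurrently by firing the requests together
// and waiting for all of them to finish (illustrative sketch).
const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');

async function processForms(paths) {
  const requests = paths.map((path) => {
    const form = new FormData();
    form.append('file', fs.createReadStream(path)); // 'file' field name is an assumption
    return axios.post('https://esocr.example.com/api/forms', form, { headers: form.getHeaders() });
  });
  // All files are processed simultaneously instead of one after another.
  const responses = await Promise.all(requests);
  return responses.map((response) => response.data);
}

processForms(['form1.jpg', 'form2.jpg', 'form3.pdf']).then(console.log);
```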
The web app was not able to render PDFs and other file formats directly, and performing file conversions to handle them would have been time-consuming and not very effective. To solve this, we started using imgix, which connects to our Google Cloud bucket and hosts all of the files itself. imgix was very helpful for resizing the images and converting their formats. Here is a sample PDF formatted by imgix as a PNG.
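For example, a PDF stored in the bucket can be requested from imgix as a PNG purely through URL parameters (the imgix source name below is a placeholder):

```js
// imgix renders the file on the fly based on URL parameters (illustrative source name).
// fm sets the output format, page selects a PDF page, w resizes the result.
const imageUrl = 'https://esocr.imgix.net/uploadedfiles/sampleForm.pdf?fm=png&page=1&w=800';
```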
2. We were able to predict the handwritten text in the sample form that was sent to the API for processing.
3. We built the API, which stores the responses received from the model in Firebase Firestore and the uploaded files in Google Cloud.
7. For the frontend, we were able to format all the different file types into JPEGs on the fly, without any conversion of the input file, using imgix.
The goals of this application were purposely kept within what was believed to be attainable within the allotted timeline and resources. As such, many enhancements can be made upon this initial design. The following are the milestones intended for future expansion and enhancement of the project.
- As we were only able to purchase the medium pricing plan for the OCR detection, it did not include the radio-button facility. In the future, if we are selected by the Exact Sciences company, we would purchase either the large pricing plan or a custom pricing plan that includes the radio-button functionality.
- Increasing the training data for the OCR detection from 150 images to 500 images.
- Hosting the Docker container on our own cloud to increase the speed of processing the documents.
- After cloning the repo to your system, open up the project folder.
- Install all of the fonts from the fonts folder and restart the system.
- You need to go to this JSON Generator to generate fake JSON data.
- Copy the fake JSON and place it in the sample.json file.
- Now, open up photoshop-script.jsx in any text editor and modify the path below:
var file = new File('yourPath/generatedImages/' + name + '.jpg');
//sample
//var file = new File('D:/Projects/OCR/images/generatedImages/' + name + '.jpg');
- Now, open either placeHolder.psd or placeHolder_bestCase.psd in Photoshop.
- Once the PSD is open, click File -> Scripts -> Browse, select the photoshop-script.jsx script, and it will start producing the images (a rough sketch of what the script does is shown after this list).
- The images will be stored in the specified path, named by index number.
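Roughly, the script iterates over the records in sample.json, writes each value into the matching text layers of the open PSD, and exports one JPEG per record. The following is only a sketch; the layer and field names are placeholders, not the exact ones used in placeHolder.psd:

```js
// photoshop-script.jsx (illustrative sketch) - run via File -> Scripts -> Browse.
// Reads the fake records from sample.json, fills the text layers of the open PSD
// and saves one JPEG per record. Layer/field names here are placeholders.
var jsonFile = new File('yourPath/sample.json');
jsonFile.open('r');
var records = eval('(' + jsonFile.read() + ')'); // ExtendScript has no built-in JSON.parse
jsonFile.close();

var doc = app.activeDocument;
for (var i = 0; i < records.length; i++) {
  // Write each value into the matching text layer of the template.
  doc.layers.getByName('patientName').textItem.contents = records[i].patientName;
  doc.layers.getByName('providerName').textItem.contents = records[i].providerName;

  var file = new File('yourPath/generatedImages/' + i + '.jpg');
  var options = new JPEGSaveOptions();
  options.quality = 10;
  doc.saveAs(file, options, true); // save a copy, keep the PSD open
}
```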
Express JS: We have used Express JS to create the API as it is very robust, small and provides a range of features.
Nanonets API: The Nanonets API is used to process the image/pdf provided to it and provide the results back to the ESOCR API.
Firestore Database: We have used the Firebase Firestore database to store all of the form-data gathered from the provided form.
REST API: The API is very flexible, can be modified according to our requirements, and can be scaled easily.
Babel: We have used Babel to transpile the ES6 code into ES5 so that Node can understand it.
React JS: React JS is very useful while creating web apps that are data-driven. React is fast, scalable, and responsive.
Material UI: We have consistently used Material UI components throughout the application in order to make the web app beautiful & responsive across all devices.
SWR: We have used SWR for fetching data, as it caches responses and has many other useful features (see the sketch after this list).
React Redux: We used redux to handle the complex states of the application and thunk to make the API calls to the backend.
SASS/SCSS: SASS is far more powerful than generic CSS, so we have opted to build the styles for the application in SCSS.
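As an illustration of the SWR usage mentioned above, fetching the list of processed forms could look like this; the `/api/forms` endpoint is a placeholder:

```jsx
// Illustrative SWR usage for loading the processed forms into the dashboard.
import React from 'react';
import useSWR from 'swr';

const fetcher = (url) => fetch(url).then((res) => res.json());

export default function ProcessedForms() {
  // SWR caches the response and revalidates it in the background.
  const { data, error } = useSWR('/api/forms', fetcher); // placeholder endpoint

  if (error) return <p>Failed to load forms.</p>;
  if (!data) return <p>Loading…</p>;

  return (
    <ul>
      {data.map((form) => (
        <li key={form.id}>{form.uploadedFile}</li>
      ))}
    </ul>
  );
}
```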
Ashfaq Nisar 🎨 💻 🚇 📖 |
Vamshi Krishna 🤔 🎨 📖 |
This project follows the all-contributors specification. Contributions of any kind are welcome!