mortgage-document-ai

====================

A lightweight demo built to show how to extract structured data from mortgage documents using OCR and AI.

Live Demo

Try it out: Demo Link
View sample JSON response: Sample Response Link

What ML model is used

To extract structured data from mortgage documents, you must have an ML model trained on the documents you want to process.

In this demo, we use a custom model that we trained to extract 130 fields from the 1st and 2nd sections of the URLA Form 1003.

To create a custom model, we used Custom Document Extractor from Google Document AI.

Here's the document schema that contains a list of entities extracted from Form 1003.

What's inside

`models` package

.packages/models

It's a package that contains:

Normalized schema for each document type
And a collection of utils to map raw Google Document AI response to document schema

`api` package

.packages/api

It is an express.js server with a single endpoint that:

Accepts documents from the frontend
Sends processing to Google document AI
Uses models to transform into normalized schema
Returns normalized data back to the frontend

`ui` package

.packages/ui

It is a simple next.js/react.js application where the user can:

Upload document for processing
See the result of the processing

It has two main routes:

/ - where the user can upload a document or select a sample
/documents/[documentId] - where users see the result of the document

How it all works together

User uploads file into the ui on / route
ui sends the file for processing to the api
api sends the file for processing to the Google Document AI
Google Document AI process file using a custom model trained on our data
api uses document schemas & utils from models to DocumentResponse into usable data
api sends normalized document data to the ui
ui uses utils from models to transform data into usable for ui
ui renders extracted data on the /documents/[documentId] data

How to deploy it

This repository is intended for education purposes.

But it might be a good starting point if you build your document processing solution.

Here's a rough outline of how to get this demo running on your servers:

Train Custom Document Extractor from Google Document AI
1. Create a document extractor
2. Create a schema of the document
3. Label documents to prepare data for training
4. Train the model
Build document schema (example)
1. Update processor name
2. Create a list of labels extracted by your model
3. Map labels into the document structure you want to get as an output
Deploy the UI & API
1. Create render.com account
2. Use render.com blueprint to deploy (render.yaml)

If you need help building something similar, reach out to us here.

Technology stack

Document models
- Google Document AI
- TypeScript
Frontend UI
- React.js
- Next.js
- TypeScript
- Tailwind CSS
Backend API
- Node.js
- Express.js
- Typescript

Author

Built by MortgageFlow

Mortgage Software Consulting and Development Company

mortgageflow.io

opsflowhq/mortgage-document-ai