A lightweight demo built to show how to extract structured data from mortgage documents using OCR and AI.
- Try it out: Demo Link
- View sample JSON response: Sample Response Link
To extract structured data from mortgage documents, you must have an ML model trained on the documents you want to process.
In this demo, we use a custom model that we trained to extract 130 fields from the 1st and 2nd sections of the URLA Form 1003.
To create a custom model, we used Custom Document Extractor from Google Document AI.
Here's the document schema that contains a list of entities extracted from Form 1003.
.packages/models
It's a package that contains:
- Normalized schema for each document type
- A collection of utils that map the raw Google Document AI response to the document schema
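As an illustration of what such a mapping util might look like, here is a minimal sketch. The `RawEntity` fields (`type`, `mentionText`, `confidence`) follow the entity fields of a Document AI response, but the `Form1003` shape, the label names, and `normalizeEntities` are hypothetical and are not taken from this repository.

```typescript
// Minimal subset of a Document AI entity used by this sketch.
interface RawEntity {
  type: string; // label assigned during training, e.g. "borrower_name"
  mentionText: string; // text as it appears in the document
  confidence: number; // model confidence between 0 and 1
}

interface NormalizedField {
  value: string;
  confidence: number;
}

// Hypothetical normalized schema covering only three fields.
interface Form1003 {
  borrowerName: NormalizedField | null;
  loanAmount: NormalizedField | null;
  propertyAddress: NormalizedField | null;
}

// Hypothetical mapping from model labels to schema keys.
const LABEL_TO_FIELD: Record<string, keyof Form1003> = {
  borrower_name: "borrowerName",
  loan_amount: "loanAmount",
  property_address: "propertyAddress",
};

function normalizeEntities(entities: RawEntity[]): Form1003 {
  const result: Form1003 = {
    borrowerName: null,
    loanAmount: null,
    propertyAddress: null,
  };
  entities.forEach((entity) => {
    // Ignore labels that are not part of the schema.
    if (!Object.prototype.hasOwnProperty.call(LABEL_TO_FIELD, entity.type)) {
      return;
    }
    const key = LABEL_TO_FIELD[entity.type];
    const current = result[key];
    // Keep the most confident mention when a label appears more than once.
    if (current === null || entity.confidence > current.confidence) {
      result[key] = {
        value: entity.mentionText.trim(),
        confidence: entity.confidence,
      };
    }
  });
  return result;
}
```

Keeping the label mapping in one table means a retrained model with renamed labels only requires a change to `LABEL_TO_FIELD`, not to the code that consumes the schema.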
.packages/api
It is an Express.js server with a single endpoint that:
- Accepts documents from the frontend
- Sends them to Google Document AI for processing
- Uses models to transform the response into the normalized schema
- Returns the normalized data to the frontend
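The first two steps of the endpoint could be sketched as below. This is not the repository's handler: it leaves out Express and the Document AI client so that it stays self-contained, and `UploadedFile`, `UploadResult`, `prepareProcessRequest`, and the list of accepted MIME types are assumptions made for illustration. The `name` and `rawDocument` fields are intended to mirror the shape of a Document AI process request.

```typescript
// Hypothetical shape of a file received from the frontend.
interface UploadedFile {
  fileName: string;
  mimeType: string;
  contentBase64: string; // file bytes, already base64-encoded
}

// Payload for a Document AI process call.
interface ProcessRequest {
  name: string; // full processor resource name
  rawDocument: { content: string; mimeType: string };
}

type UploadResult =
  | { kind: "ok"; request: ProcessRequest }
  | { kind: "error"; httpStatus: number; message: string };

// Assumed subset of file types the endpoint accepts.
const SUPPORTED_MIME_TYPES = ["application/pdf", "image/png", "image/jpeg"];

function prepareProcessRequest(
  file: UploadedFile,
  processorName: string
): UploadResult {
  if (file.contentBase64.length === 0) {
    return { kind: "error", httpStatus: 400, message: "Empty file" };
  }
  if (SUPPORTED_MIME_TYPES.indexOf(file.mimeType) === -1) {
    return {
      kind: "error",
      httpStatus: 415,
      message: "Unsupported file type: " + file.mimeType,
    };
  }
  return {
    kind: "ok",
    request: {
      name: processorName,
      rawDocument: { content: file.contentBase64, mimeType: file.mimeType },
    },
  };
}
```

In a real handler, an `ok` result would be passed to the Document AI client and the response handed to the utils from models; an `error` result would be returned to the frontend with its `httpStatus`.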
.packages/ui
It is a simple Next.js/React.js application where the user can:
- Upload a document for processing
- See the result of the processing
It has two main routes:
- / where the user can upload a document or select a sample
- /documents/[documentId] where the user sees the processing result for the document
- The user uploads a file into the ui on the / route
- ui sends the file for processing to the api
- api sends the file for processing to Google Document AI
- Google Document AI processes the file using a custom model trained on our data
- api uses document schemas & utils from models to transform the DocumentResponse into usable data
- api sends the normalized document data to the ui
- ui uses utils from models to transform the data into a form usable by the ui
- ui renders the extracted data on the /documents/[documentId] route
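The last two steps, where the ui turns normalized data into something it can render, might look roughly like the sketch below. `NormalizedDocument`, `DisplayRow`, `toDisplayRows`, and the 0.8 review threshold are hypothetical names and values chosen for illustration; the repository's own utils may work differently.

```typescript
interface NormalizedField {
  value: string;
  confidence: number;
}

// Hypothetical normalized document: schema key -> extracted field (or null).
type NormalizedDocument = Record<string, NormalizedField | null>;

interface DisplayRow {
  label: string; // human-readable label shown in the UI
  value: string; // text shown next to the label
  needsReview: boolean; // true when the value is missing or low-confidence
}

// Assumed cut-off below which a value is flagged for manual review.
const REVIEW_THRESHOLD = 0.8;

// Converts a camelCase key such as "borrowerName" into "Borrower Name".
function toLabel(key: string): string {
  const spaced = key.replace(/([a-z0-9])([A-Z])/g, "$1 $2");
  return spaced.charAt(0).toUpperCase() + spaced.slice(1);
}

function toDisplayRows(doc: NormalizedDocument): DisplayRow[] {
  return Object.keys(doc).map((key) => {
    const field = doc[key];
    if (field === null) {
      return { label: toLabel(key), value: "Not found", needsReview: true };
    }
    return {
      label: toLabel(key),
      value: field.value,
      needsReview: field.confidence < REVIEW_THRESHOLD,
    };
  });
}
```

A results page could then map each `DisplayRow` to a table row and highlight the ones where `needsReview` is true.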
This repository is intended for educational purposes.
However, it may be a good starting point if you are building your own document processing solution.
Here's a rough outline of how to get this demo running on your servers:
- Train Custom Document Extractor from Google Document AI
  - Create a document extractor
  - Create a schema of the document
  - Label documents to prepare data for training
  - Train the model
- Build document schema (example)
  - Update processor name
  - Create a list of labels extracted by your model
  - Map labels into the document structure you want to get as an output
- Deploy the UI & API
  - Create render.com account
  - Use render.com blueprint to deploy (render.yaml)
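The "Build document schema" steps above could be sketched as follows. The processor resource name follows the Document AI `projects/.../locations/.../processors/...` format; everything else (`ProcessorConfig`, the label names, `LABEL_PATHS`, `buildOutput`) is a hypothetical example and not the schema shipped in this repository.

```typescript
// Hypothetical configuration for the trained processor.
interface ProcessorConfig {
  projectId: string;
  location: string;
  processorId: string;
}

// Builds the full resource name that Document AI expects for a processor.
function processorName(config: ProcessorConfig): string {
  return (
    "projects/" + config.projectId +
    "/locations/" + config.location +
    "/processors/" + config.processorId
  );
}

// Hypothetical labels extracted by the model, each mapped to a
// dot-separated path in the desired output structure.
const LABEL_PATHS: Record<string, string> = {
  borrower_name: "borrower.name",
  borrower_ssn: "borrower.ssn",
  loan_amount: "loan.amount",
};

interface Nested {
  [key: string]: string | Nested;
}

// Places each known label's value at its configured path.
function buildOutput(values: Record<string, string>): Nested {
  const output: Nested = {};
  Object.keys(values).forEach((label) => {
    // Skip labels that have no configured path.
    if (!Object.prototype.hasOwnProperty.call(LABEL_PATHS, label)) {
      return;
    }
    const parts = LABEL_PATHS[label].split(".");
    let node: Nested = output;
    for (let i = 0; i < parts.length - 1; i++) {
      const next = node[parts[i]];
      if (typeof next === "object") {
        node = next;
      } else {
        // Create the intermediate object on first use.
        const created: Nested = {};
        node[parts[i]] = created;
        node = created;
      }
    }
    node[parts[parts.length - 1]] = values[label];
  });
  return output;
}
```

For example, `buildOutput({ borrower_name: "Jane Doe", loan_amount: "250000" })` yields `{ borrower: { name: "Jane Doe" }, loan: { amount: "250000" } }`.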
If you need help building something similar, reach out to us here.
- Document models
  - Google Document AI
  - TypeScript
- Frontend UI
  - React.js
  - Next.js
  - TypeScript
  - Tailwind CSS
- Backend API
  - Node.js
  - Express.js
  - TypeScript
Built by MortgageFlow
Mortgage Software Consulting and Development Company
mortgageflow.io