HStyle

A historical style generator (historical style document synthesis).


Historical documents can reveal a great deal about our past, such as forms of writing, wording, and content that does not exist elsewhere. Machine learning requires a large amount of labeled (classified) data, but creating labeled data (annotations) is expensive and tedious. As a result, the datasets available for training models on historical documents are small, too small to train deep models that achieve high accuracy.

To build a large dataset easily and with fewer resources, synthetic data is needed. In this project, we researched a method for creating synthetic historical data and developed a system (a website) that lets users synthesize documents themselves.

Our method is a deep learning method based on neural style transfer. To improve its results, we also applied several image-processing techniques, such as binarization and dilation.
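
Neural style transfer is usually framed as an optimization problem: the generated image should keep the deep features of the content image while matching the channel correlations (Gram matrices) of the style image's features. The sketch below illustrates that loss in plain Python on tiny, hand-made feature maps. It is not the project's code; the function names and the default weights `content_weight` and `style_weight` are hypothetical, and in a real pipeline the feature maps would come from several layers of a pretrained CNN.

```python
from typing import List

# A feature map flattened to shape (channels, positions), e.g. one CNN layer's
# activations with the height and width dimensions merged into one axis.
FeatureMap = List[List[float]]


def gram_matrix(features: FeatureMap) -> List[List[float]]:
    """Channel-by-channel inner products, normalized by the number of positions."""
    channels = len(features)
    positions = len(features[0])
    return [
        [
            sum(features[i][k] * features[j][k] for k in range(positions)) / positions
            for j in range(channels)
        ]
        for i in range(channels)
    ]


def mean_squared_difference(a: List[List[float]], b: List[List[float]]) -> float:
    """Mean of the squared element-wise differences of two equally shaped matrices."""
    diffs = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def style_transfer_loss(
    generated: FeatureMap,
    content: FeatureMap,
    style: FeatureMap,
    content_weight: float = 1.0,  # hypothetical weighting
    style_weight: float = 100.0,  # hypothetical weighting
) -> float:
    """Weighted sum of a content loss (raw features) and a style loss (Gram matrices)."""
    content_loss = mean_squared_difference(generated, content)
    style_loss = mean_squared_difference(gram_matrix(generated), gram_matrix(style))
    return content_weight * content_loss + style_weight * style_loss
```

In a TensorFlow/Keras setup, the features are typically taken from a model such as `tf.keras.applications.VGG19`, and the generated image is updated by gradient descent on this loss (for example with `tf.GradientTape`).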

This project was created with Python, FastAPI, TensorFlow, Keras, OpenCV, Angular, Bootstrap, and other libraries.

Project Research

For a detailed description of the research steps and what we did, see the Project Book.

Project Setup and Run

To run this project with Docker, your environment must support TensorFlow Docker. You can follow this link to set everything up.

Run in a local environment:

  1. Clone this repository.
  2. Open cmd/shell/terminal and go to the application folder: cd Hstyle/app
  3. Run the docker-compose file: docker-compose -f docker-compose-local.yml up
  4. Open this link
  5. Enjoy the application.

Run in a production environment:

  1. Clone this repository.
  2. Open the following file: Hstyle/app/client/src/environments/environment.prod.ts
  3. In that file, change API_URL to 'http://PRODUCTION_IP_ADDRESS:5000', where PRODUCTION_IP_ADDRESS is your deployment server's IP address.
  4. Open cmd/shell/terminal and go to the application folder: cd Hstyle/app
  5. Run the docker-compose file: docker-compose -f docker-compose-prod.yml up
  6. Open http://PRODUCTION_IP_ADDRESS:3000/, where PRODUCTION_IP_ADDRESS is your deployment server's IP address.
  7. Enjoy the application.
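
Step 3 of the production setup can also be scripted. The sketch below rewrites the API_URL value in environment.prod.ts; it assumes the file contains an entry of the form `API_URL: '...'` in single quotes, as in a typical Angular environment file, and the helper names `set_api_url` and `update_env_file` are hypothetical.

```python
import re
from pathlib import Path

# Path from step 2 above; the "API_URL: '...'" format inside the file is an assumption.
ENV_FILE = Path("Hstyle/app/client/src/environments/environment.prod.ts")
API_URL_PATTERN = re.compile(r"(API_URL\s*:\s*)'[^']*'")


def set_api_url(source: str, ip_address: str, port: int = 5000) -> str:
    """Return the file contents with the API_URL value pointing at the given server."""
    new_url = f"http://{ip_address}:{port}"
    # A function replacement avoids backslash escaping issues in the new value.
    updated, count = API_URL_PATTERN.subn(lambda m: f"{m.group(1)}'{new_url}'", source)
    if count == 0:
        raise ValueError("no API_URL: '...' entry found")
    return updated


def update_env_file(ip_address: str, path: Path = ENV_FILE) -> None:
    """Rewrite the environment file in place (hypothetical helper)."""
    text = path.read_text(encoding="utf-8")
    path.write_text(set_api_url(text, ip_address), encoding="utf-8")
```

For example, calling `update_env_file("203.0.113.10")` from the folder that contains Hstyle would set API_URL to http://203.0.113.10:5000 (203.0.113.10 is a documentation-only example address); the client is then reached on port 3000 as in step 6.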

Demo

HStyle Demo

Examples

Each example pairs a content image with a style image, applies the listed changes to the content image, and shows the result. Two examples are shown for each variant:

  1. Original (no changes)
  2. Apply dilation
  3. Apply binarization
  4. Apply dilation and binarization
  5. Replace white background with style average pixel value
  6. Replace white background with style average pixel value + Apply dilation
  7. Replace white background with style average pixel value + Apply binarization
  8. Replace white background with style average pixel value + Apply binarization + Apply dilation
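
The changes applied to the content images in the examples above can be sketched in plain Python on a small grayscale image stored as a list of rows (0 = black ink, 255 = white paper). This is a minimal illustration under stated assumptions: the project works on real images with OpenCV, and its thresholds, kernel sizes, and definition of a "white" pixel are not given here, so the defaults below (`threshold=128`, `radius=1`, `white_level=250`) and the function names are hypothetical. "Dilation" is interpreted as thickening the dark text strokes, which on dark-on-light images is a per-pixel minimum; `cv2.dilate` takes the maximum instead, so with OpenCV you would invert the image first or use `cv2.erode`.

```python
from typing import List

# Grayscale image as a list of rows: 0 = black ink, 255 = white paper.
Image = List[List[int]]


def binarize(image: Image, threshold: int = 128) -> Image:
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[0 if p < threshold else 255 for p in row] for row in image]


def dilate_ink(image: Image, radius: int = 1) -> Image:
    """Thicken dark strokes: each pixel becomes the darkest value in its
    (2 * radius + 1) x (2 * radius + 1) neighbourhood, clipped at the borders."""
    height, width = len(image), len(image[0])
    return [
        [
            min(
                image[yy][xx]
                for yy in range(max(0, y - radius), min(height, y + radius + 1))
                for xx in range(max(0, x - radius), min(width, x + radius + 1))
            )
            for x in range(width)
        ]
        for y in range(height)
    ]


def replace_white_background(content: Image, style: Image, white_level: int = 250) -> Image:
    """Replace near-white content pixels with the average pixel value of the style image."""
    style_pixels = [p for row in style for p in row]
    average = round(sum(style_pixels) / len(style_pixels))
    return [[average if p >= white_level else p for p in row] for row in content]
```

The combined variants chain these helpers, e.g. `replace_white_background(dilate_ink(binarize(content)), style)`. Replacing the background last matters: binarizing afterwards would snap the new background back to pure black or white.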

Evaluation

To determine which of the three techniques we considered most promising (Original content image, Dilate content image, Binary content image) works best, we surveyed 50 participants and asked them to rate each result's readability and historical look on a scale from 1 (poor) to 5 (great).

Result for image readability

Historical Image Readability

As the chart shows, ‘dilate content image’ and ‘binary content image’ received the most ratings of 3 and above, meaning these techniques produce the most readable results.
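
The comparison above counts, for each technique, how many ratings were 3 or higher. A minimal sketch of that tally, assuming the raw answers are available as one list of 1-5 ratings per technique (the function names are hypothetical, and the survey responses themselves are not reproduced here):

```python
from typing import Dict, List


def share_at_least(ratings: List[int], minimum: int = 3) -> float:
    """Fraction of 1-5 ratings that are greater than or equal to `minimum`."""
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    if not ratings:
        return 0.0
    return sum(1 for r in ratings if r >= minimum) / len(ratings)


def rank_techniques(survey: Dict[str, List[int]], minimum: int = 3) -> List[str]:
    """Technique names ordered by their share of ratings >= minimum, best first."""
    return sorted(survey, key=lambda name: share_at_least(survey[name], minimum), reverse=True)
```

The same functions apply to the historical-look ratings. Python's `sorted` remains stable with `reverse=True`, so techniques with equal shares keep their original order.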

Result for image historical look

Image Historical Look

As the chart shows, ‘dilate content image’ received the most ratings of 3 and above, meaning its results look the most historical.