CS4476 CV Project Repo

Project website can be found here.

Paper

The source for the three required deliverables (the proposal and 2 updates) is split into 3 parts under docs/src. The files under docs are organized as follows:

.
├── compile.py
├── final.md
├── proposal.md
├── src
│   ├── final
│   │   ├── images
│   │   ├── index.md
│   │   └── sections
│   ├── proposal
│   │   ├── images
│   │   ├── index.md
│   │   └── sections
│   └── update
│       ├── images
│       ├── index.md
│       └── sections
│           ├── abstract.md
│           ├── approach.md
│           ├── conclusion.md
│           ├── experiments_and_results.md
│           ├── introduction.md
│           ├── qualitative_results.md
│           └── references.md
└── update.md

Deliverables

Click here to see the proposal.

Click here to see the first midterm update.

Click here to see the (second) final update.

How to contribute?

There are three versions: proposal, update, and final. The source files for each are located in their respective directories under docs/src.

Each section of each version has its own file under that version's sections directory. To work on a section, edit the individual markdown files there. Edit these, not the rendered version.

When you are done, execute the following script to recompile and commit:

Compile

Make sure you are in the docs directory and have python3 installed. The compiler will generate the final paper for each version (proposal.md, update.md, final.md) and put them under docs.

chmod +x compile.py && ./compile.py

Images

To add images, put them under the images directory for the correct version. Yes, each version has its own images directory. To link them from markdown, use the relative path, e.g. [Alt Text](../images/<filename>)

How to add a new section?

In index.md for the version you are working on, first add the line (order matters).

[//]: # "<section_name>.md"

Then, under the sections directory for that version, create the corresponding file <section_name>.md and edit the content of the section there.
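As a rough illustration of why the order of these marker lines matters, here is a minimal sketch of how a script could read them from index.md. It is not the actual compile.py, whose implementation is not shown here; the function name `list_sections` and the sample index text are hypothetical.

```python
import re

# Matches a marker line such as: [//]: # "abstract.md"
# (assumed format, based on the marker line shown above)
MARKER = re.compile(r'^\[//\]: # "(?P<name>[^"]+\.md)"\s*$')


def list_sections(index_text):
    """Return the section file names referenced in index_text, in order."""
    names = []
    for line in index_text.splitlines():
        match = MARKER.match(line.strip())
        if match:
            names.append(match.group("name"))
    return names


# Hypothetical index.md content for illustration only.
example_index = """# Some title

[//]: # "abstract.md"
[//]: # "introduction.md"
[//]: # "approach.md"
"""

print(list_sections(example_index))
```

A compiler built this way would emit sections in the order the markers appear, so a new marker should be placed where the section belongs in the paper.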

Submission

Please Read

The submission must be a self-contained website; a GitHub Pages link won't do. However, there is a workaround: after each push to the repo, the GitHub Pages site is generated automatically. Wait for the generation process to finish (you will see a green mark at the end of the commit), then run the following command to generate a static version of the site:

wget -P site -mpck --user-agent="" -e robots=off --wait 1 -E https://jiachenren.github.io/cs4476-cv-project/

Zip the folder site/jiachenren.github.io/cs4476-cv-project (which is generated by the above command) and submit that.
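If you prefer to script the zipping step, the following is a minimal sketch using only the Python standard library. The function name `zip_site` and the archive name `submission` are assumptions made for illustration; any zip tool works just as well.

```python
import shutil
from pathlib import Path


def zip_site(site_dir, archive_stem="submission"):
    """Zip site_dir (keeping its folder name) and return the archive path."""
    site_dir = Path(site_dir)
    if not site_dir.is_dir():
        raise FileNotFoundError(f"{site_dir} does not exist; run wget first")
    # root_dir/base_dir keep the top-level folder name inside the archive.
    return shutil.make_archive(
        str(archive_stem),
        "zip",
        root_dir=site_dir.parent,
        base_dir=site_dir.name,
    )
```

For example, calling `zip_site("site/jiachenren.github.io/cs4476-cv-project")` from the directory where wget was run would write submission.zip to the current directory.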

Database

We are using both self-collected data and the eDBtheque database.

Data collection protocol

Self collected data

The self-collected data contain several manga pages crawled from different websites. They are used purely for research purposes.

Currently, we have uploaded 2 chapters of 2 different comics from the romanized Indonesian manga site sektekomik to serve as our system's test data.

eDBtheque

eDBtheque is a state-of-the-art manga database with ground-truth pixel-level labelling for panels and speech bubbles. It contains 100 pages in total and is used in most of the relevant research on information retrieval (IR) from manga.

If you are part of this project, contact Jiachen for the database login credentials. Otherwise, request access from the owner here.

Project Dependencies & Installation Guide

Install Tesseract OCR

This project uses Google's Tesseract OCR for text recognition. For the system to run successfully, install the tesseract command line tool and add it to your PATH. On macOS, just run

brew install tesseract

For other systems, refer to this guide. You might encounter some non-fatal errors when installing tesseract; just ignore them. You'll be fine as long as you end up with the final binary.

Install Tesseract python package

In the project directory (assuming that you have your venv created), run

pip3 install pytesseract
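To confirm that the tesseract binary is actually reachable before running the system, a quick check along these lines may help. This is a minimal sketch using only the standard library; the helper name `find_tesseract` is hypothetical.

```python
import shutil


def find_tesseract():
    """Return the full path of the tesseract binary, or None if not on PATH."""
    return shutil.which("tesseract")


path = find_tesseract()
if path is None:
    print("tesseract not found on PATH; install it or add it to PATH")
else:
    print(f"tesseract found at {path}")
```

Once pytesseract is installed, `pytesseract.get_tesseract_version()` offers a similar check from the Python side.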

Python dependencies

opencv-python

Preprocess the input image by applying thresholding and denoising to convert it to binary
SIFT-related functionality for extracting features from recognized text blocks
Group contours in detected text blocks for character-level segmentation
Flood fill of speech bubbles using SIFT keypoint cluster centers as seed coordinates
Morphing of the speech bubble binary mask to consume the text within

PIL

Highlight text blocks
Convert between color spaces and write back to disk
Mask detected text blocks for iterative OCR

sklearn

MeanShift clustering of SIFT descriptor matches in masked image to hypothesize new dialog bounding boxes
KMeans clustering of pixels under flood-fill seed mask to extract dominant color
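To make the dominant-color step concrete, here is a minimal sketch of KMeans clustering over the pixels selected by a boolean mask. It is an illustration under stated assumptions, not the project's actual code: the function name `dominant_color`, the choice of `k=2`, and the synthetic test image are all hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def dominant_color(image, mask, k=2):
    """Return the center of the largest KMeans cluster among masked pixels.

    image: (H, W, 3) array; mask: (H, W) boolean array (assumed shapes).
    """
    pixels = image[mask].astype(np.float64)  # shape (n_pixels, 3)
    if len(pixels) == 0:
        raise ValueError("mask selects no pixels")
    # Never ask for more clusters than there are distinct colors.
    k = min(k, len(np.unique(pixels, axis=0)))
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    counts = np.bincount(km.labels_, minlength=k)
    center = km.cluster_centers_[np.argmax(counts)]
    return tuple(int(round(float(c))) for c in center)


# Synthetic 4x4 image: 12 red pixels and 4 blue pixels, all under the mask.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:3] = (255, 0, 0)
img[3] = (0, 0, 255)
seed_mask = np.ones((4, 4), dtype=bool)
demo_color = dominant_color(img, seed_mask)
print(demo_color)
```

Picking the center of the most populated cluster, rather than the plain mean of all masked pixels, keeps a minority of text or outline pixels from skewing the result.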